Most SQL dialects don't support robust table-generating functions. There are a few common use cases for table-generating functions, such as creating a sequence of numbers or dates, un-nesting a JSON object or array into rows, or splitting a string on a delimiter into rows.
In this article, we'll show an example for the last use case: Taking a string containing n distinct comma-separated values and splitting it into n unique rows.
Suppose we have a table of products containing
|123||Scrunch Cloth Pants Set||Pants, Throwback, 80s|
|1287||Le Suit Citrus Breeze Skirt Suit||Skirts, Dresses, Business Casual|
|486||MaxStudio Plaid Romper||Jumpsuits, Casual, Rompers|
Our goal is to un-nest the tags to create a table to map products to tags. We will assume that a table of numbers already exists in the database, though this can be created using this pattern.
Two functions are useful here:
regexp_countfunction, which tells us how many times a particular pattern is found in our string.
split_partfunction, which takes a string, splits it on some delimiter, and returns the first, second, ... , nth value specified from the split string.
NOTE: These functions are not available in all dialects.
Here's what we can do:
regexp_countto determine how many instances of our delimiter (", ") are found for each string.
- Add one to that value, telling us how many values are separated by our delimiter.
- Join in our numbers table on an inequality so that we fan out our tagged products n times, where n is the value returned from
regexp_countfor each product.
- Use the
split_partfunction and our sequence of numbers to get the 1st, 2nd, nth tags for each product.
Here's an example using Amazon Redshift:
SELECT row_number() OVER(order by 1) AS product_tag_id , products.id as product_id , split_part(products.tags, ', ', numbers.num) AS tag_name FROM products JOIN numbers ON numbers.num <= regexp_count(products.tags, ',\\s') + 1
For MySQL, you could use
SUBSTRING_INDEX instead of
SELECT DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(vals, ',', n.digit+1), ',', -1) val FROM (select "Skirts, Dresses, Business Casual" as vals) tt1 INNER JOIN (SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6) n ON LENGTH(REPLACE(vals, ',' , '')) <= LENGTH(vals)-n.digit;
That's it! Super easy!