Few SQL dialects support robust table-generating functions. Some common use cases for table-generating functions are creating a sequence of numbers or dates, un-nesting a JSON object or array into rows, or splitting a string on a delimiter into rows.
In this brief article, we will demonstrate the last example: how to take a string containing n distinct comma-separated values and split it into n unique rows.
Suppose we have a table of products containing
|123||Scrunch Cloth Pants Set||Pants, Throwback, 80s|
|1287||Le Suit Citrus Breeze Skirt Suit||Skirts, Dresses, Business Casual|
|486||MaxStudio Plaid Romper||Jumpsuits, Casual, Rompers|
Our goal is to un-nest the tags to create a product-tag mapping table. I am assuming a table of numbers already exists in the database (though this can be created using this pattern).
Please note that the following functions are not available in all dialects.
Two functions are useful here:
split_partfunction, which takes a string, splits it on some delimiter, and returns the first, second, … , _n_th value specified from the split string.
regexp_count, which tells us how many times a particular pattern is found in our string.
Our strategy is the following:
- Determine how many instances of our delimiter (", ") are found for each string.
- Add one to that value, telling us how many values are separated by our delimiter.
- Join in our numbers table on an inequality such that we fan out our tagged products n times, where n is the the value returned from 2 above for each product.
- Use the
split_partfunction and our sequence of numbers to get the 1st, 2nd, _n_th tags for each product.
select row_number() over(order by 1) as product_tag_id , products.id as product_id , split_part(products.tags, ', ', numbers.num) as tag_name from products join numbers on numbers.num <= regexp_count(products.tags, ',\\s') + 1
That's it! Super easy!