Selecting a wildcard of virtual columns from a flex table
robann
✭
Suppose I have imported Avro/JSON data into a flex table and the data is flattened. There are many 'keys' in the raw column that are available as virtual columns. As a convoluted example lets assume some of the keys are:
person.salary
person.name.firstname
person.name.lastname
person.name.nickname
person.name.title
person.favourite.colour
What I would like to do is to select only and all of the data with keys like person.name.*
i.e. return data like:
person.name.firstname | person.name.lastname | person.name.nickname | person.name.title |
---|---|---|---|
bob | smith | bigguy | mr |
jane | jones | ace | mrs |
I know I can do this by explicitly listing all the virtual columns, but let's assume I don't know all of the keys that could have been added under 'name'.
Is there any way to execute this and return this subset of the data? I'm hoping to wildcard/like the key name.
Thanks,
Robert.
Tagged:
0
Comments
Consider the following flex data:
d2=> select maptostring(raw) from rest;
maptostring
{
"restaurant.name": "Bob's pizzeria",
"restaurant.cuisine": "Italian",
"restaurant.location.city": "Cambridge",
"restaurant.location.zip": "02140",
"restaurant.menu": {
"0.item": "cheese pizza",
"0.price": "$8.25",
"1.item": "chicken pizza",
"1.price": "$11.99",
"2.item": "spinach pizza",
"2.price": "$10.50"
}
}
{
"restaurant.name": "Bill's pizzeria",
"restaurant.cuisine": "Italian",
"restaurant.location.city": "Somerville",
"restaurant.location.zip": "02141",
"restaurant.menu": {
"0.item": "cheese pizza",
"0.price": "$8.25",
"1.item": "chicken pizza",
"1.price": "$11.99",
"2.item": "spinach pizza",
"2.price": "$10.50"
}
}
{
"restaurant.name": "Bart's pizzeria",
"restaurant.cuisine": "Italian",
"restaurant.location.city": "Newton",
"restaurant.location.zip": "02142",
"restaurant.menu": {
"0.item": "cheese pizza",
"0.price": "$8.25",
"1.item": "chicken pizza",
"1.price": "$11.99",
"2.item": "spinach pizza",
"2.price": "$10.50"
}
}
(3 rows)
I can use MAPITEMS to extract the flattened restaurant locations with a wildcard as follows:
d2=> select * from (select mapitems(raw) over () from rest) r1 where keys::varchar like 'restaurant.location%';
keys | values
--------------------------+------------
restaurant.location.city | Cambridge
restaurant.location.zip | 02140
restaurant.location.city | Somerville
restaurant.location.zip | 02141
restaurant.location.city | Newton
restaurant.location.zip | 02142
(6 rows)
The problem is that every key and value is on their own row. In your example, which of the zip codes is the one for Barts? For the data to be useful you need all the data items on an individual row. i.e.
| Bart's pizzeria | Newton | 02142 |
| Bill's pizzeria | Somerville | 02141 |
Is there anyway to combine the mapitems data with the row data to collapse into an individual row? Without having to know ( and hard-code) all the virtual column names?
Thanks,
Robert.
You can group the results and IMPLODE into a single row, or use array functions to collapse the list to an array.
Thanks for the help, I think I'll have to change the way the application retrieves the data but I should be able to make this work.
Thanks,
Robert