Is it possible to update a Flex Table view?

Nicholas_Chamma · December 2013

Say I load some JSON data into a table and create a Flex Table view using COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW(). Later, I load additional JSON data into the same table. I'd like to update the Flex Table view to reflect the additional columns or changed data types introduced by the new JSON data. How would I do that?

[Deleted User] · December 2013

The simplest way is to call COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW() again.

The Flex Table Data Functions (p. 71 on the linked doc) may be called repeatedly. Each time, they will perform their updates on the latest dataset to produce a view reflecting the contents of the table at the time they are run without knowledge that they had previously been run.

Thanks,
James

Nicholas_Chamma_1 · December 2013

Excellent. That's great.

Nicholas_Chamma_1 · December 2013

James,

Your comment that the function call is not aware of previous calls is interesting. Does that mean Vertica will recompute keys and datatypes for all the data each time the function is called, even though only part of the data in the table is new? That could be very costly as the source Flex Table grows.

Is there a way to incrementally update the Flex Table view definition as new batches of "dark" data are loaded into the same table? Basically, we want to avoid having to recompute keys and datatypes for data we have already done that for.

This brings up a second question. Say I have a Flex Table and a matching view. If I load new data into the Flex table that does not match the view definition and I don't rerun the COMPUTE_... function, what happens? Do I get errors if I try to query the view at all?

Nick

[Deleted User] · December 2013

Glad to help. There are a few things to respond to here:

(1st paragraph) This is correct.

(2nd paragraph) In this initial FlexTable release, there is no fully automated way to do so, but we've already put thought into supporting this directly. We're very interested in hearing any use cases you have for this and/or preferences for how you think it should look.
Potentially useful for this issue is that we allow you to manually update the keys table. A custom query could be used to update the keys, for example by looking at which rows in your Flex Table are from a more recent epoch than the latest epoch in your keys table.

(3rd paragraph) Flexible Tables assume variable schema and will not give errors for requesting data which does not exist. Queries will include both values associated with columns which exist in any rows being returned and NULLs for rows which do not contain the column.

James

Nicholas_Chamma_1 · December 2013

Say we write a custom query to update the flex keys for given table after a new batch of data is loaded in. Wouldn't such a query have to determine the keys and datatypes for that batch of data, essentially duplicating the functionality of COMPUTE_FLEXTABLE_KEYS()?

We are considering using Flex Tables to allow our analysts to query JSON data via SQL without us having to transform it into a tabular format first, which is what we currently do. A given set of JSON data will go into a given table, and over time its schema will evolve. If we have to recompute keys and types for the whole table every time a new batch is loaded, then I assume that steps will take increasingly long to complete as the table grows. It will probably also make it impossible for us to trickle load a Flex Table and have its keys and data types always be up-to-date.

[Deleted User] · December 2013

The automated tools currently have the behavior you describe in your second paragraph--they consider the full contents of a table when making their choices and will take an amount of time proportional to the amount of data they are operating on. Incremental changes to the computed view are technically feasible, but not available at this time. We'd love to hear what you think that functionality should look like, keeping in mind that it will likely be infeasible for most datasets and use cases to mix DML and DDL and regenerate the keyset on every load.

There are many options available to solve this problem incrementally, and the best choice is very dependent on your data's attributes and lifecycle. In the meantime, there is no single "best practice".
Of note is that we allow you to make any changes you want to the keys table, and have those changes reflected in the view when you next call BUILD_FLEXTABLE_VIEW(). This means we fully support computing incremental keysets then merging the new keyset into an existing keys table.
If you'd like to see the query COMPUTE_FLEXTABLE_KEYS() and COMPUTE_FLEXTABLE_KEYS_AND_BUILD_VIEW() use to compute the keys, you can look at your vertica.log for the query after running either function. Assuming you're working with a Flexible Table called "foo", the keys query will start with: "DELETE FROM public.foo_keys;". The interesting part of the query is:

SELECT keys, COUNT(*) as frequency, max(length) as length, max(type_oid) as type_oid FROM ( SELECT mapkeysInfo(__raw__) OVER() FROM public.foo ) all_keyinfo GROUP BY keys;'

If you'd like to filter that to only include rows made since a given load, check the epoch that the rows were loaded on, then filter by that, ala:

SELECT epoch, keys, COUNT(*) as frequency, max(length) as length, max(type_oid) as type_oid FROM ( SELECT epoch, mapkeysInfo(__raw__) OVER(PARTITION BY epoch) FROM public.foo ) all_keyinfo WHERE epoch > {EpochWhereYouLastComputedKeys} GROUP BY keys, epoch;

James

Nicholas_Chamma · December 2013

Alright, this approach and these queries look promising. We will likely implement an incremental load of Flex Tables in the next couple of months using the suggestions you have provided here.

We'd love to hear what you think that functionality should look like, keeping in mind that it will likely be infeasible for most datasets and use cases to mix DML and DDL and regenerate the keyset on every load.

I didn't understand your comment about mixing DML and DDL.

We'll have a good idea of what that functionality should look like when we've had a go at implementing our own. Until then, I imagine something that works as follows would suit our needs:

There's an UPDATE_FLEXTABLE_KEYS() function that takes as input the target table and some kind of filter. The filter is used to limit the data used to compute keys and data types. In our case, we'd use the filter to get the data most recently loaded into the table. The function would merge the newly computed keys and data types into the existing ones for that table.

This functionality fits the following usage pattern: You have a Flex Table that grows on a regular basis. You never delete from the table. The schema of the data loaded into the table slowly evolves with time. You need a quick way to keep the Flex View up-to-date with the data in the base table as it grows.

[Deleted User] · December 2013

Glad to help, and thank you very much for the detail in your UPDATE_FLEXTABLE_KEYS() ask. We have considered a more basic version which would simply take a flag as to whether all keys should be updated vs. only looking at new rows added since the keys were last updated. For your version, I was wondering if you could provide one or two concrete examples of the form you expect the filter argument to be in.

Regarding my DML vs. DDL comment: One of the larger design challenges with Flexible Tables is reconciling that since each row in a table might have its own unique schema, load operations (DML) can cause apparent table structure changes (DDL). In the case of a theoretical feature such as automatic maintenance of the Flex Table VIEW, the line is further blurred as loading data into the Flexible Table could cause new columns to appear in the VIEW.

Thanks!
James

Nicholas_Chamma · December 2013

We have considered a more basic version which would simply take a flag as to whether all keys should be updated vs. only looking at new rows added since the keys were last updated.

That is simpler than what I proposed and fully covers the usage pattern I detailed. I would rather use a function designed like that.

I cannot think of a common usage pattern where it would be preferable to provide a filter over a simple flag as you described.

[Deleted User] · December 2013

Glad to hear, and thank you for the feedback. This is very useful for us going forward.
James

Is it possible to update a Flex Table view?

Comments

Leave a Comment