Varchar columns in Vertica 7
We currently have several columns in a Vertica table that are of type VARCHAR. Some of them are defined as VARCHAR(255). However, the data load to this table (from Sqoop) fails if the number of characters in one of these columns exceeds 255. No data is loaded even if only one record out of a million exceeds the limit.
Can we increase the limit to 65000 for all the VARCHAR columns, i.e., make them all VARCHAR(65000)? Is there a downside to this approach in terms of memory management or query performance? Will it slow down queries?
Please clarify. Thank you.
Ravi
Comments
I have two tables, test and test2, each loaded with the same 12060 rows (repeated values) but with different declared VARCHAR lengths. When looking at the storage used by these two tables, I can see the footprint is different. So far I am OK with that!
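The comparison above can be sketched like this (table and column names are assumptions; v_monitor.column_storage is one place to check the on-disk footprint):

```sql
-- Two tables holding the same data but with different declared VARCHAR lengths:
CREATE TABLE test  (txt VARCHAR(20));
CREATE TABLE test2 (txt VARCHAR(65000));
-- ... load the same 12060 rows into both ...

-- Compare the on-disk footprint per table:
SELECT anchor_table_name, SUM(used_bytes) AS total_bytes
FROM v_monitor.column_storage
WHERE anchor_table_name IN ('test', 'test2')
GROUP BY anchor_table_name;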
But what about when I query the tables? Will the cost be different?
As shown here, the cost is different: the projection with the larger data type appears to consume more. Now let's run a profile and see how many resources each query uses. Conclusion: even with a VARCHAR data type, Vertica still requires more resources when the declared length is bigger than needed.
This is my personal opinion, and I drew these conclusions from my own tests.
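A sketch of the profiling step described above (the counter name and table names are assumptions; check your own Vertica version's counters):

```sql
-- PROFILE records per-operator counters for the statement:
PROFILE SELECT txt FROM test2;

-- Vertica prints a hint with the transaction_id/statement_id to filter on here;
-- per-operator memory can then be inspected:
SELECT operator_name, counter_name, counter_value
FROM v_monitor.execution_engine_profiles
WHERE counter_name = 'memory allocated (bytes)'
ORDER BY counter_value DESC;
```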
My experience has been that varchar columns that are way bigger than they need to be will have performance implications. While it might sound reasonable to just make them 65000, you might be better off making them only slightly bigger than they currently are - 500, or maybe 1,000. Or, force them to be 255 with some sort of preprocessing script. Seems like that would be the perfect job for a Perl script, perhaps.
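An alternative to a preprocessing script, assuming the load ultimately goes through Vertica's COPY: by default, COPY truncates over-length CHAR/VARCHAR values rather than rejecting the load, while the ENFORCELENGTH option makes it reject them instead (whether Sqoop sets this option depends on the connector). A sketch with hypothetical names:

```sql
-- Default behaviour: values longer than the declared width are silently truncated:
COPY myschema.table1 FROM '/data/widgets.csv' DELIMITER ',';

-- With ENFORCELENGTH, over-length rows are rejected instead of truncated:
COPY myschema.table1 FROM '/data/widgets.csv' DELIMITER ',' ENFORCELENGTH;
```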
For example, if I have table myschema.table1 with a column "widget_description" of type VARCHAR(255), how do I reliably lengthen the column should I need to?
If I do:
ALTER TABLE myschema.table1 ALTER COLUMN widget_description SET DATA TYPE varchar(512);
Sometimes this works.
Other times, I get an error message to the effect of "column widget_description is being used in projection myschema.table1_b1" or something like that.
To lengthen a column, I end up having to create a new table, insert the entire contents of the original table into it, drop the old table, and then rename the new table to the original table's name.
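The workaround described above, as a sketch (names are assumptions, and the column list is illustrative):

```sql
-- New table with the widened column; remaining columns unchanged:
CREATE TABLE myschema.table1_new (
    widget_description VARCHAR(512)
    -- ... other columns as in the original table ...
);

-- Copy the data, then swap the tables:
INSERT /*+ DIRECT */ INTO myschema.table1_new SELECT * FROM myschema.table1;
DROP TABLE myschema.table1;
ALTER TABLE myschema.table1_new RENAME TO table1;
```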
Check your object definition. As an example, try adding a column and then altering that column's data type; no error will be shown.
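An assumed illustration of that test: a newly added column is not part of any projection's segmentation clause, so widening it succeeds.

```sql
ALTER TABLE myschema.table1 ADD COLUMN notes VARCHAR(255);
ALTER TABLE myschema.table1 ALTER COLUMN notes SET DATA TYPE VARCHAR(512);
```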
When I run EXPORT_OBJECTS, I get "SEGMENTED BY hash(column1, column2, column3, etc.)"
More importantly, how do I lengthen a VARCHAR column without creating an entirely new table and renaming it?
To fix this, you need to ask yourself whether you really need segmented projections. If your table is big and you need it segmented, make sure you create it without that column in the segmentation clause.
See example:
-- create the table with a segmentation clause on col1
-- altering col1 fails because it IS IN the segmentation clause
-- altering col2 succeeds because it IS NOT IN the segmentation clause
Now, if you insist on (or need) having that column in the segmentation clause, then you need to recreate the superprojection (table) with your desired data types.
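A sketch of that behaviour with assumed names:

```sql
CREATE TABLE myschema.t (
    col1 VARCHAR(50),
    col2 VARCHAR(50)
) SEGMENTED BY HASH(col1) ALL NODES;

-- col1 is in the segmentation clause, so this fails:
-- ALTER TABLE myschema.t ALTER COLUMN col1 SET DATA TYPE VARCHAR(100);

-- col2 is not, so this succeeds:
ALTER TABLE myschema.t ALTER COLUMN col2 SET DATA TYPE VARCHAR(100);
```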
His settings are all default, so that is why he is getting stuck. Also, a VARCHAR column is not a good fit for a segmentation clause; data skew will suffer (the cluster will be as slow as its slowest node).
If I understand correctly, the answer is to use a high cardinality column, preferably an int, to segment, and to specify it in the CREATE TABLE statement.
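A minimal sketch of that approach, segmenting on a high-cardinality integer column (names are assumptions):

```sql
CREATE TABLE myschema.table1 (
    widget_id          INT NOT NULL,   -- high-cardinality key used for segmentation
    widget_description VARCHAR(255)    -- can now be widened without hitting the restriction
) SEGMENTED BY HASH(widget_id) ALL NODES;
```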
Take a look at this article to see how to control data skew better.
--Sharon