The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.
Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!
How can we uniquely identify each record in Vertica database in absence of primary key?

Answers
Vertica, in this respect, is a real relational database: If there is no declared identifier, there is no identifier. And it is columnar.
Row based DBMSs have a way to physically identify the location of a certain row. That's where, for example, the ROWID of Oracle comes from.
Vertica being columnar, you can not even locate the actual position of the first column in the sort order:
If, for example, you have
gender
as the first column in the sort order, and the column is encoded as Run-Length-Encoding (RLE), then, you have, in that file that contains the column, for example, the value 'F', an integer of 502, the value 'M' and an integer of 498 for a table containing 1000 rows.You could calculate the hash of all columns (with a small risk of hash collisions), but if you have two rows like this:
There is now way of discerning one row from the other.
Even if you apply a
ROW_NUMBER() OVER(PARTITION BY <all_columns_of_the_table> )
- which would lead one of the above rows to get a 1 and the other to get a 2- there will be no way of determining which was assigned to which row.These are the two, not completely satisfactory, ways of working around this behaviour. Which is not a problem, but a behavioural feature.
The very easy way if you can't fix your model, is to perform a SELECT DISTINCT, but you'll pay the fee each time you run the query.
The better thing to do is to recreate your table with a PK / UK and insert select distinct from your current table, you pay only once.
I don't think there is. But if you run this type of query often, then create a projection with the column in question as the first in the sort order; encode this column RLE (run-length encoding) in that projection. Then, a
SELECT DISTINCT foo FROM bar;
will run as fast as a query of previously accumulated histograms.