Issue with ORDER BY on a big sized projection

kunalb · December 2016

Hi everyone,

We have a 3 node Vertica 7.x cluster in our production database. We have a table/projection (right now there is just one super projection) which has roughly 400 million rows in it. I am running a query which does an ORDER BY sort on the table but the database does not return any data and gets stuck there in processing.

As expected, we are coming up with an improved query specific projection suggested by Vertica Management Console which by the way has an ORDER BY clause suggested as a part of a new projection. But before deploying it on production and see if that helps run the query or not, I was confused even if the new projection's CREATE statement will work? Given the fact that, the table on which we are creating this new projection doesn't return any data for a SELECT statement having an ORDER BY in it. Also, even the EXPLAIN on the query doesn't return anything. I broke down the query into pieces and it gets stuck during the execution of ORDER BY.

I don't wish to leave the cluster in an inconsistent state or something. It would be very helpful if anyone can throw some light on both issues I have mentioned above (about slow unresponsive ORDERBY on big table and things happening under the hood while deploying new projection with an ORDERBY clause).

My query is like:

select T1.assetName AS name, T1.workspaceId, T1.assetId, T1.parentId, T1.revId, T1.type, T1.size, T1.status, T1.extension, T1.createdWhen, T1.modifiedWhen, T4.emailAddress, T4.userId, T4.groupId, T4.customerId AS realmId, FROM reports.asset_metadata_fact T1

INNER JOIN reports.workspace_dim T4 ON T1.workspaceId=T4.workspaceId

WHERE T4.workspacetype in ('BACKUP','SYNC') AND T1.revId=(SELECT MAX(Y.revId)

FROM reports.asset_metadata_fact as Y WHERE Y.assetId=T1.assetId) AND T4.customerId=? AND T1.status in (?,?) AND T4.groupId in (?) AND T1.assetName ILIKE ? order by T1.assetName asc limit 500 with parameter values [298072911221096448, ACTIVE, DELETED, 298072912311615488, work%]

The table with bigger 400 million rows is asset_metadata_fact. The other table workspace_dim has 30K something entries.

Thanks ,

Kunal

ckotsidimos · December 2016

From my experience, we have tables with some billion rows

The projection will be created through the create statement, but when you will refresh it expect a downtime of many hours in order the projection to be populated. After that the queries will run.

What you can do is to create a new table, create the projection there and copy the data from table 1 to table 2. In this way you will not have to refresh the projection.

We're Moving!

Create My New Community Account Now

Issue with ORDER BY on a big sized projection

Comments

Leave a Comment