Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Which data is exported with "EXPORT TO PARQUET" ? How to get the real size ?

Hello
when we use the EXPORT command to Parquet, which data is really downloaded ? The one of the super projection ? We try to figure out the size of an archive, and as it is difficult to get the file size by itself, we try to compute it from the ROS size with this kind of query (we work on partitions)
select p.table_schema,
pr.anchor_table_name,
p.partition_key,
ROUND(SUM(ros_size_bytes)/(1024^2)) as space_Mb
from partitions P,
projections PR
where p.table_schema = pr.projection_schema
and p.projection_name = pr.projection_name
group by p.table_schema, pr.anchor_table_name, p.partition_key;
But I am afraid we take too much data here, when a table has several projections ?
How can we estimate the exported size (on top of number of rows) ?
Thanks

Best Answer

Answers

  • swalkausswalkaus Employee
    edited April 2020

    EXPORT TO PARQUET exports the results of the SQL query provided. For example, the following would export all of the rows from the "sales" table.

     EXPORT TO PARQUET(directory = 'hdfs:///sales')  AS SELECT * FROM sales;
    

    You can add a predicate to export a subset of the data, for example, only export sales from today.

     EXPORT TO PARQUET(directory = 'hdfs:///sales/daily')  AS SELECT * FROM sales WHERE sale_date = now()::DATE;
    

    Of course, the SQL clause can be much more complex. You could join to other tables, perform aggregations, etc. Here's a slightly more complex example:

    EXPORT TO PARQUET(directory = 'hdfs:///sales/region/summary')  AS
      SELECT c.region, s.sale_date::date, sum(s.price)
         FROM sales s JOIN customers c ON s.customer_id = c.id
      GROUP BY 1, 2;
    
  • My question was about the size of data exported. From the DB information (ROS size) can we estimate the file size that will be created ?
    May be this is not possible and we can only measure the files size once created....
    Having the files created on several nodes add complexity .

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.