export to parquet permission denied

Hi guys,

I mounted Azure Blob Storage on my local Linux machine, and I'm trying to export Vertica data to it in Parquet format, but I'm getting a permission denied error.

Fusion=>
Fusion=> export to parquet (directory='/mnt/xxxx/DIM_DATA/ITEM/xxxx') as select * from common.dim_product;
ERROR 5861: Error calling setup() in User Function ParquetExportFinalize at [src/ParquetExportFinalize.cpp:73], error code: 0, message: ERROR: Unable to make directory [/mnt/xxxx/DIM_DATA/ITEM/xxxxWvGZn4tW/] due to 'Error creating directory [/mnt/xxxx/DIM_DATA/ITEM/xxxxWvGZn4tW/] Permission denied'
Fusion=>

Actually I gave 777 to the /mnt/xxxx folder, so permissions should not be an issue. I can use Python pandas to export a Parquet file into that directory as the same Linux user without any problem, so I'm not sure what's going wrong with the Vertica EXPORT TO PARQUET command.

Thanks,
Hong

Comments

  • kguan Employee

    Could you please check the permissions of your Parquet export directory (/mnt/xxxx/DIM_DATA/ITEM/)?

    I can successfully export to a directory with permissions 777, but I get the same error as yours with a directory that has permissions 775.

    $ ls -lt /export_parquet/
    drwxrwxrwx 3 root root 4096 Mar 25 13:43 test777
    drwxrwxr-x 2 root root 4096 Mar 25 13:27 test775

    => export to parquet (directory='/export_parquet/test777/t_o') as select * from t_o;

     Rows Exported
    ---------------
                25
    (1 row)

    => export to parquet (directory='/export_parquet/test775/t_o') as select * from t_o;
    ERROR 5861: Error calling setup() in User Function ParquetExportFinalize at [src/ParquetExportFinalize.cpp:85], error code: 0, message: ERROR: Unable to make directory [/export_parquet/test775/t_oMCOQNO29/] due to 'Error creating directory [/export_parquet/test775/t_oMCOQNO29/] Permission denied'
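
    So if any directory on the target path isn't fully writable, something along these lines should clear it (a sketch; substitute your failing directory):

    $ chmod -R 777 /export_parquet/test775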

  • Thanks, Kguan, for the reply! As I mentioned, the directories, parent and child alike, are all 777, so write access should not be an issue.

    I'm not even able to dump to any location using EXPORT TO PARQUET. BTW, I didn't use sudo to run the export command, but a personal account; does that matter? I assume we should use a personal account only. The error messages are all the same: no permission to create the directory.

    Python code

    import pyodbc
    import pandas as pd

    conn = pyodbc.connect("DRIVER=HPVerticaDriver;SERVER=AAA;DATABASE=AAA;PORT=5433;UID=AAA;PWD=AAA")

    def dump_via_exp():
        #sql = "EXPORT TO PARQUET(directory = '/mnt/xxxx/DIM_DATA/ITEM/DIM_PRODUCT') AS SELECT * FROM COMMON.DIM_PRODUCT"
        sql = "EXPORT TO PARQUET(directory = '/home/xxxx/python/data/DIM_PRODUCT') AS SELECT * FROM COMMON.DIM_PRODUCT"
        with conn.cursor() as c:
            c.execute(sql)

    if __name__ == '__main__':
        dump_via_exp()

    Thanks,
    Hong

  • kguan Employee

    Hi honghu,
    I tried your Python script on Vertica v9.2.0-6 but could not reproduce the issue you described. I'm also using a normal user (without sudo), exporting to a directory that is owned by root and has permissions 777.

    What Vertica version are you using for your project?

    Also, could you please try exporting to a Linux NFS mount directory? I'm not very familiar with Azure Blob Storage, so I'm not sure whether it could be the cause.

  • Hi Kguan,

    We probably found the root cause: it seems the EXPORT TO PARQUET command is executed on the Vertica node. Once we specified the output directory as /home/dbadmin (which doesn't exist on our local Linux machine), it ran successfully, but I can't retrieve the exported file since it was not written to a local directory.

    Is there a way to export a Parquet file to a local directory? We don't work on the Vertica server.

  • kguankguan Employee

    Hi Hong, yes, that would be the cause. Vertica currently doesn't support a local directory on the client machine. The Parquet export destination can be in HDFS, S3, or an NFS mount point on the 'local file system' of the Vertica server.
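
    Since you already write Parquet from pandas successfully, one possible client-side workaround is to pull the rows over ODBC and write the file locally instead of using EXPORT TO PARQUET. A minimal sketch, reusing the connection string, table, and path from your earlier script (to_parquet needs pyarrow or fastparquet installed):

    import pyodbc
    import pandas as pd

    conn = pyodbc.connect("DRIVER=HPVerticaDriver;SERVER=AAA;DATABASE=AAA;PORT=5433;UID=AAA;PWD=AAA")

    # Fetch the rows to the client, then write Parquet locally with pandas
    df = pd.read_sql("SELECT * FROM COMMON.DIM_PRODUCT", conn)
    df.to_parquet('/home/xxxx/python/data/DIM_PRODUCT.parquet', index=False)

    Note that this streams every row through the client, so it won't perform like a server-side export on very large tables.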

  • OK, then it may not be an option for us... Thanks, Kguan, for answering my questions.

  • Alim Vertica Employee

    Hi,
    I'm not able to copy data from Vertica tables to a Hadoop directory; please check the error below.

    dbadmin=> export to parquet (directory = 'hdfs://:9000/test_data/new', fileMode='432', dirMode='rwxrw-r-x') AS SELECT * FROM hadoop.t2;
    ERROR 5861: Error calling processPartition() in User Function ParquetExport at [src/ParquetExport.cpp:956], error code: 0, message: ERROR: Error opening file [hdfs://:9000/test_data/newyhCFhPa3/ee43df3a-v_asa_dev_node0003-139841805215488-0.parquet] for write: Curl Error: Couldn't resolve host name
    OS Error: Success
    Error Details: Could not resolve host: master-node

  • Jim_Knicely - Select Field - Administrator

    @Alim - Make sure you've configured the hdfs scheme correctly.

    See:
    https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/HadoopIntegrationGuide/libhdfs/ConfiguringAccessToHDFS.htm

    Looks like Vertica is having an issue resolving the host name master-node.

    Also, check your HDFS URL Format.

    See:
    https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/HadoopIntegrationGuide/libhdfs/HdfsURL.htm
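
    For reference, a hedged example of the URL form those docs describe, assuming a NameNode at master-node on port 9000 (the host must be resolvable from every Vertica node):

    => export to parquet (directory = 'hdfs://master-node:9000/test_data/new') AS SELECT * FROM hadoop.t2;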

  • Alim Vertica Employee
    edited February 2020

    Hi Jim,
    Thanks for the update.
    After restarting the Hadoop master node, the issue got resolved. Just one more query: when I go to read the .parquet file, it's not in a readable format, so is there a command or way to read a .parquet file stored in a Hadoop directory?

    Thanks ,
    Alim Shaikh

  • @Alim - You can use the open-source parquet-tools CLI to read a Parquet file.
    https://mvnrepository.com/artifact/org.apache.parquet/parquet-tools/1.9.0
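
    For example (a sketch; the jar version and <file> path below are placeholders for your setup):

    $ hadoop jar parquet-tools-1.9.0.jar schema hdfs:///test_data/new/<file>.parquet
    $ hadoop jar parquet-tools-1.9.0.jar head hdfs:///test_data/new/<file>.parquet

    The schema command prints the column definitions, and head prints the first few records, so you can verify the export without loading the file elsewhere.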
