Vertica Hadoop Connector

What are you doing now that you can move data between HP Vertica and Hadoop? Tell us your experiences, thoughts and best practices!

Comments

  • I'm trying to implement the HCatalog connector in HP Vertica 7 and HDP 2.0.
    I can connect and see the metadata (WebHCat), but I cannot query the actual data (metastore_db). The Hadoop instance is the Hortonworks Sandbox (a single-node cluster).
    Vertica returns an error when I execute a query:

    SQL Error [3457] [42883]: [Vertica][VJDBC](3457) ERROR: Function VHCatSource() does not exist, or permission is denied for VHCatSource()


    What is VHCatSource(), and why is it not the same as the source of the table metadata?

  • Hi Ben,

    VHCatSource() is a new function in Vertica 7; it is added to your database as part of our installation process and is used internally by Vertica in order to connect with Hadoop.

    You will get this error if your system administrator has revoked your access to our Hadoop-connectivity-related functions.

    You can also receive this error if your Vertica installation failed (but the database managed to start anyway).  Did the "install_vertica" or "admintools" scripts emit any errors during cluster setup or installation?  Are there any errors in the installer or adminTools logs in "/opt/vertica/log/"?

    A copy of our HCatalog functions is placed into "/opt/vertica/packages/hcat/".  They can be installed using the "ddl/install.sql" script in that directory.  However, if these functions are entirely missing, this is indicative of a larger problem with your Vertica installation; it is likely that you will see other strange errors or apparently-missing functionality.

    Adam
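
A minimal sketch of the log check suggested above, assuming the logs are plain-text files ending in ".log" under "/opt/vertica/log/". The function name find_error_lines and the keyword pattern are illustrative assumptions, not part of Vertica; the real log layout may differ.

```python
import re
from pathlib import Path

# Heuristic keywords only; actual Vertica log formats may differ (assumption).
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|ROLLBACK)\b")


def find_error_lines(log_dir="/opt/vertica/log"):
    """Return (file name, line number, text) for each matching log line."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return []
    hits = []
    # Only files ending in .log are scanned (an assumption about naming).
    for path in sorted(directory.glob("*.log")):
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if ERROR_PATTERN.search(line):
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits
```

Calling find_error_lines() with no arguments scans the default directory; an empty result only means that no line matched the heuristic pattern, not that the installation is healthy.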
  • Hello, I'm testing integration with HCAT for Vertica 7.0.1 and Hadoop 2.0 / Hive 0.12. I successfully created the hcat schema and listed the tables, but I cannot query a table. The library and functions have been created correctly. I get the following message when doing a select * from table:

    ERROR 3399: Failure in UDx RPC call InvokePlanUDL(): Error in User Defined Object [VHCatSource], error code: 0
    com.vertica.sdk.UdfException: Error: [org.apache.hcatalog.common.HCatException : 2004 : HCatOutputFormat not initialized, setOutput has to be called. Cause : java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status; Host Details : local host is: "192.168.1.133"; destination host is: "sandbox.hortonworks.com":8020; ]
    HINT: Vertica HCatalog Hadoop and ProtoBuf versions are not compatible with Hive metatsore service and HDFS service. You may update hadoop-*.jar and proto*.jar archives in packages/hcat/lib to be compatible with Hive and HDFS. Supported versions: Hive 0.10, 0.11, Hadoop 2.0 and protobuf 2.4.

    How can I configure the Vertica environment to support Hive 0.12? Which jars do I need to copy from the Hadoop lib to the Vertica lib?

    Thanks in advance,
    Vivian.
  • No install errors; I have a sturdy DEV cluster. I'll try to reinstall using install.sql. Your solution seems to count on me having a bad install of some kind.

    here is that output:
    CREATE LIBRARY
    vsql:/opt/vertica/packages/hcat/ddl/install.sql:14: ROLLBACK 3472:  Function with same name and number of parameters already exists: VHCatSource
    GRANT PRIVILEGE
    vsql:/opt/vertica/packages/hcat/ddl/install.sql:16: ROLLBACK 3472:  Function with same name and number of parameters already exists: VHCatParser
    GRANT PRIVILEGE

    Any other ideas?
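
When comparing install.sql runs like the one above, it can help to pull out just the ROLLBACK codes. The following is a small sketch that parses vsql output of the shape shown in this thread; the function name parse_rollbacks is hypothetical, and the line format is assumed from the pasted output only.

```python
import re

# Matches lines of the shape pasted in this thread, e.g.
#   vsql:/path/install.sql:14: ROLLBACK 3472:  Function with same name ...
ROLLBACK_LINE = re.compile(
    r"^vsql:(?P<script>.+?):(?P<line>\d+): ROLLBACK (?P<code>\d+):\s*(?P<message>.*)$"
)


def parse_rollbacks(output):
    """Return (script line, ROLLBACK code, message) for each ROLLBACK line."""
    results = []
    for raw in output.splitlines():
        match = ROLLBACK_LINE.match(raw.strip())
        if match:
            results.append(
                (int(match["line"]), int(match["code"]), match["message"])
            )
    return results
```

For the output above, both entries carry code 3472 with an "already exists" message, so according to that output the VHCatSource and VHCatParser functions were already present when the script ran.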

  • Hi Vivian,

    Currently we don't support Hive 0.12. But the good news is that there is a workaround to make this work. Some of our customers have tried it, and it seems to work.


    Replace all the jar files in /opt/vertica/packages/hcat/lib, except for "verticahcatalogudl", with the jars from $HIVE_HOME/lib/* and $HCAT_HOME/lib/*.

    Let us know how it goes.

    Thanks,

    Satish
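
The jar replacement described above could be scripted roughly as follows. This is a sketch under stated assumptions, not an official procedure: the function name swap_hcat_jars is hypothetical, the backup directory is an extra precaution that the workaround itself does not mention, and the Vertica jar is assumed to be recognizable by a file name starting with "verticahcatalogudl".

```python
import shutil
from pathlib import Path


def swap_hcat_jars(vertica_lib, source_dirs, backup_dir,
                   keep_prefix="verticahcatalogudl"):
    """Replace the jars in vertica_lib with jars from source_dirs.

    Jars whose names start with keep_prefix are left untouched. Existing
    jars are moved to backup_dir (an added precaution, not part of the
    original workaround). Returns the names of the copied jars.
    """
    vertica_lib = Path(vertica_lib)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Move every existing jar aside, except the Vertica UDL jar.
    for jar in sorted(vertica_lib.glob("*.jar")):
        if not jar.name.startswith(keep_prefix):
            shutil.move(str(jar), str(backup_dir / jar.name))

    # Copy in the jars from the Hive and HCatalog lib directories.
    copied = []
    for source in source_dirs:
        for jar in sorted(Path(source).glob("*.jar")):
            if jar.name.startswith(keep_prefix):
                continue  # never overwrite the Vertica UDL jar
            shutil.copy2(str(jar), str(vertica_lib / jar.name))
            copied.append(jar.name)
    return copied
```

It would be called with "/opt/vertica/packages/hcat/lib" as vertica_lib and the lib directories under $HIVE_HOME and $HCAT_HOME as source_dirs. If a jar with the same name exists in both source directories, the later directory wins.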



  • It works, thank you.
  • Hi,

    I have a related question. How does the HCatalog connector work? When I run a SQL statement on an HCatalog table, I can see in the explain plan that it uses an external table. Is there any difference between the HCatalog and HDFS connectors, other than that I don't need to worry about the data format, etc.?

    Does the HCatalog connector retrieve all the data into a Vertica external table and then run the SQL statement on it, or does it translate the SQL into Hive commands and run them on the Hadoop side where possible?

    Thanks,
    Filip
  • Have any other Vertica customers managed to get the Vertica HCatalog connector working with a Cloudera cluster installation?

    In my experience, the trouble is almost entirely on Cloudera's side, as WebHCat (in CDH 4.7 at least) does not work out-of-the-box.

  • When I'm installing the Hadoop connector, I get errors, though I've verified that all the jars and configuration are present.

     

    dbadmin=> \i /opt/vertica/packages/hcat/ddl/install.sql
    CREATE LIBRARY
    vsql:/opt/vertica/packages/hcat/ddl/install.sql:16: ROLLBACK 3399: Failure in UDx RPC call InvokeSetExecContext(): Error in User Defined Object [VHCatSource], error code: 0
    Couldn't instantiate class com.vertica.hcatalogudl.HCatalogSplitsNoOpSourceFactory

    vsql:/opt/vertica/packages/hcat/ddl/install.sql:17: ROLLBACK 2059: Source with specified name and parameters does not exist: VHCatSource
    vsql:/opt/vertica/packages/hcat/ddl/install.sql:18: ROLLBACK 3399: Failure in UDx RPC call InvokeSetExecContext(): Error in User Defined Object [VHCatParser], error code: 0
    Couldn't instantiate class com.vertica.hcatalogudl.HCatalogSplitsParserFactory

    vsql:/opt/vertica/packages/hcat/ddl/install.sql:19: ROLLBACK 2059: Parser with specified name and parameters does not exist: VHCatParser
    dbadmin=>
    \q

     

    Does anyone have any idea?

  •  Hi.

     

    What are the other reasons for getting the error "Function VHCatSource() does not exist, or permission is denied for VHCatSource()"? And what are the ways to solve them?

     

    We are trying to connect our Hadoop and Vertica. We can query the Hive metadata and view the tables with "SELECT * FROM HCATALOG_TABLE_LIST". But when we try to query any of the tables, we get that error.

     

    We are using Cloudera 5.4.2-1 and Vertica 7.

     

    Thanks in advance.
