
Getting SQLFeatureNotSupportedException: [Vertica][JDBC](10220) when using Sqoop

Folks,

I am trying to import a single column from Vertica (v8.1.0-2) into HDFS or Hive using Sqoop, but am unable to. I get the following exception:

Caused by: java.sql.SQLFeatureNotSupportedException: [Vertica]JDBC Driver not capable.
at com.vertica.exceptions.ExceptionConverter.toSQLException(Unknown Source)
at com.vertica.jdbc.common.SForwardResultSet.getBlob(Unknown Source)
at org.apache.sqoop.lib.LargeObjectLoader.readBlobRef(LargeObjectLoader.java:238)

The column contains a JSON object:

Data_type   Type_name       Column_size   Buffer_length
-4          Long Varbinary  130000        130000

Questions:
1. Does vertica-jdbc-8.1.0-3.jar support getBlob()? It seems to, judging by the disassembled class:

javap SForwardResultSet.class | grep -i getblob
public java.sql.Blob getBlob(int) throws java.sql.SQLException;
public java.sql.Blob getBlob(java.lang.String) throws java.sql.SQLException;
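
Since a declared method can still refuse the call at runtime, a standalone check outside Sqoop along the following lines should show whether the driver actually implements getBlob() (this is just a sketch; the connection string, credentials, and table/column names are placeholders):

import java.sql.Blob;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLFeatureNotSupportedException;
import java.sql.Statement;

public class GetBlobCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:vertica://vertica-host:5433/mydb", "dbadmin", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT my_json_col FROM my_table LIMIT 1")) {
            if (rs.next()) {
                try {
                    // The method is declared by the driver class...
                    Blob blob = rs.getBlob(1);
                    System.out.println("getBlob() worked, length = " + blob.length());
                } catch (SQLFeatureNotSupportedException e) {
                    // ...but the driver may still report "not capable" at runtime.
                    System.out.println("not implemented: " + e.getMessage());
                }
            }
        }
    }
}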

2. What is the parameter passed to getBlob()? I looked at the actual Java code generated by Sqoop:

public void loadLargeObjects(LargeObjectLoader __loader)
    throws SQLException, IOException, InterruptedException {
  this.___raw = __loader.readBlobRef(1, this.__cur_result_set);
}

where __cur_result_set is a (non-string) ResultSet.

3. What could be a workaround to import the column?

Other things I have tried:

  • able to "select " on the Vertica table using Sqoop eval with no problems
  • able to import another Vertica table in HDFS using Sqoop
  • able to view the column with DBVisualizer with the Vertica JDBC drivers

Thanks in advance.

Cross-posted on the Cloudera forums, Stack Exchange, and the Vertica forums.

Comments

    twall (Vertica Employee)

    Hi dr_rizz

    Vertica's JDBC does not implement the BLOB/CLOB interfaces, hence the SQLFeatureNotSupportedException. The parameter passed to getBlob is the index or name of a binary column in the result set; it isn't really relevant to the problem at hand.

    These APIs are optional in JDBC. They are designed for very large values, making it possible to seek to and read sections of a value without sending the entire value back to the client.

    Vertica supports long varchar/varbinary up to 32MB, but it does not have the underlying protocol to support the arbitrary seeks, reads, and writes of a partial value that are required in order to implement the JDBC BLOB/CLOB API. If Vertica ever changes to support large (i.e. gigabyte-sized) BLOB/CLOB objects, then this API may be implemented, but until then, the best way to read a binary value is to use one of the two other APIs for binary data.

    If you have a long varbinary column, you can use ResultSet.getBinaryStream() or ResultSet.getBytes() to read the value. getBinaryStream() is probably the ideal choice for long varbinary because it does not require holding the whole value in memory at once. BLOB could do this too, but unlike with BLOB, the value is always streamed to the client in its entirety.
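
    For reference, here is a bare-bones JDBC sketch of that approach (untested; the connection string, credentials, and table/column names are placeholders, not your schema):

    import java.io.InputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ReadLongVarbinary {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:vertica://vertica-host:5433/mydb", "dbadmin", "password");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT my_json_col FROM my_table")) {
                byte[] buf = new byte[8192];
                while (rs.next()) {
                    // Stream the value instead of materializing it all at once.
                    try (InputStream in = rs.getBinaryStream(1)) {
                        if (in == null) {
                            continue; // SQL NULL
                        }
                        int total = 0;
                        for (int n; (n = in.read(buf)) != -1; ) {
                            total += n; // process each chunk here instead of just counting
                        }
                        System.out.println("read " + total + " bytes");
                    }
                    // Or, for values that comfortably fit in memory:
                    // byte[] all = rs.getBytes(1);
                }
            }
        }
    }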

    I am not very familiar with Sqoop, but hopefully there is a way to retrieve the data using one of these APIs instead. In other tools there is often a size threshold above which they switch to the BLOB API; if Sqoop has one, perhaps it is configurable.

    Another possibility may be to customize the behavior via a small sqoop plugin.
    I did some quick digging into the code around the stack you provided, and at a glance it looks like the line below is where sqoop decides to generate a class that uses the unsupported BLOB API:
    https://github.com/infinidb/sqoop/blob/master/src/java/org/apache/sqoop/manager/ConnManager.java#L139

    It would of course require a fair bit of effort and testing, but if that is the line to change, you should be able to wire together a simple plugin that customizes that behavior to follow the VARBINARY path instead.
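
    To make the idea concrete, here is a very rough, untested sketch of such a plugin. The exact base class, package names, and constructor signature vary between Sqoop versions, and the class name is just illustrative; the point is to map LONG VARBINARY to the same type Sqoop already uses for VARBINARY, so the generated record class reads it with getBytes() instead of getBlob():

    import java.sql.Types;

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.sqoop.manager.GenericJdbcManager;

    import com.cloudera.sqoop.SqoopOptions;

    // Illustrative only: route LONG VARBINARY through the VARBINARY mapping so the
    // generated class calls ResultSet.getBytes() rather than the unsupported Blob API.
    public class VerticaLongVarbinaryManager extends GenericJdbcManager {

        public VerticaLongVarbinaryManager(SqoopOptions opts) {
            super("com.vertica.jdbc.Driver", opts);
        }

        @Override
        public String toJavaType(int sqlType) {
            if (sqlType == Types.LONGVARBINARY) {
                return BytesWritable.class.getName(); // follow the VARBINARY path
            }
            return super.toJavaType(sqlType);
        }
    }

    You would also need a small ManagerFactory that returns this manager for Vertica connect strings (registered, if I remember right, through the sqoop.connection.factories setting), and the Hive type mapping may want the same treatment.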

    I can't promise this is the exact solution, but I think something like this is the best path forward. It may even make sense to submit a bug ticket asking sqoop to do this by default instead of using BLOBs. If sqoop always intends to read the whole value (which seems likely, since sqoop is used to move data wholesale between db's and hadoop), the BLOB API seems like a poor fit for LONG VARBINARY: not all databases support it, and those that do probably pay a cost to support the full BLOB API semantics when a simple, forward-only read of the value would suffice.

    Tom
