Invoke VSQL through java program

Satish_1 · August 2014

I am using the command below to extract data from a Vertica database:
vsql -h XX -U XX -W -c "select * from emp" -o "c:/test1.dat" -t -F "|" -At
How do I invoke this from Java? All the documentation I look at refers only to processing Copy and Stm. If we could invoke VSQL, it should be fairly easy to get the data out of the Vertica database. Alternatively, can someone point me to a .jar file for VSQL that I could use to invoke it? Any thoughts?

Daniel_Leybovic · August 2014

Hi!

Maybe this can help you: http://www.vertica-forums.com/viewtopic.php?f=80&t=1390&sid=96f3ff77e31932351864bb185364d6ed

or this: http://alvinalexander.com/java/java-exec-processbuilder-process-1
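For reference, a minimal sketch of what the ProcessBuilder approach could look like for the vsql command in the question. The class and method names are hypothetical. It assumes vsql is on the PATH, and it drops -W (which prompts for a password interactively) in favour of the VSQL_PASSWORD environment variable, which is an assumption to verify against your vsql version.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class VsqlRunner {
    /** Builds the argument list for the vsql call quoted in the question. */
    static List<String> buildCommand(String host, String user, String query, String outFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("vsql");                    // assumes vsql is on the PATH
        cmd.add("-h"); cmd.add(host);
        cmd.add("-U"); cmd.add(user);
        cmd.add("-c"); cmd.add(query);      // one argument, so no shell quoting is needed
        cmd.add("-o"); cmd.add(outFile);
        cmd.add("-F"); cmd.add("|");
        cmd.add("-At");                     // unaligned, tuples only
        return cmd;
    }

    /** Starts vsql, waits for it to finish and returns its exit code. */
    static int run(List<String> cmd, String password, File log)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // Assumption: vsql reads the password from VSQL_PASSWORD.
        pb.environment().put("VSQL_PASSWORD", password);
        pb.redirectErrorStream(true);       // merge stderr into stdout
        pb.redirectOutput(log);             // keep messages out of the JVM's pipes
        Process process = pb.start();
        return process.waitFor();
    }
}
```

With this approach the rows go straight from vsql to the file named by -o and never pass through the JVM; Java only starts the process and checks the exit code.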

Satish_1 · August 2014

Thanks Daniel. I had written something almost exactly like this, but it works well only with 1 million records or fewer; beyond that, performance is very bad, so I am trying to find a way to invoke vsql.

I notice around 50% better performance with vsql. I have also tried bulk extract and load to file, and splitting the file after every 2 GB, but nothing solves the problem.

Hence I am looking into invoking vsql, and into what code vsql uses to extract so much faster than Java. Is it Python?

What is special about VSQL that makes it extract so much faster than reading and writing the data through Java?

I would suggest including a class in the JDBC driver to invoke vsql, which would solve a lot of problems.

Daniel_Leybovic · August 2014

Hi!

>> it works well only with 1 million records or fewer; beyond that, performance is very bad
In my experience, everyone who uses Vertica with Java loses about 30% in performance. If it works with 1M rows, do it in parts:

    select ...bla...bla...bla... where row_id <= 1000000

    select ...bla...bla...bla... where row_id > 1000000 and row_id <= 2000000

and so on

Benefits:

no degradation in performance
you can run the parts in parallel, writing to different files, and then concatenate them into a single file.
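A rough sketch of this split-and-concatenate idea in Java. All names here are hypothetical. It assumes a numeric row_id column that starts at 1, a base SELECT with no WHERE clause of its own, and that row order across chunks does not matter beyond chunk order. The ChunkExporter callback stands in for whatever actually extracts one chunk (JDBC, a vsql process, and so on).

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedExport {
    /** Hypothetical callback: writes the rows returned by one query to one file. */
    interface ChunkExporter {
        void export(String sql, Path target) throws Exception;
    }

    /** Splits (0, maxId] into {lo, hi} pairs meaning lo < row_id <= hi, so chunks never overlap. */
    static List<long[]> splitRanges(long maxId, long chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<long[]> ranges = new ArrayList<>();
        for (long lo = 0; lo < maxId; lo += chunkSize) {
            ranges.add(new long[] {lo, Math.min(lo + chunkSize, maxId)});
        }
        return ranges;
    }

    /** Appends the range filter for one chunk to the base SELECT. */
    static String chunkQuery(String baseSelect, long[] range) {
        return baseSelect + " WHERE row_id > " + range[0] + " AND row_id <= " + range[1];
    }

    /** Exports every chunk in parallel to its own part file, then concatenates the parts in order. */
    static void exportAll(String baseSelect, List<long[]> ranges, ChunkExporter exporter,
                          Path dir, Path merged, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Path> parts = new ArrayList<>();
            List<Future<Void>> pending = new ArrayList<>();
            for (int i = 0; i < ranges.size(); i++) {
                Path part = dir.resolve("part-" + i + ".dat");
                String sql = chunkQuery(baseSelect, ranges.get(i));
                parts.add(part);
                Callable<Void> task = () -> { exporter.export(sql, part); return null; };
                pending.add(pool.submit(task));
            }
            for (Future<Void> f : pending) {
                f.get();                       // rethrows any failure from a chunk
            }
            try (OutputStream out = Files.newOutputStream(merged)) {
                for (Path part : parts) {
                    Files.copy(part, out);     // append each part in chunk order
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task should use its own database connection; sharing one JDBC connection across threads would serialize the work and lose the benefit.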

>> I notice around 50% better performance with vsql. I have also tried bulk extract and load to file
Of course. VSQL returns strings, while JDBC returns typed objects (a date comes back as a date, and an integer comes back as an integer rather than a string, so you can perform calculations on it). VSQL and Java are different things: Java is a programming language and VSQL is a database client. How can you compare them?

>> Hence I am looking into invoking vsql, and into what code vsql uses to extract so much faster than Java. Is it Python?
I ran some tests, and they showed that Python performs better than Java.

>> I would suggest including a class in the JDBC driver to invoke vsql, which would solve a lot of problems.
Forget about it; it would add dependencies. With Java you would then also require VSQL (no way, we need minimal dependencies). If you want to do it with VSQL, do it directly with VSQL; don't call it from Java.

BTW:

Take a look at External Procedures. You can install one and then invoke it from Vertica (via JDBC). So write an EP that extracts the data without Java, and call the EP from JDBC.
Review the Vertica Marketplace >> ETL and Data Ingest: there are parallel UDFs for exporting data.

Tuning Java Virtual Machines (JVMs)
http://docs.oracle.com/cd/E15523_01/web.1111/e13814/jvm_tuning.htm

PS
I'm pretty sure that you will get the same performance degradation with VSQL if you call it from Java, simply because it is Java. For example, memory is defined per JVM, and all resources are controlled by the JVM: heap size, stack size, etc.

If you are seeing degradation on extract, the JVM is probably not configured well, and that is the problem. You have to investigate it; otherwise you will always get performance degradation with Java.

eli_revach · August 2014

Hi,

As mentioned by Daniel, a C++ implementation (VSQL) will probably be faster; we see this in other databases such as Oracle as well. However, I would try to find the root cause of your extract degradation after 1M records. It may be related to Java GC, or to the time you spend building the records (padding the delimiter). Did you test performance without adding the delimiter character to your records? You can also try to tune the statement's setFetchSize attribute, but in general you should not see such degradation.

Check your code with a Java profiler before jumping to calling VSQL from your Java code (it's ugly).
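To make the setFetchSize and record-building points concrete, here is a minimal JDBC sketch, not the original poster's code. The class and method names are hypothetical, and the right fetch size is something to measure rather than assume. It writes each value straight to a buffered writer instead of concatenating a string per row, which keeps per-row allocation (and so GC pressure) low.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class DelimitedExport {
    /** Streams every row of rs to out, one line per row, and returns the row count. */
    static long writeRows(ResultSet rs, Writer out, char delimiter)
            throws SQLException, IOException {
        int columns = rs.getMetaData().getColumnCount();
        long rows = 0;
        while (rs.next()) {
            for (int i = 1; i <= columns; i++) {
                if (i > 1) {
                    out.write(delimiter);
                }
                String value = rs.getString(i);
                if (value != null) {          // SQL NULL is written as an empty field
                    out.write(value);
                }
            }
            out.write('\n');
            rows++;
        }
        return rows;
    }

    /** Runs the query with the given fetch size and writes pipe-delimited rows to file. */
    static long export(Connection conn, String sql, Path file, int fetchSize)
            throws SQLException, IOException {
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(fetchSize);     // hint to the driver: rows per round trip
            try (ResultSet rs = stmt.executeQuery(sql);
                 BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                return writeRows(rs, out, '|');
            }
        }
    }
}
```

Timing this with and without the delimiter writes, and with a few different fetch sizes, is one way to run the comparison suggested above before reaching for a profiler.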

Thanks
