Invoke VSQL through a Java program
I am using the following command to extract data from a Vertica database:
vsql -h XX -U XX -W -c "select * from emp" -o "c:/test1.dat" -t -F "|" -At
But how do I invoke this from Java? When I look at the documentation, everything refers only to COPY and to Statement processing. If we could invoke VSQL, it might be fairly easy to get the data out of the Vertica database. Alternatively, can someone point me to a VSQL .jar file that I could invoke? Any thoughts?
Comments
Maybe this can help you: http://www.vertica-forums.com/viewtopic.php?f=80&t=1390&sid=96f3ff77e31932351864bb185364d6ed
Or this one: http://alvinalexander.com/java/java-exec-processbuilder-process-1
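The second link describes java.lang.ProcessBuilder, the usual way to start an external program such as vsql from Java. Below is a minimal sketch, assuming vsql is on the PATH. The arguments mirror the command in the question, each passed as a separate list element, so no shell quoting is needed around the query or the "|" delimiter. Because -W prompts for a password interactively, the sketch drops it and instead sets the VSQL_PASSWORD environment variable for the child process; treat that variable, and the hypothetical VERTICA_PWD variable it is read from, as assumptions to check against your vsql version's documentation. The class and method names are illustrative.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative wrapper around the vsql command line client.
class VsqlRunner {

    // Builds the vsql argument list. Each argument is its own element,
    // so the query and the "|" delimiter need no shell quoting.
    static List<String> buildCommand(String host, String user,
                                     String query, String outFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("vsql");                 // assumes vsql is on the PATH
        cmd.add("-h"); cmd.add(host);
        cmd.add("-U"); cmd.add(user);
        cmd.add("-A");                   // unaligned output
        cmd.add("-t");                   // tuples only, no header/footer
        cmd.add("-F"); cmd.add("|");     // field separator
        cmd.add("-c"); cmd.add(query);
        cmd.add("-o"); cmd.add(outFile); // vsql writes the rows to this file
        return cmd;
    }

    // Starts vsql as a child process, waits for it and returns its exit code.
    static int run(List<String> cmd, String password)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        if (password != null) {
            // Assumption: vsql reads the password from VSQL_PASSWORD,
            // which avoids the interactive -W prompt.
            pb.environment().put("VSQL_PASSWORD", password);
        }
        pb.redirectErrorStream(true);                      // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT); // show messages on our console
        Process p = pb.start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // "XX" placeholders as in the question; replace with real values.
        List<String> cmd = buildCommand("XX", "XX", "select * from emp", "c:/test1.dat");
        // Hypothetical environment variable holding the password.
        int exit = run(cmd, System.getenv("VERTICA_PWD"));
        if (exit != 0) {
            System.err.println("vsql exited with code " + exit);
        }
    }
}
```

Inheriting the child's output also means its stdout/stderr pipe can never fill up and block vsql, a common pitfall when a child's output goes to a pipe that nobody reads.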
So I am trying to find an approach to invoke vsql.
I notice roughly 50% better performance with vsql. I have also tried a bulk extract and load to file, and splitting the file after every 2 GB, but nothing solves the problem. That is why I am looking at invoking vsql, and wondering what vsql is written in that lets it extract so much faster than Java. Is it Python?
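Splitting the output every 2 GB does not require a separate pass over a finished file. Below is a minimal sketch, assuming rows are already available as text lines (from JDBC or elsewhere): a hypothetical RollingFileWriter that starts a new numbered part file whenever the next line would push the current part past a byte limit. Sizes are counted in UTF-8 bytes, and a line is never split across two parts.

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes text lines into numbered part files
// (prefix.001, prefix.002, ...), starting a new part at a size limit.
class RollingFileWriter implements Closeable {
    private final Path dir;
    private final String prefix;
    private final long maxBytes;
    private OutputStream out;
    private long written;   // bytes written to the current part
    private int part;       // number of parts opened so far

    RollingFileWriter(Path dir, String prefix, long maxBytes) {
        this.dir = dir;
        this.prefix = prefix;
        this.maxBytes = maxBytes;
    }

    // Writes one line plus '\n', rolling over first if the line would push
    // the current part past maxBytes. Lines are never split, so only a single
    // line longer than maxBytes can make a part exceed the limit.
    void writeLine(String line) throws IOException {
        byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
        if (out == null || (written > 0 && written + bytes.length > maxBytes)) {
            rollOver();
        }
        out.write(bytes);
        written += bytes.length;
    }

    // Closes the current part (if any) and opens the next numbered one.
    private void rollOver() throws IOException {
        if (out != null) {
            out.close();
        }
        part++;
        Path file = dir.resolve(String.format("%s.%03d", prefix, part));
        out = new BufferedOutputStream(Files.newOutputStream(file));
        written = 0;
    }

    int parts() {
        return part;
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }
}
```

For 2 GB parts you would construct it with something like `new RollingFileWriter(Paths.get("c:/out"), "emp.dat", 2L * 1024 * 1024 * 1024)`; the `2L` matters, because `2 * 1024 * 1024 * 1024` overflows an int.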
What is special about VSQL that lets it extract so much faster than reading and writing the data through Java?
I would suggest including a class in the JDBC driver to invoke vsql; that would solve a lot of problems.
>> It works well with 1 million records or fewer; when it exceeds that, the performance is very bad
In my experience, everyone who works with Vertica + Java loses about 30% in performance. If it works well with 1M rows, then do it in parts: the first 1M rows, then the next 1M rows, and so on.
Benefits:
- no performance degradation
- you can extract the parts in parallel into different files, and afterwards concatenate them into a single file.
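The "do it in parts" approach can be sketched as follows, under several assumptions. Each part is a LIMIT/OFFSET page over a hypothetical key column (emp_id here), because without a stable ORDER BY the pages could overlap or skip rows. The actual per-part extraction is left to a caller-supplied ChunkExtractor (a JDBC loop, a vsql call, etc.), parts run on a fixed thread pool, and the part files are concatenated in order at the end. All class and method names are illustrative. On a large table, range predicates on the key (for example WHERE emp_id >= x AND emp_id < y) are often cheaper than large OFFSET values, so treat the query shape as a starting point.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedExport {

    // Extracts one query's rows into one file; supplied by the caller.
    interface ChunkExtractor {
        void extract(String query, Path target) throws Exception;
    }

    // Hypothetical page query; the ORDER BY on a unique key keeps pages disjoint.
    static String chunkQuery(String table, String orderKey, long chunkSize, int chunk) {
        return "SELECT * FROM " + table + " ORDER BY " + orderKey
                + " LIMIT " + chunkSize + " OFFSET " + (chunk * chunkSize);
    }

    // Runs all chunks on a thread pool; returns the part files in chunk order.
    static List<Path> extractInParallel(String table, String orderKey, long chunkSize,
            int chunks, Path dir, int threads, ChunkExtractor extractor) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (int i = 0; i < chunks; i++) {
                final int chunk = i;
                futures.add(pool.submit(() -> {
                    Path part = dir.resolve("part-" + chunk + ".dat");
                    extractor.extract(chunkQuery(table, orderKey, chunkSize, chunk), part);
                    return part;
                }));
            }
            List<Path> parts = new ArrayList<>();
            for (Future<Path> f : futures) {
                parts.add(f.get()); // rethrows any extraction failure
            }
            return parts;
        } finally {
            pool.shutdown();
        }
    }

    // Appends the part files, in order, into a single target file.
    static void concatenate(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, out);
            }
        }
    }
}
```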
>> I notice there is around 50% better performance with vsql, and I also tried a bulk extract and load to file
Of course: VSQL returns strings, while JDBC returns objects (a date comes back as a date, an integer as an integer rather than a string, so you can perform calculations on it). VSQL and Java are different things: Java is a programming language and VSQL is a database client. How can you compare them?
>> hence I am trying to see how to invoke vsql, and what code vsql uses to extract so much faster than Java; is it Python?
I did some tests, and they showed that Python performs better than Java.
>> I would suggest including a class in the JDBC driver to invoke vsql, which would solve lots of problems.
Forget about it: it increases dependencies, because your Java code would then require VSQL to be installed (no way; we need minimal dependencies). If you want to do it with VSQL, then use VSQL directly; don't call it from Java.
BTW:
Tuning Java Virtual Machines (JVMs)
http://docs.oracle.com/cd/E15523_01/web.1111/e13814/jvm_tuning.htm
PS
I'm pretty sure you will get the same performance degradation with VSQL if you call it from Java, just because it's Java. For example, memory is defined per JVM, and all resources are controlled by the JVM: heap size, stack size, etc.
If you are seeing degradation on extract, then the JVM is probably not configured well, and that is the problem: you have to investigate it, otherwise you will always get performance degradation with Java.
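One way to start that investigation is to check, from inside the exporting JVM, what heap it actually received and how much time it has spent in garbage collection. Below is a small sketch using the standard java.lang.management API; printing report() before and after an export and comparing the GC times shows whether collection is a significant share of the run. The JvmReport name is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper: summarizes heap limits and GC activity of this JVM.
class JvmReport {

    static String report() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        StringBuilder sb = new StringBuilder();
        // maxMemory() reflects -Xmx (or the JVM's default if none was given).
        sb.append("max heap (-Xmx): ").append(rt.maxMemory() / mb).append(" MB\n");
        sb.append("current heap:    ").append(rt.totalMemory() / mb).append(" MB\n");
        sb.append("free in heap:    ").append(rt.freeMemory() / mb).append(" MB\n");
        // One bean per collector; counts and times are cumulative since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("GC ").append(gc.getName()).append(": ")
              .append(gc.getCollectionCount()).append(" collections, ")
              .append(gc.getCollectionTime()).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```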
As mentioned by Daniel, the C++ implementation (VSQL) will probably be faster; we see this in other databases like Oracle too. However, I would try to find the root cause of your extract degradation after 1M records. It may be related to Java garbage collection, or to the time you spend building the records (padding them with the delimiter). Did you test your performance without adding the delimiter character to your records? You can also try to optimize your statement's setFetchSize attribute, but in general you should not see such degradation.
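For comparison with the vsql command in the question, here is a minimal plain-JDBC sketch of the same export, written with the points above in mind: it sets a fetch size on the statement, reuses a single StringBuilder for every row instead of concatenating strings, and writes through a buffered writer. It assumes the Vertica JDBC driver jar is on the classpath; the URL, database name, fetch size of 10,000 and the VERTICA_PWD variable are hypothetical, and whether and how the driver honors setFetchSize is something to check in its documentation. To test the delimiter question, you could time the loop with the appendRow call commented out. NULLs are written as empty fields, which is an assumption rather than a copy of vsql's behavior.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative JDBC counterpart of: vsql -A -t -F "|" -c "..." -o file
class JdbcExport {

    // Appends one row as delimiter-separated text; null values become empty fields.
    static void appendRow(StringBuilder sb, String[] values, char delimiter) {
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            if (values[i] != null) {
                sb.append(values[i]);
            }
        }
        sb.append('\n');
    }

    // Streams a query result into out; returns the number of rows written.
    static long export(Connection conn, String sql, Writer out, int fetchSize)
            throws SQLException, IOException {
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(fetchSize); // a hint to the driver: rows per round trip
            try (ResultSet rs = st.executeQuery(sql)) {
                int cols = rs.getMetaData().getColumnCount();
                String[] row = new String[cols];
                StringBuilder sb = new StringBuilder(256); // reused for every row
                long n = 0;
                while (rs.next()) {
                    for (int c = 0; c < cols; c++) {
                        row[c] = rs.getString(c + 1);
                    }
                    sb.setLength(0);
                    appendRow(sb, row, '|');
                    out.append(sb);
                    n++;
                }
                return n;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URL: adjust host, port and database name.
        String url = "jdbc:vertica://XX:5433/mydb";
        try (Connection conn = DriverManager.getConnection(url, "XX", System.getenv("VERTICA_PWD"));
             Writer out = Files.newBufferedWriter(Paths.get("c:/test1.dat"), StandardCharsets.UTF_8)) {
            long rows = export(conn, "select * from emp", out, 10_000);
            System.out.println(rows + " rows written");
        }
    }
}
```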
Check your code with some kind of Java profiler before jumping to calling VSQL from your Java code (it's ugly).
Thanks