VERTICA - SPARK CONNECTOR ISSUE
Hey,
My name is Ben. I'm working with:
Vertica version: 7.2.1
spark-connector version: 0.2.1
I'm trying to execute a SQL query on a DataFrame with a filter like select * from ... where ... in ('A','B','V').
The problem is that the query gets parsed the wrong way: the ('A','B','V') value list is translated as a Java array object.
From the Vertica log:
2016-03-17 10:34:24.256 Init Session:0x7eff38012d50 [Session] <INFO> [PQuery] TX:0(...-24603:0xcc8) select …,...,...,... from ... where( (0x00000000ffffffff & hash()) >= 3579139414 and (0x00000000ffffffff & hash()) <= 4294967297 ) AND (... in [Ljava.lang.Object; @5aca2402)
2016-03-17 10:34:24.256 Init Session:0x7eff38012940 <ERROR> @v_..._node0001: 42601/4856: Syntax error at or near "[" at character 205
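That "[Ljava.lang.Object;" in the log looks to me like the connector is interpolating the JVM array straight into the SQL string; a minimal illustration of the symptom (the column name col1 is just for illustration):

```scala
// Interpolating a JVM array into a SQL string yields its toString
// ("[Ljava.lang.Object;@<hash>"), not a SQL value list.
val vals: Array[AnyRef] = Array("A", "B", "V")
val broken = s"col1 in $vals"
val fixed  = s"col1 in (${vals.map(v => s"'$v'").mkString(", ")})"
// broken starts with "col1 in [Ljava.lang.Object;@"
// fixed  == "col1 in ('A', 'B', 'V')"
```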
Additionally, I'm trying to execute a query like select * from ... where col = "abc", and the filter value "abc" gets parsed as a column name instead of a string literal.
How can I filter on a string in a Spark SQL / Vertica DataFrame?
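One guess on my side: in standard SQL, double quotes mark identifiers (column names) and single quotes mark string literals, so generating the filter with single quotes may behave differently:

```scala
// Double quotes in SQL denote identifiers, so col = "abc" compares two
// columns; single quotes denote string literals.
val value = "abc"
val predicate = s"col = '$value'"
// predicate == "col = 'abc'"
```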
Each of the problems I've presented here seems like a bug in the Vertica API.
Please help!
Thanks,
Ben
Comments
Hi Ben,
Could you describe in more detail (maybe with a code example) how you're performing this string-filter operation from Spark?
Thanks,
Ed
ERROR, as explained above, from the Vertica log:
column test doesn't exist
OR
Hi Ben,
Is that "in" clause the canonical way of doing that in Spark? I was able to phrase your example as a multi-condition filter as follows:
df.where("a = 'X' OR a = 'Y'")
which should produce the desired result. However, you may encounter a bug in the connector related to strings. That issue will be fixed in the next release, which should be available next week. Thanks for your patience!
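If you need to build that filter from a list of values, something like this (a sketch only; it assumes the values contain no quotes that need escaping, and uses the column name a from the example) should produce the same predicate string:

```scala
// Build the OR-chained predicate from a value list, then pass it to where().
val values = Seq("X", "Y")
val predicate = values.map(v => s"a = '$v'").mkString(" OR ")
// predicate == "a = 'X' OR a = 'Y'"
// then: df.where(predicate)
```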
Ed
Hey Edward,
Yes, that "in" clause is the canonical way of doing it in Spark, and it's important for me too.
I know I can use an "or" clause, but that's not practical for me:
I have to filter on something like 100,000-500,000 values in each query.
There is also the bug I mentioned before, where a query with an "in" clause gets transformed into a Java list object. When will that bug be fixed too?
Another question: is there any limitation on query length? The queries are generated from Scala code and can get quite large.
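In the meantime I'm considering splitting the value list into fixed-size chunks so each generated query stays bounded, roughly like this (the chunk size is arbitrary, and col1 is a placeholder column name):

```scala
// Split a large value list into chunks and generate one IN-list per chunk;
// each chunk would become its own query (or its own filter), keeping every
// generated statement bounded in length.
val values: Seq[String] = (1 to 2500).map(i => s"v$i")
val chunkSize = 1000
val inLists: Seq[String] = values
  .grouped(chunkSize)
  .map(chunk => chunk.map(v => s"'$v'").mkString("col1 in (", ", ", ")"))
  .toSeq
// inLists.size == 3 (chunks of 1000, 1000, 500)
```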
Thanks
Ben