How to read from projections in the Spark connector?

ujalityagi Registered User

I am able to execute queries on projections from the vsql command line, but I am not sure how to access this data at the application level in a DataFrame.

val spark: SparkSession = SparkSession
  .builder()
  .appName("vertica-spark-connector-testing")
  .master("local")
  .getOrCreate()

val options: Map[String, String] = Map(
  "table" -> "p_f_test",
  "db" -> "Test",
  "user" -> "foo",
  "password" -> "bar",
  "numPartitions" -> "10",
  "host" -> "localhost",
  "hdfs_url" -> "hdfs://localhost:9000/user/dir/",
  "web_hdfs_url" -> "webhdfs://localhost:9870/user/dir/",
  "dbschema" -> "public")

spark.read.format("com.vertica.spark.datasource.DefaultSource").options(options).load()
spark.sql("select * from p_f_test")

Output: Specified relation name "public"."p_f_test" does not exist

But in the vsql command line:
select * from f_test; <------- actual table
 id | message | still_here
----+---------+------------
  3 | hello   | t
  4 | goodbye | f

create projection p_f_test (message,still_here) as select message, still_here from f_test segmented by hash(id) all nodes;
select * from p_f_test; <----------- projection
 message | still_here
---------+------------
 goodbye | f
 hello   | t

Is there a way to load a projection's data at the application level?

Thanks in advance :smile:

Kind Regards,
Ujali Tyagi

Comments

  • aluanrbeach Registered User
    edited May 9

    @ujalityagi, you are almost there.
    Add the following lines and it will work.

    val df = spark.read.format("com.vertica.spark.datasource.DefaultSource").options(options).load()
    df.createOrReplaceTempView("p_f_test")
    val df2 = spark.sql("select * from p_f_test")
    df2.show()
    

    Also, where did you find the list of options that the Vertica Spark Connector provides?
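
    The comment above relies on one Spark behavior: spark.sql only resolves names that Spark itself knows about, so a DataFrame returned by load() has to be registered as a temporary view before it can be queried by name. The sketch below shows only that registration step. It assumes a small in-memory DataFrame as a stand-in for the connector's load() result, and the object name TempViewSketch and the helper register are hypothetical. It does not connect to Vertica, so it says nothing about whether the connector itself can resolve a projection name.

```scala
import org.apache.spark.sql.SparkSession

object TempViewSketch {

  // Registers a stand-in DataFrame under viewName and reports whether
  // Spark could resolve that name before and after the registration.
  def register(spark: SparkSession, viewName: String): (Boolean, Boolean) = {
    import spark.implicits._

    // Assumption: stand-in for the DataFrame that the connector's load() returns.
    val df = Seq(("hello", true), ("goodbye", false)).toDF("message", "still_here")

    val before = spark.catalog.tableExists(viewName)
    df.createOrReplaceTempView(viewName)
    val after = spark.catalog.tableExists(viewName)
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("temp-view-sketch")
      .master("local[1]")
      .getOrCreate()

    val (before, after) = register(spark, "p_f_test")
    println(s"resolvable before: $before, after: $after")

    // The view name is now usable from Spark SQL.
    spark.sql("select message from p_f_test where still_here").show()

    spark.stop()
  }
}
```

    If the name is not registered first, spark.sql fails on the Spark side with its own "table or view not found" analysis error, which is a different message from the Vertica error quoted in the question.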
