Partial data load from Vertica to Spark
One of my customers wants to know if they can load partial data into Spark from a Vertica table using a select.
Is there any way we can load partial data from the table using a SQL query? We don't want to load the whole table into our Spark app.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> ds = spark.read().format("com.vertica.spark.datasource.DefaultSource")
    .option("user", "...")
    .option("password", "...")
    .option("driver", "com.vertica.jdbc.Driver")
    .option("dbschema", "PDW_ERIEUTRAN_VIEWS")
    .option("table", "RBS_EUTRANCELLFDD1") // DON'T WANT to load the whole table!
    .option("db", "PDW")
    //.option("url", "jdbc:vertica://***:5433/SRVVERTICA")
    .option("host", "verticapride-brhmal.it.att.com")
    .load();
They have restricted access to the views, and the views are very large as well.
Any pointers?
Thanks
Comments
The online doc has a section called "Column Selection and Filter Push Down". I think this is what you are looking for.
See:
https://my.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/SparkConnector/LoadingVerticaDataToSparkUsingVerticaDataSource.htm
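For example, here is a minimal sketch of what that looks like on top of the Dataset loaded in the question. The select() and filter() calls are the standard Spark DataFrame API; per the doc section above, the connector pushes the column list and the predicate down to Vertica so only that subset is read. The column names and the predicate below are hypothetical, not taken from the actual RBS_EUTRANCELLFDD1 view:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Only the selected columns and the rows matching the filter are
// requested from Vertica; the full view is never pulled into Spark.
Dataset<Row> subset = ds
    .select("CELL_ID", "DATETIME")          // hypothetical column names
    .filter("DATETIME >= '2018-01-01'");    // hypothetical predicate

subset.show();

If you need something more complex than column pruning and simple predicates (joins, aggregations, etc.), you would still start from this pushed-down subset and do the rest in Spark.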