Running multiple queries in parallel using the same JDBC connection

neo_i_am_the_one · April 2022

Hey Vertica Gurus,

Is it possible to run multiple queries in parallel using the same JDBC connection in Vertica? Or will they block, and the result will end up being sequential?

I am looking for a Vertica-specific answer. A few non-Vertica discussion forums suggest that this is DB/Driver specific.

I am well aware of connection-pooling and am already using it. My use case is to spawn multiple threads from a connection pool for a category of queries. Each category has a bunch of pre-queries, base-queries, and post-queries, and I would like to use the same connection for all queries in the context of the category.

If this is not possible, what strategies can you suggest to workaround this to achieve this?

Thanks!

SruthiA · April 2022

@neo_i_am_the_one : I think you can use vget class. it will help in spawning multiple sessions. Please take a look at the below links.

https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/ConnectingToVertica/ClientJDBC/KVApi/VerticaRoutableConnection.htm

https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/ConnectingToVertica/ClientJDBC/KVApi/VgetObject.htm

neo_i_am_the_one · April 2022

Hello Sruthi,

Thanks for your reply. I learned something new today, but unfortunately, this does not solve my problem. You describe just another connection pool "inside" the single connection that essentially acts as a wrapper.

Furthermore, creating this connection object requires specifying a single node as the host where you know the data resides. In a multi-node environment, where you only have the ELB DNS, this would not work. The documentation says so itself.

My conclusion is that this ask is not achievable at this time.

SergeB · April 2022

Have you tried enabling MARS? https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/ConnectingToVertica/MARSAndRBS/MultipleActiveResultSets.htm

neo_i_am_the_one · April 2022

Thank you, Serge. This is very interesting and quite intriguing. I am very willing to try this. Where can I find an end-to-end JDBC example of this feature? I have been searching like crazy!

neo_i_am_the_one · April 2022

Thank you, Serge. This is very interesting and quite intriguing. I am very willing to try this. Where can I find an end-to-end JDBC example of this feature? I have been searching like crazy!

baron42bba · April 2022

ELB DNS?
You should consider using a round robin dns entry with all cluster members instead. First of the ELB might timeout before a long running query is finished. Furthermore you miss on the feature where a busy cluster node tells you to connect to another one if it’s busy.

Bryan_H · April 2022

@neo_i_am_the_one Here is a very simple example showing how to open multiple Statements and ResultSets from a single connection:

package com.vertica.java;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class VerticaJdbcMarsExample {

public static void main(String[] args) throws Exception {
    String host = "192.168.1.206", port = "5433", dbName = "d2";
    Properties jdbcOptions = new Properties();
    jdbcOptions.put("username", "dbadmin");
    jdbcOptions.put("password", "Vertica1!");

    Connection c = DriverManager.getConnection(
            "jdbc:vertica://" + host + ":" + port + "/" + dbName, jdbcOptions);
    Statement s1, s2;
    ResultSet rs1, rs2;
    // no MARS: throws Exception
    System.out.println("Run two queries (Statements/ResultSets) WITHOUT MARS");
    try {
        s1 = c.createStatement();
        rs1 = s1.executeQuery("select * from v_monitor.node_states;");
        s2 = c.createStatement();
        rs2 = s2.executeQuery("select * from v_monitor.storage_containers;");
        System.out.println("It worked without MARS!");
    } catch (Exception e) {
        e.printStackTrace();
    }
    jdbcOptions.put("MultipleActiveResultSets", "true");
    c = DriverManager.getConnection(
            "jdbc:vertica://" + host + ":" + port + "/" + dbName, jdbcOptions);
    // MARS: this should work
    System.out.println("Run two queries (Statements/ResultSets) WITH MARS");
    try {
        s1 = c.createStatement();
        rs1 = s1.executeQuery("select * from v_monitor.node_states;");
        s2 = c.createStatement();
        rs2 = s2.executeQuery("select * from v_monitor.storage_containers;");
        System.out.println("It worked with MARS!");
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("MARS demo done");
}

}

neo_i_am_the_one · April 2022

Thanks for your response. I am aware of "MultipleActiveResultSets" property. The code you shared does not help much because it's identical before and after setting this property and is therefore sequential.

If you refer to https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/SQLReferenceManual/SystemTables/MONITOR/SESSION_MARS_STORE.htm, you will see that there is a notion of a "RESULTSET_ID" for a given "SESSION_ID". So how do I get that from JDBC? If I understand this feature correctly, you issue a bunch of queries, get ResultSet IDs and use those IDs to get the actual results later according to this diagram: https://www.vertica.com/docs/11.1.x/HTML/Content/Authoring/ConnectingToVertica/MARSAndRBS/MultipleActiveResultSets.htm

Bryan_H · April 2022

The two calls within the try-catch block run differently. The first block throws an error when executing s2 because there's still an open ResultSet. The second block runs successfully because the two statements are assigned to ResultSet objects associated with separate MARS caches on server side. It's possible to retrieve the two ResultSets independently in the second block, though I don't actually implement that.
What is the use case for having multiple open statements/resultsets on a single connection? JDBC standard doesn't provide a method to retrieve a previously run result set, and Vertica's JDBC implementation doesn't provide additional functionality to get a MARS cached result set. As far as I can see, it would be possible to use a local variable - a Map or an Array - to manage the ResultSet objects and correlate to SQL queries in the client application. It would help to understand why you are trying to cache result sets on the server to suggest a recommended practice or at least a more complete example here.

neo_i_am_the_one · April 2022

Thank you, Bryan, for adding more information. My use case is as follows:

I have a Java enum of 5 categories of queries, and each category has an array of "attributes." I have much better names in actual code than C and A.

public enum Category {
    C1(new String[]{"A1", "A2", "A3"}),
    C2(new String[]{"A1", "A2", "A3", "A4"}),
    C3(new String[]{"A1"}),
    C4(new String[]{"A1", "A2", "A3"}),
    C5(new String[]{"A1", "A2", "A3", "A4", "A5", "A6"}),

    /**
     * Attributes.
     */
    private final String[] attributes;

    /**
     * Constructor.
     *
     * @param attributes category attributes.
     */
    Category(String[] attributes) {
        this.attributes = attributes;
    }

    /**
     * Returns the category attributes.
     *
     * @return category attributes.
     */
    public String[] getAttributes() {
        return attributes;
    }

    /**
     * Returns the name as a string in lower case.
     *
     * @return lower case value of the name.
     */
    public String toLowerCase() {
        return name().toLowerCase();
    }
}

Each category has a set of SHARED pre-and-post queries amongst the attributes. For example, for C1:

-- shared pre-queries
c1.pre.query.count=3
c1.pre.query.1=CREATE LOCAL TEMP TABLE T1 ... (complex query)
c1.pre.query.2=CREATE LOCAL TEMP TABLE T2 ... (complex query)
c1.pre.query.3=CREATE LOCAL TEMP TABLE T2 ... (complex query)

-- shared post-queries
c1.post.query.count=3
c1.post.query.1=DROP TABLE IF EXISTS T1 CASCADE -- explicit drops required due to connection pool sharing
c1.post.query.2=DROP TABLE IF EXISTS T2 CASCADE
c1.post.query.3=DROP TABLE IF EXISTS T3 CASCADE

Now "attributes" of a category have "base" queries:

c1.a1.base.query=SELECT ... FROM T1,  T2 ... (complex query)
c1.a2.base.query=SELECT ... FROM T2 ... (complex query)
c1.a3.base.query=SELECT ... FROM T3, T2, T1 ... (complex query)

Order of execution:

We run the set of SHARED pre-queries for each category, execute each attribute base-queries, and then run the set of SHARED post-queries.

What that context setup, my code uses Java's "CompletableFuture" API, which has a lambda as the "Supplier" that enumerates through all values of the enum "Category" (in our case C1 ... C5) and, for each one, gets a connection from a connection pool (configured to Vertica DB). These connections are, of course, different from each other.

GOAL: The life cycle of each category (pre-queries, attribute-base-queries, post-queries) should be complete using a SINGLE connection. I want to achieve parallelism for attribute base queries since they are independent of each other to make everything super fast.

NON-PROBLEM: Pre-and-post queries don't return result sets, so that's good.

PROBLEM: Attribute-base-queries all return result-sets, and there lies the PROBLEM. Without "MARS" capability, I have to use NEW connections for each attribute base query. The number of required connections is (#categories) x (#attributes) and I am not able to run the attribute base queries in parallel.

Hopefully, this helps explain the situation. I am looking forward to solution ideas!

neo_i_am_the_one · April 2022

I updated my comment above a couple of times. So if you are reading this comment, please re-read the above comment one more time, please. Thank you!

neo_i_am_the_one · April 2022

Here is the "top-level" code:

ExecutorService executor = Executors.newFixedThreadPool(Category.values().length);

Arrays.stream(Category.values())
    .map(category -> CompletableFuture.supplyAsync(() -> {
        // each set of queries will be run using a separate database connection
        try (Connection connection = DatabaseConnectionManager.getInstance().getConnectionPool().getConnection()) {
            try {
                // execute pre-queries for each category
                executeQueryStatements(connection, category.getPreQueries())

                // execute attribute level base queries for each category
                executeAttributeBaseQueries(connection, category.getAttributes());
            } finally {
                // execute post-queries for each category
                executeQueryStatements(connection, category.getPostQueries())
            }
        } catch (SQLException e) {
            throw new CompletionException(e);
        }
        return null;
    }, executor))
    .collect(Collectors.toList())
    .stream()
    .map(CompletableFuture::join)
    .collect(Collectors.toList());

I want to leverage MARS-like functionality inside the executeAttributeBaseQueries method using the SAME connection to achieve parallelism. If parallelism is not achievable then my use-case is not solved. Loosely speaking, say if 10 attribute base queries that are independent of each other take 10 seconds when run sequentially, I want them to run in parallel to get the answer in "1 second" so to speak. You get the idea.

PS: In my comments, I used upper-case for certain words for emphasis purposes only :-)

Bryan_H · April 2022

As long as your connection manager sets the option "MultipleActiveResultSets" to "true" then it should be possible to run the "pre", "base", and "post" queries on the same connection. It will be the responsibility of the executeAttributeBaseQueries method to collect and process the ResultSet object from each execute/executeQuery run for each attribute. If it's necessary to collect all ResultSets or share data between sets, then create a data structure like a Map<String,ResultSet> where you would add map.put("a1",{ResultSet of query a1}) and process query results by query ID.

neo_i_am_the_one · April 2022

Thanks, Bryan. Yes, there is a need to collect the individual result sets because the eventual goal is to send a JSON response to the client. The JSON structure has the categories (C1 ... C5) as top-level nodes, and each of them contains attribute-named nodes as array objects of rows returned from the individual "base" queries.

Jamesrobert · April 2022

@Bryan_H said:
@neo_i_am_the_one Here is a very simple example showing how to open multiple Statements and ResultSets from a single connection:

package com.vertica.java;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class VerticaJdbcMarsExample {

public static void main(String[] args) throws Exception {
    String host = "192.168.1.206", port = "5433", dbName = "d2";
    Properties jdbcOptions = new Properties();
    jdbcOptions.put("username", "dbadmin");
    jdbcOptions.put("password", "Vertica1!");

    Connection c = DriverManager.getConnection(
            "jdbc:vertica://" + host + ":" + port + "/" + dbName, jdbcOptions);
    Statement s1, s2;
    ResultSet rs1, rs2;
    // no MARS: throws Exception
    System.out.println("Run two queries (Statements/ResultSets) WITHOUT MARS");
    try {
        s1 = c.createStatement();
        rs1 = s1.executeQuery("select * from v_monitor.node_states;");
        s2 = c.createStatement();
        rs2 = s2.executeQuery("select * from v_monitor.storage_containers;");
        System.out.println("It worked without MARS!");
    } catch (Exception e) {
        e.printStackTrace();
    }
    jdbcOptions.put("MultipleActiveResultSets", "true");
    c = DriverManager.getConnection(
            "jdbc:vertica://" + host + ":" + port + "/" + dbName, jdbcOptions);
    // MARS: this should work
    System.out.println("Run two queries (Statements/ResultSets) WITH MARS");
    try {
        s1 = c.createStatement();
        rs1 = s1.executeQuery("select * from v_monitor.node_states;");
        s2 = c.createStatement();
        rs2 = s2.executeQuery("select * from v_monitor.storage_containers;");
        System.out.println("It worked with MARS!");
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("MARS demo done");
}

}

@Bryan_H said:
@neo_i_am_the_one Here is a very simple example showing how to open multiple Statements and ResultSets from a single connection:

package com.vertica.java;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class VerticaJdbcMarsExample {

public static void main(String[] args) throws Exception {
    String host = "192.168.1.206", port = "5433", dbName = "d2";
    Properties jdbcOptions = new Properties();
    jdbcOptions.put("username", "dbadmin");
    jdbcOptions.put("password", "Vertica1!");

    Connection c = DriverManager.getConnection(
            "jdbc:vertica://" + host + ":" + port + "/" + dbName, jdbcOptions);
    Statement s1, s2;
    ResultSet rs1, rs2;
    // no MARS: throws Exception
    System.out.println("Run two queries (Statements/ResultSets) WITHOUT MARS");
    try {
        s1 = c.createStatement();
        rs1 = s1.executeQuery("select * from v_monitor.node_states;");
        s2 = c.createStatement();
        rs2 = s2.executeQuery("select * from v_monitor.storage_containers;");
        System.out.println("It worked without MARS!");
    } catch (Exception e) {
        e.printStackTrace();
    }
    jdbcOptions.put("MultipleActiveResultSets", "true");
    c = DriverManager.getConnection(
            "jdbc:vertica://" + host + ":" + port + "/" + dbName, jdbcOptions);
    // MARS: this should work
    System.out.println("Run two queries (Statements/ResultSets) WITH MARS");
    try {
        s1 = c.createStatement();
        rs1 = s1.executeQuery("select * from v_monitor.node_states;");
        s2 = c.createStatement();
        rs2 = s2.executeQuery("select * from v_monitor.storage_containers;");
        System.out.println("It worked with MARS!");
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("MARS demo done");
}

}

@Bryan_H said:
@neo_i_am_the_one Here is a very simple example showing how to open multiple Statements and ResultSets from a single connection:

package com.vertica.java;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class VerticaJdbcMarsExample {

public static void main(String[] args) throws Exception {
    String host = "192.168.1.206", port = "5433", dbName = "d2";
    Properties jdbcOptions = new Properties();
    jdbcOptions.put("username", "dbadmin");
    jdbcOptions.put("password", "Vertica1!");

    Connection c = DriverManager.getConnection(
            "jdbc:vertica://" + host + ":" + port + "/" + dbName, jdbcOptions);
    Statement s1, s2;
    ResultSet rs1, rs2;
    // no MARS: throws Exception
    System.out.println("Run two queries (Statements/ResultSets) WITHOUT MARS");
    try {
        s1 = c.createStatement();
        rs1 = s1.executeQuery("select * from v_monitor.node_states;");
        s2 = c.createStatement();
        rs2 = s2.executeQuery("select * from v_monitor.storage_containers;");
        System.out.println("It worked without MARS!");
    } catch (Exception e) {
        e.printStackTrace();
    }
    jdbcOptions.put("MultipleActiveResultSets", "true");
    c = DriverManager.getConnection(
            "jdbc:vertica://" + host + ":" + port + "/" + dbName, jdbcOptions);
    // MARS: this should work
    System.out.println("Run two queries (Statements/ResultSets) WITH MARS");
    try {
        s1 = c.createStatement();
        rs1 = s1.executeQuery("select * from v_monitor.node_states;");
        s2 = c.createStatement();
        rs2 = s2.executeQuery("select * from v_monitor.storage_containers;");
        System.out.println("It worked with MARS!");
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("MARS demo done");
}

}

This is right.

Running multiple queries in parallel using the same JDBC connection

Answers

Leave a Comment