The Vertica Forum is getting a makeover! The site will be undergoing maintenance from Tuesday 8/13 to Friday 8/16. We appreciate your patience and cooperation during this time.

Vertica DB performance question

We have a Vertica v8.1.1 DB cluster made of three nodes. This DB comes as part of an integrated solution by a vendor. Now we want to use the DB to store some network traffic data. Here is my table:

packetlogic=> \d pason.dscp2
                                            List of Fields by Tables
 Schema | Table |         Column         |     Type     | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+------------------------+--------------+------+---------+----------+-------------+-------------
 pason  | dscp2 | time_stamp             | timestamptz  |    8 |         | t        | f           |
 pason  | dscp2 | subscriber             | varchar(256) |  256 |         | f        | f           |
 pason  | dscp2 | ip_address             | varchar(256) |  256 |         | t        | f           |
 pason  | dscp2 | remote_address         | varchar(256) |  256 |         | t        | f           |
 pason  | dscp2 | remote_vhost           | varchar(256) |  256 |         | f        | f           |
 pason  | dscp2 | signature_service_name | varchar(256) |  256 |         | f        | f           |
 pason  | dscp2 | dscp_out               | int          |    8 |         | t        | f           |
 pason  | dscp2 | bytes_in               | int          |    8 |         | t        | f           |
 pason  | dscp2 | bytes_out              | int          |    8 |         | t        | f           |
(9 rows)

I am writing data through the python vertica library. The issue is that it takes over 40 seconds to write about 1000 rows. Here is the relevent part of my code:

user = 'dbadmin'
pwd = 'xxx'
database = 'packetlogic'
host = 'pic01-00'
db = vertica_db_client.connect(database=database, user=user, password=pwd, host=host)
cur = db.cursor()

cur.executemany("""INSERT INTO pason.dscp2 (time_stamp, subscriber, ip_address, remote_address, signature_service_name, dscp_out, bytes_in, bytes_out) VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", data1)
cur.execute("COMMIT")

This script runs over the network but it is all local LAN with 10G throughput and no congestion. The RTT to the DB server is under 1ms.

Is the performance normal? Is there anything I should do to improve the performance? I am not from the DB background so hopefully, there is something simple for me to change to increase the throughput. Thanks.

Comments

  • Please don't insert into table one by one,use copy or insert values by batching.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.