Vertica DB performance question
We have a Vertica v8.1.1 DB cluster made of three nodes. This DB comes as part of an integrated solution by a vendor. Now we want to use the DB to store some network traffic data. Here is my table:
packetlogic=> \d pason.dscp2
List of Fields by Tables
 Schema | Table |         Column         |     Type     | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+------------------------+--------------+------+---------+----------+-------------+-------------
 pason  | dscp2 | time_stamp             | timestamptz  |    8 |         | t        | f           |
 pason  | dscp2 | subscriber             | varchar(256) |  256 |         | f        | f           |
 pason  | dscp2 | ip_address             | varchar(256) |  256 |         | t        | f           |
 pason  | dscp2 | remote_address         | varchar(256) |  256 |         | t        | f           |
 pason  | dscp2 | remote_vhost           | varchar(256) |  256 |         | f        | f           |
 pason  | dscp2 | signature_service_name | varchar(256) |  256 |         | f        | f           |
 pason  | dscp2 | dscp_out               | int          |    8 |         | t        | f           |
 pason  | dscp2 | bytes_in               | int          |    8 |         | t        | f           |
 pason  | dscp2 | bytes_out              | int          |    8 |         | t        | f           |
(9 rows)
I am writing data through the Python Vertica library (vertica_db_client). The issue is that it takes over 40 seconds to write about 1,000 rows. Here is the relevant part of my code:
user = 'dbadmin'
pwd = 'xxx'
database = 'packetlogic'
host = 'pic01-00'

db = vertica_db_client.connect(database=database, user=user, password=pwd, host=host)
cur = db.cursor()
cur.executemany("""INSERT INTO pason.dscp2
    (time_stamp, subscriber, ip_address, remote_address,
     signature_service_name, dscp_out, bytes_in, bytes_out)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)""", data1)
cur.execute("COMMIT")
The script runs over the network, but it is all on a local 10G LAN with no congestion, and the RTT to the DB server is under 1 ms.
Is this performance normal? Is there anything I can do to improve it? I do not come from a database background, so hopefully there is something simple I can change to increase the throughput. Thanks.
Comments
Please don't insert rows into the table one by one; use COPY, or batch the inserts.
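A minimal sketch of the COPY approach. This assumes the open-source vertica_python driver, whose cursor.copy() streams data into a Vertica COPY statement; the vendor-supplied vertica_db_client may expose an equivalent. Connection parameters and column names are taken from the question; the rows_to_csv helper, the load_batch function name, and the CSV formatting options are illustrative, and NULL handling and the timestamptz format may need adjusting for real data.

import csv
import io


def rows_to_csv(rows):
    """Serialize an iterable of row tuples into CSV text for COPY ... FROM STDIN."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def load_batch(rows):
    """Load a batch of rows into pason.dscp2 with a single COPY statement."""
    import vertica_python  # assumed driver; not part of the original post

    conn_info = {"host": "pic01-00", "user": "dbadmin",
                 "password": "xxx", "database": "packetlogic"}
    with vertica_python.connect(**conn_info) as db:
        cur = db.cursor()
        # One COPY ships the whole batch in a single round trip,
        # instead of one INSERT round trip per row.
        cur.copy(
            "COPY pason.dscp2 (time_stamp, subscriber, ip_address, "
            "remote_address, signature_service_name, dscp_out, "
            "bytes_in, bytes_out) FROM STDIN DELIMITER ',' ENCLOSED BY '\"'",
            rows_to_csv(rows),
        )
        db.commit()

A per-row INSERT pays one network round trip and one statement parse per row, which is the usual reason ~1,000 rows take tens of seconds; a single COPY amortizes both across the whole batch.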