Trickle Load - performance degradation

Hi, I am testing a load process using trickle load into a table that includes 180M records. The process is a Java main that opens 10 threads, each committing after 1000 records. I use AUTO mode, since I saw that DIRECT is very slow. The process finishes after 600 seconds and exits. The next time I re-execute it, I see performance degradation: the process finishes after 700 seconds, and so on; each run takes extra time to finish. My general setup is:
• Vertica 6.0 CE
• Single node
• HP server with 24 cores and 32 GB memory (average load during process execution is CPU ~13%, disk ~5%)
• No projections (only the superprojection)

Comments

  • I'd like to add that when I restart the database, the process goes back to normal; then, after several executions, I am back to the same behavior. I am not deleting or updating records, and there are no queries running on the system in parallel with my execution.
  • One more thing: my moveout setting was changed to 20 seconds.
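    For readers following along: the moveout interval is an ordinary Vertica configuration parameter, so a change like the one above can be made from any session. A minimal sketch, reusing the connection details from the loader code later in this thread (the class name is just illustrative):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: adjusting the Tuple Mover's moveout interval from JDBC.
// MoveOutInterval is a documented Vertica configuration parameter;
// connection details match the loader code shared in this thread.
public class MoveoutTuning {
    public static void main(String[] args) throws Exception {
        Class.forName("com.vertica.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://MYDPHDB0079:5433/BSM", "dbadmin", "dbadmin");
        Statement stmt = conn.createStatement();
        // Ask the Tuple Mover to check for moveout work every 20 seconds.
        stmt.execute("SELECT SET_CONFIG_PARAMETER('MoveOutInterval', 20)");
        stmt.close();
        conn.close();
    }
}
```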
  • Hi Eli, thanks for sharing all of your details! We'll look into this to see if this behavior is normal or can be optimized in some way.
  • Hi Eli, what does your table schema look like? 1000 records in 600 seconds is on the order of 2 records/second or less, which is extremely slow for Vertica unless your table is nearing Vertica's maximum table width or you're using complicated data types. Are you perhaps doing lots of individual INSERT statements, rather than a single prepared INSERT with multiple records (or a single COPY statement)? (A minimal sketch of the batched pattern follows below.) Also, are you sure the bottleneck is on the Vertica end, rather than in your Java processes? You note that the system is less than 20% utilized with ten threads going at once. (What are those numbers exactly? %cpu and %wait in 'top'?) A single batch data load in Vertica will use more than 100% CPU (more than one CPU core: one for parsing, another for storing data) unless it's receiving data too slowly, i.e., spending time waiting for data rather than processing it.
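    To make the distinction above concrete, here is a minimal sketch of the single-prepared-INSERT-with-batching pattern, using standard JDBC; the table and columns are placeholders, and the connection details are borrowed from the loader code later in the thread:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: one prepared INSERT reused for many rows via addBatch()/executeBatch(),
// instead of a separate INSERT statement per row. Table and columns are placeholders.
public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("com.vertica.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://MYDPHDB0079:5433/BSM", "dbadmin", "dbadmin");
        conn.setAutoCommit(false);
        PreparedStatement pstmt =
                conn.prepareStatement("INSERT INTO my_table (id, val) VALUES (?, ?)");
        for (int i = 0; i < 1000; i++) {
            pstmt.setInt(1, i);
            pstmt.setString(2, "row " + i);
            pstmt.addBatch();      // queue the row client-side
        }
        pstmt.executeBatch();      // ship all 1000 rows as one batched load
        conn.commit();             // one commit for the whole batch
        pstmt.close();
        conn.close();
    }
}
```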
  • I start with 10M records in 600 seconds, and then the time grows to 712 seconds, and so on with each new cycle I run. I use a standard prepared statement in my Java code; I can send it to you for review if that's OK with you. I am using Java batch insert.
  • Here is the code:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Date;
import java.util.Properties;

public class VerticaLoader extends Thread {

    // Test constants (not all are used in this run).
    private static String pageURL = "http://hp.mercury.com/RUM_MC/Page1.asp";
    private static String postData = "PostData=postdata";
    private static String sessionID = "jsessionid=10295";
    private static String httpMethod = "GET";
    private static int runmid = 0;
    private static int componentSeq = 0;

    // Statement templates; only BPM_TRANS_HR_S_90000_10 is used in this test.
    private static String pages_data_statement =
            "INSERT INTO pages_data2 VALUES(?,?,?,?,?,?,?,?,?)";
    private static String sese_sum_statement =
            "INSERT INTO session_summary(rumID, startTime, endTime, applicationId, applicationTierId, "
            + "actionHits, actionFailure, expectedActionHits, totalBytesIn, totalBytesOut, "
            + "totalDownloadTimeForAvailableActions, totalDownloadTimeForUnavailableActions, "
            + "hasPerformanceEvent, dwellTime, eventAppErrorCount, eventInfoCount, eventPerformanceCount, "
            + "userName, userHost, sessionMngrIP, serverId, probeHost, lastActionTime, hasFullSnapshot, "
            + "endUserGroupId, subnetId, isTVIntegrationEnabled, endUserIPv4, endUserIPv6, locationId, "
            + "clientId, totalLatency, totalPackets, isPalSession) "
            + "VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,HEX_TO_BINARY(?),?,?,?,?,?)";
    private static String events_statement =
            "INSERT INTO events (eventTime, eventCategory, extraData, rumID, componentSeq, actionId, "
            + "endUserGroupId, subnetId, userName, ClientIPv4, clientIPv6, ClientHost, sweId, eventId, "
            + "locationId, clientId) "
            + "VALUES(?,?,?,?,?,?,?,?,?,?,HEX_TO_BINARY(?),?,?,?,?,?)";
    private static String session_properties_statement =
            "INSERT INTO session_properties(rumID, startTime, applicationID, Property1, Property2, "
            + "Property3, Property4, Property5) VALUES(?,?,?,?,?,?,?,?)";
    private static String action_parent_mapping_statement =
            "INSERT INTO action_parent_mapping(rumID, actionSeq, timeStamp, parentActionSeq) "
            + "VALUES(?,?,?,?)";
    private static String rum_integration_tv_statement =
            "INSERT INTO rum_integration_tv(rumID, componentSeq, applicationId, applicationTierId, "
            + "timeStamp, tvGUID, tvServerAddr, tvTimeStamp, tvUserData) VALUES(?,?,?,?,?,?,?,?,?)";
    private static String BPM_TRANS_HR_S_90000_10 =
            "INSERT INTO BPM_TRANS_HR_S_90000_11 (SAMPLETIME, TUID, DBDATE, CUSTOMER_ID, "
            + "INTERNAL_TRANSACTION_ID, INTERNAL_LOCATION_ID, SCRIPT_ID, BPM_AGENT_ID, "
            + "INTERNAL_SUBNET_ID, STATUS_ID, TIME_STAMP_COUNT, EM_RESULT_VALUE_SUM, "
            + "EM_RESULT_VALUE_SUMSQR, EM_RESULT_VALUE_MAX, EM_RESULT_VALUE_MIN, TOT_OK_HITS_SUM, "
            + "TOT_MINOR_HITS_SUM, TOT_CRITICAL_HITS_SUM, DOWNTIME_STATE, AVAILABILITY_STATUS_SUM, "
            + "EM_DOWNLOAD_DATA_SIZE_CNT, TIME_STAMP_MAX) "
            + "VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";

    public static File fd;
    public static FileWriter fw;       // note: static, so all threads share these writers
    public static File peakfd;
    public static FileWriter peakfw;
    public Connection conn;
    public static int batchSize = 1000;
    public static int iterations = 1000;
    public static int peakSize = 200;

    public static void main(String[] args) throws IOException, SQLException {
        int numThreads = 10;
        // Parse "-name value" argument pairs.
        int i = 0;
        while (i + 1 < args.length) {
            String arg1 = args[i];
            String arg2 = args[i + 1];
            if (arg1.equalsIgnoreCase("-batchSize"))
                batchSize = Integer.valueOf(arg2).intValue();
            else if (arg1.equalsIgnoreCase("-iterations"))
                iterations = Integer.valueOf(arg2).intValue();
            else if (arg1.equalsIgnoreCase("-numThreads"))
                numThreads = Integer.valueOf(arg2).intValue();
            else if (arg1.equalsIgnoreCase("-peakSize"))
                peakSize = Integer.valueOf(arg2).intValue();
            i += 2;
        }
        fd = new File("C:\\Vertica\\temp\\load.txt");
        peakfd = new File("C:\\Vertica\\temp\\peak_load.txt");
        Thread[] thread = new Thread[numThreads];
        for (int t = 0; t < thread.length; t++) {
            thread[t] = new VerticaLoader(t);
        }
        // Load the JDBC driver.
        try {
            Class.forName("com.vertica.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            // Could not find the driver class; likely an issue with finding the .jar file.
            System.err.println("Could not find the JDBC driver class.");
            e.printStackTrace();
            return;
        }
        // Property object to hold username and password.
        Properties myProp = new Properties();
        myProp.put("user", "dbadmin");
        myProp.put("password", "dbadmin");
        for (int t = 0; t < thread.length; t++) {
            ((VerticaLoader) thread[t]).conn =
                    DriverManager.getConnection("jdbc:vertica://MYDPHDB0079:5433/BSM", myProp);
            // ((VerticaConnection) ((VerticaLoader) thread[t]).conn).setProperty("DirectBatchInsert", true);
        }
        for (int t = 0; t < thread.length; t++) {
            thread[t].start();
        }
    }

    public VerticaLoader(int i) {
        super();
        this.setName("Thread " + i);
    }

    // Despite the name, this binds the 22 parameters of the BPM_TRANS_HR statement.
    private void setParamsForPagesData(PreparedStatement pstmt, java.sql.Date sqlDate, long myTime)
            throws SQLException {
        pstmt.setObject(1, myTime);
        pstmt.setObject(2, "CA636D1868DBD383E0403B10B8400CB6");
        pstmt.setObject(3, sqlDate);
        pstmt.setObject(4, 1);
        pstmt.setObject(5, 13260);
        pstmt.setObject(6, 10905);
        pstmt.setObject(7, 396);
        pstmt.setObject(8, 744);
        pstmt.setObject(9, 22457);
        pstmt.setObject(10, 0);
        pstmt.setObject(11, 1);
        pstmt.setObject(12, 3800);
        pstmt.setObject(13, 14440000);
        pstmt.setObject(14, 3800);
        pstmt.setObject(15, 3800);
        pstmt.setObject(16, 1);
        pstmt.setObject(17, 0);
        pstmt.setObject(18, 0);
        pstmt.setObject(19, 0);
        pstmt.setObject(20, 0.1);
        pstmt.setObject(21, 0);
        pstmt.setObject(22, 1348426973);
        pstmt.addBatch();
    }

    @Override
    public void run() {
        try {
            fw = new FileWriter(fd);
            peakfw = new FileWriter(peakfd);
            conn.setAutoCommit(true);    // every executeBatch() commits immediately
            PreparedStatement pstmt = conn.prepareStatement(BPM_TRANS_HR_S_90000_10);
            Date d = new Date();
            long start;
            for (int j = 0; j < iterations; j++) {
                start = System.currentTimeMillis();
                java.sql.Date sqlDate = new java.sql.Date(new java.util.Date().getTime());
                for (int i = 0; i < batchSize; i++) {
                    setParamsForPagesData(pstmt, sqlDate, start);
                }
                // Execute the batch to insert the values.
                try {
                    int[] results = pstmt.executeBatch();
                    // conn.commit();
                } catch (SQLException e) {
                    System.out.println("Error message: " + e.getMessage());
                    e.printStackTrace();
                }
                long period = System.currentTimeMillis() - start;
                System.out.println(this.getName() + ": \t" + j + ":\t : Time = " + period);
                try {
                    fw.write(this.getName() + ": \t" + start + ":" + period + " \n");
                    if (period > peakSize)
                        peakfw.write(this.getName() + ": \t" + j + ":\t : Time = " + period + " \n");
                } catch (Exception e) {
                    // Ignore logging failures.
                }
                // Throttle: pause for roughly as long as the batch took.
                long sleep = System.currentTimeMillis() - start;
                if (sleep > 0)
                    Thread.sleep(sleep);
                // conn.commit();
            }
            Date d2 = new Date();
            System.out.println("took " + (d2.getTime() - d.getTime()) + " millisec");
            conn.close();
            fw.close();
            peakfw.close();
        } catch (Exception e) {
            System.err.println("Could not connect to database.");
            e.printStackTrace();
            return;
        }
    }
}
```
  • Thanks for sharing your code! I'm not personally a Java expert (though I am at least familiar with the language; I'm more of a C/C++ person), so others may have further comments. 10 million rows is a much more reasonable size for 600 seconds. Still pretty slow, though; I just loaded 10 million rows with a similar (not quite identical) schema in under 40 seconds with a single-node database on my laptop, and that was using DIRECT mode. One tip: there's a known issue with batch INSERT and function calls, i.e., the HEX_TO_BINARY() call in your INSERT statement. If you can rewrite it as a COPY LOCAL statement, do the hex-to-binary conversion in Java, or use a column with a default value or any other approach on the Vertica side that tells Vertica about the conversion without an explicit function call inside an INSERT statement, you may see better performance (a sketch of both workarounds follows below). As to the slowdown: it sounds like you're loading 10x 10M (so 100M) records at a time? That's starting to be a significant amount of data. AUTO mode keeps all data in memory until it's moved out to disk. (That's the advantage of AUTO; DIRECT forces the data to be sorted and stored properly in files, and if you're loading smaller batches, it turns out to be much more efficient for Vertica to batch several loads into one sort/write-to-disk pass.) If less memory is available, Vertica will perform more slowly; it will allocate smaller buffers for various operations, etc. And your system is likely low on memory to begin with: you have 24 CPU cores, which is a fair number, but only 32 GB RAM, which is less than Vertica's minimum system requirement of 2 GB RAM per CPU core (4 GB recommended). Just a guess, but that could be the problem. You're also likely seeing a slowdown in later loads from the current load and the moveout of the previous load happening simultaneously. Since AUTO defers writing some data to disk, you're still paying for the COPY after it has finished. The effect should be small on this data, though it might start adding up if you begin needing large mergeouts as well (to combine old loads into single large sorted files, to increase query performance); that will depend on your partitioning scheme for the table.
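    A hedged sketch of the two workarounds mentioned above; the table, column, and file names are placeholders, and the COPY ... FROM LOCAL syntax is assumed from Vertica's documentation:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

// Sketch: two ways to avoid HEX_TO_BINARY() inside a batched INSERT:
// (a) decode the hex string in Java and bind raw bytes, or
// (b) stage rows to a file and load it with one COPY ... FROM LOCAL.
public class HexToBinaryWorkarounds {

    // Minimal hex-string -> byte[] decoder, so no function call is needed server-side.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Class.forName("com.vertica.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://MYDPHDB0079:5433/BSM", "dbadmin", "dbadmin");

        // (a) Bind the decoded bytes directly; placeholder table and columns.
        PreparedStatement pstmt = conn.prepareStatement(
                "INSERT INTO binary_demo (id, ipv6) VALUES (?, ?)");
        pstmt.setInt(1, 1);
        pstmt.setBytes(2, hexToBytes("20010DB8000000000000000000000001"));
        pstmt.addBatch();
        pstmt.executeBatch();

        // (b) Or load a staged delimited file in a single COPY statement.
        Statement stmt = conn.createStatement();
        stmt.execute("COPY binary_demo FROM LOCAL 'C:\\Vertica\\temp\\rows.csv' DELIMITER ','");
        stmt.close();
        conn.close();
    }
}
```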
  • Thanks for the help so far. My total load is 10M, not 100M. HEX_TO_BINARY is not actually being used; it's just part of several query templates I define, and in this specific test I don't use it. I'd like to mention that my table is now about 300M records with no projections (only the super) and I don't use any partitioning. Do you think the fact that mergeout has to rebuild large files (no partitioning) could impact it? BTW, during the load I see many X locks (exclusive locks) on the global and local catalog. One more thing: I work for HP Software in Israel, so I can let you log in to my system with no issue. Can we arrange it?
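    For reference, partitioning in Vertica is declared on the table itself. A minimal, hedged sketch of a month-based partition clause (the table name and expression are illustrative only, not a recommendation for this schema, and syntax details may vary by version):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: declaring a month-based partition on a new table so that mergeout
// works on month-sized partitions rather than one ever-growing set of files.
// The table is trimmed to three columns for illustration.
public class PartitionedTableSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("com.vertica.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://MYDPHDB0079:5433/BSM", "dbadmin", "dbadmin");
        Statement stmt = conn.createStatement();
        stmt.execute(
            "CREATE TABLE bpm_trans_partitioned (" +
            "  SAMPLETIME NUMERIC(28,6) NOT NULL, " +
            "  TUID VARCHAR(36) NOT NULL, " +
            "  DBDATE DATE NOT NULL" +
            ") PARTITION BY EXTRACT(YEAR FROM DBDATE) * 100 + EXTRACT(MONTH FROM DBDATE)");
        stmt.close();
        conn.close();
    }
}
```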
  • No problem. Regarding the projections, that certainly could have an impact; it won't slow down each COPY statement directly, but it will make each mergeout require more system resources to rebuild the large table file, which could slow down the system as a result. Regarding projections, incidentally, have you tried Vertica's Database Designer (in adminTools)? It sounds like you have plenty of data for it to run on. It should automatically come up with a much better projection design for you (including encodings, etc.). In many cases it will come up with the optimal design; particularly if you're not super-experienced at manual projection design, we always recommend that you just run it and let it do its thing. Also: oh, hi, another HP employee! I'll get in touch with you offline.
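    As a rough illustration of what a projection design adds, here is a hand-written sketch (not Database Designer output; the column subset, sort order, and projection name are assumptions for illustration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: a hand-written projection with an explicit sort order, roughly the
// kind of design Database Designer generates (DBD would also pick encodings
// and cover all query patterns; this covers only one).
public class ProjectionSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("com.vertica.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://MYDPHDB0079:5433/BSM", "dbadmin", "dbadmin");
        Statement stmt = conn.createStatement();
        stmt.execute(
            "CREATE PROJECTION bpm_trans_by_date AS " +
            "SELECT SAMPLETIME, TUID, DBDATE, CUSTOMER_ID, STATUS_ID " +   // illustrative subset
            "FROM BPM_TRANS_HR_S_90000_10 " +
            "ORDER BY DBDATE, CUSTOMER_ID");
        // A new projection on a populated table must be refreshed before use.
        stmt.execute("SELECT START_REFRESH()");
        stmt.close();
        conn.close();
    }
}
```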
  • Also, regarding the X locks: those should only happen during data load if your transactions are in SERIALIZABLE mode. Most Vertica users load data at something lower like READ COMMITTED, which allows concurrent loads.
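    For JDBC specifically, the isolation level can be requested explicitly on the connection; a minimal sketch using the standard java.sql API:

```java
import java.sql.Connection;
import java.sql.DriverManager;

// Sketch: explicitly requesting READ COMMITTED on the JDBC connection,
// which allows concurrent loads, unlike SERIALIZABLE.
public class IsolationSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("com.vertica.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://MYDPHDB0079:5433/BSM", "dbadmin", "dbadmin");
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        System.out.println("Isolation level: " + conn.getTransactionIsolation());
        conn.close();
    }
}
```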
  • Thanks for the input; I will run the DBD. Regarding the X locks, you can see in my code that I didn't set any isolation level, so I use the default, which is READ COMMITTED, but I still see the locks.
  • Hi, I uploaded a picture of my LOCKS table during the run process. You can see that the post-commit task takes an X lock, and its total time. What can be done? [screenshot of the LOCKS table]
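    Since the screenshot doesn't survive in this archive: the same information can be pulled as text from the LOCKS system table. A sketch, with column names assumed from v_monitor.locks (they may differ across Vertica versions):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: dump current lock holders as text while the load runs.
// Column names are assumed from v_monitor.locks and may vary by version.
public class LockWatcher {
    public static void main(String[] args) throws Exception {
        Class.forName("com.vertica.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://MYDPHDB0079:5433/BSM", "dbadmin", "dbadmin");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(
                "SELECT object_name, lock_mode, lock_scope, transaction_description " +
                "FROM v_monitor.locks ORDER BY object_name");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getString(2) + "\t"
                    + rs.getString(3) + "\t" + rs.getString(4));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}
```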
  • Hm... Off the top of my head, does the table happen to have an IDENTITY column, and are you loading more than 250000 or so rows per COPY? If so, take a look at the CACHE argument to ALTER SEQUENCE in the documentation. If you exhaust the cache of sequence values, we take an X lock to make sure no other concurrent COPY is using the same IDENTITY values as this statement. (IDENTITY columns are backed by sequences; take a look at the SEQUENCES table to find the sequence for your IDENTITY column.) Not certain that's your problem, but it's something to take a look at.
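    A hedged sketch of that check: look up the sequence backing the IDENTITY column, then raise its cache. The SEQUENCES column names and the cache size are assumptions; consult the documentation for your version:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: find the sequence backing a table's IDENTITY column and enlarge its
// cache, so one big COPY doesn't exhaust the cached range mid-load (which is
// what forces the X lock described above). Names and columns are assumptions.
public class SequenceCacheTuning {
    public static void main(String[] args) throws Exception {
        Class.forName("com.vertica.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://MYDPHDB0079:5433/BSM", "dbadmin", "dbadmin");
        Statement query = conn.createStatement();
        Statement alter = conn.createStatement();   // separate statement keeps rs open
        ResultSet rs = query.executeQuery(
                "SELECT sequence_schema, sequence_name FROM sequences " +
                "WHERE identity_table_name = 'BPM_TRANS_HR_S_90000_10'");
        while (rs.next()) {
            String seq = rs.getString(1) + "." + rs.getString(2);
            // Cache 1M values at a time instead of the default 250,000.
            alter.execute("ALTER SEQUENCE " + seq + " CACHE 1000000");
        }
        rs.close();
        query.close();
        alter.close();
        conn.close();
    }
}
```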
  • No, I do not use sequences at all. Below are more details about the X locks I am facing. And I am using a simple batch insert via Java. [screenshot of lock details]
  • My table:

```sql
CREATE TABLE BPM_TRANS_HR_S_90000_10 (
    SAMPLETIME                     NUMERIC(28,6) NOT NULL,
    TUID                           VARCHAR(36) NOT NULL,
    DBDATE                         DATE,
    CUSTOMER_ID                    NUMERIC(10),
    INTERNAL_TRANSACTION_ID        NUMERIC(20),
    INTERNAL_LOCATION_ID           NUMERIC(20),
    SCRIPT_ID                      NUMERIC(20),
    BPM_AGENT_ID                   NUMERIC(20),
    INTERNAL_SUBNET_ID             NUMERIC(20),
    SERVER_IP                      VARCHAR(50),
    SERVER_NAME                    VARCHAR(63),
    STATUS_ID                      NUMERIC(5),
    TIME_STAMP_COUNT               NUMERIC(10),
    EM_RESULT_VALUE_SUM            NUMERIC(28,6),
    EM_RESULT_VALUE_SUMSQR         NUMERIC(28,6),
    EM_RESULT_VALUE_MAX            NUMERIC(10),
    EM_RESULT_VALUE_MIN            NUMERIC(10),
    EM_PAGE_CBD_COUNT_SUM          NUMERIC(28,6),
    EM_DOWNLOAD_DATA_SIZE_SUM      NUMERIC(28,6),
    EM_W_CONNECTION_TIME_SUM       NUMERIC(28,6),
    EM_W_DNS_TIME_SUM              NUMERIC(28,6),
    EM_W_SSL_TIME_SUM              NUMERIC(28,6),
    EM_W_NETWORK_TIME_SUM          NUMERIC(28,6),
    EM_W_NETWORK_FB_TIME_SUM       NUMERIC(28,6),
    EM_W_SERVER_FB_TIME_SUM        NUMERIC(28,6),
    EM_W_DOWNLOAD_TIME_SUM         NUMERIC(28,6),
    EM_W_RETRY_TIME_SUM            NUMERIC(28,6),
    EM_W_CONNECTION_TIME_MIN       NUMERIC(10),
    EM_W_CONNECTION_TIME_MAX       NUMERIC(10),
    EM_W_DNS_TIME_MIN              NUMERIC(10),
    EM_W_DNS_TIME_MAX              NUMERIC(10),
    EM_W_SSL_TIME_MIN              NUMERIC(10),
    EM_W_SSL_TIME_MAX              NUMERIC(10),
    EM_W_NETWORK_TIME_MIN          NUMERIC(10),
    EM_W_NETWORK_TIME_MAX          NUMERIC(10),
    EM_W_NETWORK_FB_TIME_MIN       NUMERIC(10),
    EM_W_NETWORK_FB_TIME_MAX       NUMERIC(10),
    EM_W_SERVER_FB_TIME_MIN        NUMERIC(10),
    EM_W_SERVER_FB_TIME_MAX        NUMERIC(10),
    EM_W_DOWNLOAD_TIME_MIN         NUMERIC(10),
    EM_W_DOWNLOAD_TIME_MAX         NUMERIC(10),
    EM_W_RETRY_TIME_MIN            NUMERIC(10),
    EM_W_RETRY_TIME_MAX            NUMERIC(10),
    EM_S_CONNECTION_TIME_SUM       NUMERIC(28,6),
    EM_S_DNS_TIME_SUM              NUMERIC(28,6),
    EM_S_SSL_TIME_SUM              NUMERIC(28,6),
    EM_S_NETWORK_TIME_SUM          NUMERIC(28,6),
    EM_S_NETWORK_FB_TIME_SUM       NUMERIC(28,6),
    EM_S_SERVER_FB_TIME_SUM        NUMERIC(28,6),
    EM_S_DOWNLOAD_TIME_SUM         NUMERIC(28,6),
    EM_S_RETRY_TIME_SUM            NUMERIC(28,6),
    BASELINE_RESP_TIME_MEAN_SUM    NUMERIC(28,6),
    BASELINE_RESP_TIME_STD_SUM     NUMERIC(28,6),
    BASELINE_RESP_TIME_LOC_MEANSUM NUMERIC(28,6),
    BASELINE_RESP_TIME_LOC_STD_SUM NUMERIC(28,6),
    BASELINE_DOWNLOAD_TIME_MEANSUM NUMERIC(28,6),
    BASELINE_DOWNLOAD_TIME_STD_SUM NUMERIC(28,6),
    BASELINE_SERVER_TIME_MEAN_SUM  NUMERIC(28,6),
    BASELINE_SERVER_TIME_STD_SUM   NUMERIC(28,6),
    BASELINE_FIRSTBUF_TIME_MEANSUM NUMERIC(28,6),
    BASELINE_FIRSTBUF_TIME_STD_SUM NUMERIC(28,6),
    BASELINE_NETWORK_TIME_MEAN_SUM NUMERIC(28,6),
    BASELINE_NETWORK_TIME_STD_SUM  NUMERIC(28,6),
    TOT_OK_HITS_SUM                NUMERIC(20),
    TOT_MINOR_HITS_SUM             NUMERIC(20),
    TOT_CRITICAL_HITS_SUM          NUMERIC(20),
    DOWNTIME_STATE                 NUMERIC(3),
    ERROR_COUNT_SUM                NUMERIC(28,6),
    AVAILABILITY_STATUS_SUM        NUMERIC(28,6),
    EM_DOWNLOAD_DATA_SIZE_CNT      NUMERIC(10),
    TIME_STAMP_MAX                 NUMERIC(28,6)
);
```
