Inconsistent/slow runtimes due to high Scan execution times on individual nodes.
Our database is exhibiting extremely inconsistent runtimes for "little-sip" queries. By "little sip" queries, I mean queries which scan large datasets to find and return a small number of records. The inconsistencies generally occur when large-batch loads (COPY , "INSERT...SELECT", or CTAS) are running in other sessions. When no such processes are running, the queries show great performance consistently.
Naturally, some slowdown is to be expected when resource contention is in play. However, the variation we are experiencing typically exceeds 100-fold (that is, the normal "quiet database" response times are over 100x faster than slower "busy database" response times, even if "busy database" is just one or two background batch loads). Attempts to mitigate this with runtime priorities and resource pools have been fruitless.
A from-scratch demo:
HPSUPPORT=> CREATE TABLE HPSUPPORT.ONE_BILLION_ROWS (OBR_ID INT NOT NULL, OBR_RAND1 FLOAT, OBR_RAND2 FLOAT) ORDER BY OBR_ID SEGMENTED BY HASH(OBR_ID) ALL NODES; CREATE TABLE HPSUPPORT=> CREATE SEQUENCE HPSUPPORT.ONE_BILLION_ROWS_SEQ; CREATE SEQUENCE HPSUPPORT=> INSERT /*+ DIRECT */ INTO HPSUPPORT.ONE_BILLION_ROWS SELECT HPSUPPORT.ONE_BILLION_ROWS_SEQ.NEXTVAL, RANDOM(), RANDOM() FROM HPSUPPORT.VERYLARGETABLE LIMIT 1000000000; OUTPUT ------------ 1000000000 (1 row) HPSUPPORT=> commit; COMMIT HPSUPPORT=> SELECT ANALYZE_STATISTICS('HPSUPPORT.ONE_BILLION_ROWS'); ANALYZE_STATISTICS -------------------- 0 (1 row) HPSUPPORT=> \timing Timing is on. HPSUPPORT=> \o /dev/null HPSUPPORT=> SELECT * FROM HPSUPPORT.ONE_BILLION_ROWS WHERE OBR_ID BETWEEN 500000000 AND 500001000 AND OBR_RAND1 BETWEEN 0.2 AND 0.3 AND OBR_RAND2 BETWEEN 0.2 AND 0.3; Time: First fetch (10 rows): 4302.331 ms. All rows formatted: 4302.359 ms HPSUPPORT=> SELECT * FROM HPSUPPORT.ONE_BILLION_ROWS WHERE OBR_ID BETWEEN 500000000 AND 500001000 AND OBR_RAND1 BETWEEN 0.2 AND 0.3 AND OBR_RAND2 BETWEEN 0.2 AND 0.3; Time: First fetch (10 rows): 632.195 ms. All rows formatted: 632.226 ms HPSUPPORT=> SELECT * FROM HPSUPPORT.ONE_BILLION_ROWS WHERE OBR_ID BETWEEN 500000000 AND 500001000 AND OBR_RAND1 BETWEEN 0.2 AND 0.3 AND OBR_RAND2 BETWEEN 0.2 AND 0.3; Time: First fetch (10 rows): 41.238 ms. All rows formatted: 41.267 ms HPSUPPORT=> SELECT * FROM HPSUPPORT.ONE_BILLION_ROWS WHERE OBR_ID BETWEEN 500000000 AND 500001000 AND OBR_RAND1 BETWEEN 0.2 AND 0.3 AND OBR_RAND2 BETWEEN 0.2 AND 0.3; Time: First fetch (10 rows): 2661.480 ms. All rows formatted: 2661.513 ms HPSUPPORT=> SELECT * FROM HPSUPPORT.ONE_BILLION_ROWS WHERE OBR_ID BETWEEN 500000000 AND 500001000 AND OBR_RAND1 BETWEEN 0.2 AND 0.3 AND OBR_RAND2 BETWEEN 0.2 AND 0.3; Time: First fetch (10 rows): 7655.370 ms. All rows formatted: 7655.398 ms HPSUPPORT=>
Note that though these last queries were submitted approximately 30 seconds apart, the slowest was almost 200x slower than the fastest.
A few more pertinent points:
1) The cluster has 18 nodes.
2) In the case of this demo, there were two background "INSERT /*+ DIRECT */ SELECT..." sessions running during the 4 SELECT queries. Both of these background sessions were set to LOW runtime priority prior to testing. The HPSUPPORT user account used for this test was assigned (and sole user of) a resource pool with HIGH priority.
3) When I ran PROFILE of the sample query using MC console, the results of a slow run show a HUGE skew among nodes. 16 nodes show "clock time(us)" ranging from 10000 to 100000 microseconds. The other two nodes show 1.1 million microseconds and 13.5 million microseconds. The only operator in play was "Scan". When I ran PROFILE a second time, immediately after the first, the same skewedness is observed, but with completely different nodes affected.
MC Profile of first run of slow query:
MC Profile of second run of slow query (run immediately after first run):
For baseline perspective, here's the \timing of 4 runs of the sample query when no background batch-loads are running. Even varying OBR_ID range from one run to the next, the response times are all <100 ms (not shown, but next 16 runs were also < 100ms).
HPSUPPORT=> SELECT * FROM HPSUPPORT.ONE_BILLION_ROWS WHERE OBR_ID BETWEEN 100000000 AND 100001000 AND OBR_RAND1 BETWEEN 0.2 AND 0.3 AND OBR_RAND2 BETWEEN 0.2 AND 0.3; Time: First fetch (13 rows): 50.389 ms. All rows formatted: 50.623 ms HPSUPPORT=> SELECT * FROM HPSUPPORT.ONE_BILLION_ROWS WHERE OBR_ID BETWEEN 200000000 AND 200001000 AND OBR_RAND1 BETWEEN 0.2 AND 0.3 AND OBR_RAND2 BETWEEN 0.2 AND 0.3; Time: First fetch (9 rows): 90.546 ms. All rows formatted: 90.573 ms HPSUPPORT=> SELECT * FROM HPSUPPORT.ONE_BILLION_ROWS WHERE OBR_ID BETWEEN 300000000 AND 300001000 AND OBR_RAND1 BETWEEN 0.2 AND 0.3 AND OBR_RAND2 BETWEEN 0.2 AND 0.3; Time: First fetch (9 rows): 75.071 ms. All rows formatted: 75.099 ms HPSUPPORT=> SELECT * FROM HPSUPPORT.ONE_BILLION_ROWS WHERE OBR_ID BETWEEN 400000000 AND 400001000 AND OBR_RAND1 BETWEEN 0.2 AND 0.3 AND OBR_RAND2 BETWEEN 0.2 AND 0.3; Time: First fetch (8 rows): 43.434 ms. All rows formatted: 43.465 ms
The requirement we are seeking to meet is to ensure that one specific user's "Little Sip" queries (queries similar to the example here) will consistently run in < 2 seconds (note, this is 50x slower than the "quiet environment" response time, so should not be such a tall order).
Other than restricting all batch-operations to non-business hours (not a viable option), is there anything more that can be done to help achieve the desired consistency for just one subset of high-priority queries?