Query performance
I have 1 crore data
user_testing=> select count(*) from likes_test ; count ---------- 10000001 (1 row) Time: First fetch (1 row): 8.088 ms. All rows formatted: 8.116 ms
The data contains like this
[ {"id":1, "likes":[401,218,134,941,565,457,864,346,774,921,766,432,815,63,822,202,679,409,457,743,902,199,717,232,172,958,179,808,271,418,145,949,806,399,68,935,423,287,31,365,68,875,347,786,386,792,620,903,136,67,539,196,866,419,390,612,591,881,68,575,226,291,354,380,889,105,412,511,411,861,61,802,388,168,655,296,43,541,824,756,649,2,204,396,231,499,42,773,918,19,229,716,675,762,939,352,35,949,445,787,215,524,998,596,267,444,690,52,762]} ]
Count Query on likes, it takes 29 seconds
#
user_testing=> select count(id) from likes_test where mapcontainsvalue(likes,'2')='t' or mapcontainsvalue(likes,'4'); count --------- 1968654 (1 row) Time: First fetch (1 row): 29661.954 ms. All rows formatted: 26661.985 ms
Then I decreased interval of AnalyzeRowCount service,Default value is 60
user_testing=> ALTER DATABASE db_testing_0 SET AnalyzeRowCountInterval = 10; ALTER DATABASE Time: First fetch (0 rows): 6.891 ms. All rows formatted: 6.902 ms
Tried the same query, it takes 27 seconds
user_testing=> select count(id) from likes_test where mapcontainsvalue(likes,'2')='t' or mapcontainsvalue(likes,'4'); count --------- 1968654 (1 row) Time: First fetch (1 row): 27388.772 ms. All rows formatted: 27388.800 ms
Then I created projection and tried the same query it takes 26 seconds
#
user_testing=> select count(id) from likes_test_p where mapcontainsvalue(likes,'2')='t' or mapcontainsvalue(likes,'4'); count --------- 1968654 (1 row) Time: First fetch (1 row): 26661.954 ms. All rows formatted: 26661.985 ms
What I have to do to get counts in less time, Provide solution
Thanks in Advance
0
Comments
Hi!
Flex tables in Vertica aren't created for performance, but for unstructured/semi-structured data. Following to your example, your data can be represented as structured data, so do unpivot to your data and create a regular table, i.e.:
Segment your table + create good projection/s (
ORDER BY id, likes
).PS: Vertica is RDBMS and not Document Oriented DBMS.
AnalyzeRowCount is only used to update row count statistics. It has nothing to do with running a COUNT query.
Is the "likes_test" table a FLEX table? These are not built for performance. You will get MUCH better performance if you materialize the columns in your WHERE clause predicates.
Oops. Just saw the post by @sKwa! He is correct and I would go with his suggestion.
You can unroll the likes array with mapvalues(). Something like:
CREATE FLEX LOCAL TEMP TABLE staging(id int) segmented by hash(id) all nodes;
COPY staging FROM '...' PARSER fjsonparser();
INSERT INTO likes SELECT id, mapvalues(likes) OVER (partition by id) FROM staging;
Great idea Ben_Vandiver
Complete example (with one record):
I think that needs to be a COUNT(DISTINCT ID) to reflect the original intention.
I corrected the complete example
hello @Jim_Knicely @Ben_Vandiver
I followed what you have said, this is my likes data
subscriber_data=> select count(*) from likes_proj;commit;
count
------------
1382890288
(1 row)
And I need to join this likes_proj with another subscriber_8 table count
subscriber_data=> select count(*) from subscriber_8;commit;
count
----------
57548349
(1 row)
I queried like this it takes 1 minute 35 seconds
subscriber_data=> select count(distinct s1.id) from likes_proj s1,projection_base_subscriber_8 s2 where esp=1 and un=0 and s1.id=s2.id and values in (4,5,6,8,133,699,999);commit;
count
---------
1959111
(1 row)
Can anyone suggest me to get fast counts?
Hi,
I wouldn't join projections together. You should join tables.
Can you post the DDL for the tables? I'm sure the projections can be tuned for that query. In fact, why not run database designer on it? It's there to do the hard work for you!
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/ConfiguringTheDB/PhysicalSchema/DBD/DesignQueries.htm
Make sure you've analyzed stats on the tables.
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/Statistics/CollectingDatabaseStatistics.htm
Try to use the newer ANSI SQL-92 join syntax. I find that it performs much better in Vertica!
For example:
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AnalyzingData/Queries/Joins/JoinSyntax.htm