Query performance

naresh_way2 · January 2018

I have 1 crore data

user_testing=> select count(*) from likes_test ;
  count   
----------
 10000001
(1 row)

Time: First fetch (1 row): 8.088 ms. All rows formatted: 8.116 ms

The data contains like this

 [
{"id":1,
"likes":[401,218,134,941,565,457,864,346,774,921,766,432,815,63,822,202,679,409,457,743,902,199,717,232,172,958,179,808,271,418,145,949,806,399,68,935,423,287,31,365,68,875,347,786,386,792,620,903,136,67,539,196,866,419,390,612,591,881,68,575,226,291,354,380,889,105,412,511,411,861,61,802,388,168,655,296,43,541,824,756,649,2,204,396,231,499,42,773,918,19,229,716,675,762,939,352,35,949,445,787,215,524,998,596,267,444,690,52,762]}
]

Count Query on likes, it takes 29 seconds

#

 user_testing=> select count(id) from likes_test where mapcontainsvalue(likes,'2')='t' or mapcontainsvalue(likes,'4');
  count  
---------
 1968654
(1 row)

Time: First fetch (1 row): 29661.954 ms. All rows formatted: 26661.985 ms

Then I decreased interval of AnalyzeRowCount service,Default value is 60

    user_testing=> ALTER DATABASE db_testing_0 SET AnalyzeRowCountInterval = 10;
        ALTER DATABASE
        Time: First fetch (0 rows): 6.891 ms. All rows formatted: 6.902 ms

Tried the same query, it takes 27 seconds

user_testing=> select count(id) from likes_test where mapcontainsvalue(likes,'2')='t' or mapcontainsvalue(likes,'4');
  count  
---------
 1968654
(1 row)

Time: First fetch (1 row): 27388.772 ms. All rows formatted: 27388.800 ms

Then I created projection and tried the same query it takes 26 seconds

#

 user_testing=> select count(id) from likes_test_p where mapcontainsvalue(likes,'2')='t' or mapcontainsvalue(likes,'4');
  count  
---------
 1968654
(1 row)

Time: First fetch (1 row): 26661.954 ms. All rows formatted: 26661.985 ms

What I have to do to get counts in less time, Provide solution
Thanks in Advance

sKwa · January 2018

Hi!

Flex tables in Vertica aren't created for performance, but for unstructured/semi-structured data. Following to your example, your data can be represented as structured data, so do unpivot to your data and create a regular table, i.e.:

 id  |   likes
-----+--------
    1| 201
    1| 315
 ...

Segment your table + create good projection/s (ORDER BY id, likes).

PS: Vertica is RDBMS and not Document Oriented DBMS.

Jim_Knicely · January 2018

AnalyzeRowCount is only used to update row count statistics. It has nothing to do with running a COUNT query.

Is the "likes_test" table a FLEX table? These are not built for performance. You will get MUCH better performance if you materialize the columns in your WHERE clause predicates.

Oops. Just saw the post by @sKwa! He is correct and I would go with his suggestion.

Ben_Vandiver · January 2018

You can unroll the likes array with mapvalues(). Something like:
CREATE FLEX LOCAL TEMP TABLE staging(id int) segmented by hash(id) all nodes;
COPY staging FROM '...' PARSER fjsonparser();
INSERT INTO likes SELECT id, mapvalues(likes) OVER (partition by id) FROM staging;

Jim_Knicely · January 2018

Great idea Ben_Vandiver

Complete example (with one record):

dbadmin=> CREATE FLEX LOCAL TEMP TABLE staging(id int) ON COMMIT PRESERVE ROWS SEGMENTED BY HASH(id) ALL NODES;
CREATE TABLE

dbadmin=> \! cat /home/dbadmin/likes.json
[
{"id":1,
"likes":[401,218,134,941,565,457,864,346,774,921,766,432,815,63,822,202,679,409,457,743,902,199,717,232,172,958,179,808,271,418,145,949,806,399,68,935,423,287,31,365,68,875,347,786,386,792,620,903,136,67,539,196,866,419,390,612,591,881,68,575,226,291,354,380,889,105,412,511,411,861,61,802,388,168,655,296,43,541,824,756,649,2,204,396,231,499,42,773,918,19,229,716,675,762,939,352,35,949,445,787,215,524,998,596,267,444,690,52,762]}
]

dbadmin=> COPY staging FROM '/home/dbadmin/likes.json' PARSER fjsonparser();
 Rows Loaded
-------------
           1
(1 row)

dbadmin=> CREATE TABLE likes (id INT, values INT) ORDER BY id, values SEGMENTED BY HASH(id) ALL NODES;
CREATE TABLE

dbadmin=> INSERT INTO likes SELECT id, values::int FROM (SELECT id, mapvalues(likes) OVER (partition by id) FROM staging) foo;
 OUTPUT
--------
    109

(1 row)

dbadmin=> SELECT COUNT(DISTINCT id) FROM likes WHERE values IN (2, 4);
 COUNT
-------
     1
(1 row)

Ben_Vandiver · January 2018

I think that needs to be a COUNT(DISTINCT ID) to reflect the original intention.

Jim_Knicely · January 2018

I corrected the complete example

ashu123 · January 2018

hello @Jim_Knicely @Ben_Vandiver
I followed what you have said, this is my likes data
subscriber_data=> select count(*) from likes_proj;commit;
count
------------
1382890288
(1 row)

    Time: First fetch (1 row): 154.654 ms. All rows formatted: 154.717 ms

And I need to join this likes_proj with another subscriber_8 table count
subscriber_data=> select count(*) from subscriber_8;commit;
count
----------
57548349
(1 row)

    Time: First fetch (1 row): 7245.067 ms. All rows formatted: 7245.099 ms

I queried like this it takes 1 minute 35 seconds
subscriber_data=> select count(distinct s1.id) from likes_proj s1,projection_base_subscriber_8 s2 where esp=1 and un=0 and s1.id=s2.id and values in (4,5,6,8,133,699,999);commit;
count
---------
1959111
(1 row)

        Time: First fetch (1 row): 95715.810 ms. All rows formatted: 95715.841 ms

Can anyone suggest me to get fast counts?

Jim_Knicely · January 2018

Hi,

I wouldn't join projections together. You should join tables.

Can you post the DDL for the tables? I'm sure the projections can be tuned for that query. In fact, why not run database designer on it? It's there to do the hard work for you!

See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/ConfiguringTheDB/PhysicalSchema/DBD/DesignQueries.htm

Make sure you've analyzed stats on the tables.

See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/Statistics/CollectingDatabaseStatistics.htm

Try to use the newer ANSI SQL-92 join syntax. I find that it performs much better in Vertica!

For example:

select count(distinct s1.id)
from likes_table s1
join projection_base_subscriber_8 s2
on s1.id = s2.id
where esp=1 
and un=0
and values in (4,5,6,8,133,699,999);

See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AnalyzingData/Queries/Joins/JoinSyntax.htm