Database Designer

naresh_way2 · November 2017

Can any one explain me how to write frequent queries(format) in .sql file to give database designer? With an example.

Jim_Knicely · November 2017

You can query the DC_REQUESTS_ISSUED tables. Something like this:

[dbadmin@s18384357 ~]$ vsql -Atc "select request from dc_requests_issued where request_type = 'QUERY' and time >= trunc(sysdate) - 7 and right(request, 1) = ';' and user_name <> 'dbadmin' group by request having count(*) > 5 limit 100;" -o /home/dbadmin/frequent_queries.sql

[dbadmin@s18384357 ~]$ cat /home/dbadmin/frequent_queries.sql
select c from il;
select count(*) from jim.irIS;

So the I got at most 100 of the queries executed by all users other than "dbadmin" more than 5 times in the previous week and stuck them in the file "/home/dbadmin/frequent_queries.sql".

Now I can pass this file to DBD!

naresh_way2 · November 2017

Thank you jim.

naresh_way2 · November 2017

Jim,I need pre join projection of normal table with flex tables.Can it possible?If it is not then what is the alternative solution to get best query performance?.Thanks in advance.

Jim_Knicely · November 2017

Hi,

If you have a question on a separate topic, can you please create a new post?

Anyway, wow, I've never seen that question asked before

Fyi ... Pre-join projections have been deprecated in Vertica 8.0.

See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/NewFeatures/_VersionIndependent/DeprecatedandRetiredFunctionality.htm

Although as of Vertica 9.0 you can still create pre-join projections, those that involve a FLEX table are not supported...

Example:

dbadmin=> SELECT table_name, table_schema, is_flextable FROM v_catalog.tables WHERE table_name IN ('cars', 'names');;
 table_name | table_schema | is_flextable
------------+--------------+--------------
 names      | public       | f
 cars       | public       | t
(2 rows)

dbadmin=> CREATE PROJECTION cars_names_prejoin AS SELECT maplookup(cars.__raw__, 'name') name1, names.name name2 FROM cars JOIN names ON maplookup(cars.__raw__, 'name') =  names.name;
ERROR 5600:  Invalid predicate in projection-select. Only PK=FK equijoins are allowed

dbadmin=> ALTER TABLE names ADD CONSTRAINT names_pk PRIMARY KEY (name);
WARNING 2623:  Column "name" definition changed to NOT NULL
WARNING 4887:  Table names has data. Queries using this table may give wrong results if the data does not satisfy this constraint
HINT:  Use analyze_constraints() to check constraint violation on data
ALTER TABLE

dbadmin=> ALTER TABLE cars ADD COLUMN name VARCHAR(10) NOT NULL DEFAULT name::VARCHAR(10);
ALTER TABLE

dbadmin=> ALTER TABLE cars ADD CONSTRAINT cars_fk1 FOREIGN KEY (name) REFERENCES names(name);
WARNING 4887:  Table cars has data. Queries using this table may give wrong results if the data does not satisfy this constraint
HINT:  Use analyze_constraints() to check constraint violation on data
ALTER TABLE

dbadmin=> CREATE PROJECTION cars_names_prejoin AS SELECT cars.name name1, names.name name2 FROM cars JOIN names ON cars.name = names.name;
WARNING 7399:  Created prejoin projection 'cars_names_prejoin'. Prejoin projections have been deprecated and will be removed in a future version
ROLLBACK 6092:  Unsupported access to flex table: No PREJOIN PROJECTION support

Note that final statement: "Unsupported access to flex table: No PREJOIN PROJECTION support"

naresh_way2 · November 2017

Thank you jim. But I need that type of requirement.Is there any way alternative to get better query performance of normal table with flex table?

Jim_Knicely · November 2017

The only way that I can think of to get better performance out of a FLEX table is to materialize them:

See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/FlexTables/MaterializingFlexTables.htm

Refer to the Note:
Materializing virtual columns by promoting them to real columns can significantly improve query performance. Vertica recommends that you materialize important virtual columns before running large and complex queries. Promoted columns cause a small decrease in load performance.

naresh_way2 · November 2017

Thank you jim.

Ben_Vandiver · November 2017

If you're on a recent version of Vertica, try out flattened tables, our replacement for prejoins.
https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/AnalyzingData/FlattenedTables/FlattenedTableCreate.htm?TocPath=Analyzing%20Data|Flattened%20Tables|_____2

An example:
create flex table bar();
create table foo (a int, b int default (select whatever::!int from bar where bar.b = foo.a));

sreeblr · March 2018

Hi Jim, should we have only request_type query or even load is useful?

thiago_rodrigo · October 2018

Hello, Ben! I'm having a similar problem here. I have a fact table with millions of registers with some dimensions. And I need to improve a performance of a query, initially I was trying to create a projection and end up with the error "Prejoin projection is deprecated", then I looked up more information about flattened tables that you suggested as a solution.
However, I have some doubts about this:

1- I was planning on creating a flattened table that used the fact table and the dimensions, but do not alter them, and them be able to create other flattened tables to aggregate data from fact and dimension tables.

2- Is there a way to create a flattened table totally based on the query I was using to create a projection? I was expecting something like create table fact_wide set using [query], is there a way to do it?

3- I only found 2 suggestions of use of flattened tables, creating a table based on other (like: CREATE TABLE fact_wide as select a, b,c from fact; and later add dimension columns one by one with a query related), or altering the fact table adding the dimension columns. Then I come to questions, if I create a flattened table like fact_wide, how the fact data can be refreshed on fact_wide?

4- Also if I alter the fact table (which I'm not sure is a good practice), do I really have to add a query column by column even if its the same dimension table (in my case there are a lot of columns in each dimension)

thiago_rodrigo · October 2018

I created another discussion to talk focused on "Prejoin Project is deprecated", flattened tables or other solutions for this deprecated item:

https://forum.vertica.com/discussion/239997/prejoin-projection-is-deprecated-flattened-tables/p1?new=1

AkshayBarakoti · February 2019

i have a doubt when i am using
SELECT time,user_name,transaction_id,request_type,request FROM dc_requests_issued
where request_type='QUERY'
order by time asc
it is giving me the list of queries that i have fired not by all the users

Jim_Knicely · February 2019

@AkshayBarakoti - Data Collector table data is stored in flat files in the DataCollector directory on each node. Each node stores its own data and each node has a policy (typically based on size) that determines how long the data is retained. If users are connecting to different nodes there is a potential that the data on each node might not go as far back in history as one another. Especially if most users connect to one node. The query history on that node would roll off more quickly than the other nodes giving the appearance that older queries for those users are not being retained

Quick example:

I have 3 nodes:

[dbadmin ~]$ vsql -ic "SELECT node_name, node_address FROM nodes;"
Timing is off.
    node_name    |  node_address
-----------------+----------------
v_mydb_node0001 | 192.168.61.227
v_mydb_node0002 | 192.168.61.228
v_mydb_node0003 | 192.168.61.229
(3 rows)

Set the disk space retention policy for the Data Collector component RequestsIssued (the DC_REQUESTS_ISSUED table) to 2K and clear:

[dbadmin ~]$ vsql -ic "SELECT set_data_collector_policy('RequestsIssued', '1500', '2');"
Timing is off.
set_data_collector_policy
---------------------------
SET
(1 row)

[dbadmin ~]$ vsql -ic "SELECT clear_data_collector('RequestsIssued');" -d mydb
Timing is off.
clear_data_collector
----------------------
CLEAR
(1 row)

Now run a query on node 1 as user JIM and JANE, and 1 query on node 2 as user JANE:

[dbadmin ~]$ vsql -U jim -h 192.168.61.227 -ic "SELECT * FROM dual;" -d mydb #node1
Timing is off.
dummy
-------
X
(1 row)

[dbadmin ~]$ vsql -U jane -h 192.168.61.227 -ic "SELECT * FROM dual;" -d mydb #node1
Timing is off.
dummy
-------
X
(1 row)

[dbadmin ~]$ vsql -U jane-h 192.168.61.228 -ic "SELECT * FROM dual;" -d mydb #node2
Timing is off.
dummy
-------
X
(1 row)

I have short SQL statement in a file that shows the disk usage for the Data Collector table, including the users JIM and JANE to show they still have query info stored:

[dbadmin ~]$ cat summary.sql
SELECT DISTINCT ri.node_name, ri.user_name, dc.current_disk_records, dc.current_disk_bytes
  FROM data_collector dc
  JOIN dc_requests_issued ri
    ON ri.node_name = dc.node_name
WHERE dc.table_name = 'dc_requests_issued'
   AND ri.user_name IN ('jim', 'jane');

[dbadmin ~]$ vsql -if summary.sql
Timing is off.
    node_name    | user_name | current_disk_records | current_disk_bytes
-----------------+-----------+----------------------+--------------------
v_mydb_node0001 | jane      |                    4 |               1834
v_mydb_node0001 | jim       |                    4 |               1834
v_mydb_node0002 | jane      |                    2 |                768
(3 rows)

Now I will run the disk usage query several times on node one until the disk usage exceeds 2K:

[dbadmin ~]$ vsql -if summary.sql
Timing is off.
    node_name    | user_name | current_disk_records | current_disk_bytes
-----------------+-----------+----------------------+--------------------
v_mydb_node0001 | jane      |                    4 |               2035
v_mydb_node0001 | jim       |                    4 |               2035
v_mydb_node0002 | jane      |                    2 |                768
(3 rows)

[dbadmin ~]$ vsql -if summary.sql
Timing is off.
    node_name    | user_name | current_disk_records | current_disk_bytes
-----------------+-----------+----------------------+--------------------
v_mydb_node0002 | jane      |                    2 |                768
(1 row)

See, I “lost” history for user JIM but not user JANE…

A Possible Solution:

To maintain a history for some period of time (i.e. 6 days), then set a time capacity for the RequestsIssued data Collector tables on all nodes:

[dbadmin ~]$ vsql -ic "SELECT set_data_collector_time_policy('RequestsIssued', '6 days'::interval);"
Timing is off.
set_data_collector_time_policy
--------------------------------
SET
(1 row)

For more info, checkout:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/VerticaFunctions/DataCollection/SET_DATA_COLLECTOR_TIME_POLICY.htm
https://forum.vertica.com/discussion/240231/keep-data-collector-information-for-a-set-interval

Jim_Knicely · February 2019

@AkshayBarakoti - One more thing... Are you running the query as the DBADMIN user?

Non admin users only see their own data in the system tables by default:

dbadmin=> SELECT count(*) FROM dc_requests_issued WHERE request_type='QUERY';
 count
-------
  2528
1 row)

dbadmin=> \c - testuser
You are now connected as user "testuser".

dbadmin=> SELECT count(*) FROM dc_requests_issued WHERE request_type='QUERY';
 count
-------
     2

AkshayBarakoti · February 2019

@Jim_Knicely thanks man really help actually i was running the query with my user but when i switched to dbadmin it works
one more thing i am from sql server (Microsoft Transact SQL ) background and facing issues with vertica concepts and syntax.it would be a great help if you give me some quick links to learn vertica from scratch

We're Moving!

Create My New Community Account Now

Database Designer

Comments

Leave a Comment