CPU usage per query
How can I tell how much CPU load is being generated by the queries that are currently executing?
I'm currently seeing rather high CPU load across the cluster, but none of the currently executing queries look much different from our ordinary load, which has historically produced much lower CPU load.
Using "top" on each of the nodes, all the CPU load is coming from the vertica process, so it's not something else on the OS causing this.
Regardless of my current problem, I would really like to know how to monitor CPU usage by query. All I can find are ways to monitor overall CPU usage.
Comments
Or you can look at the output of the system_resource_usage table:
- it reports CPU % per node, sampled each minute;
- that CPU figure reflects the entire Vertica instance's consumption.
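As a starting point, a sketch of such a query against system_resource_usage (column names like average_cpu_usage_percent are from the 7.x docs; verify them against your version):

```sql
-- Per-node CPU % over the last hour, from the per-minute samples
-- collected in system_resource_usage.
SELECT node_name,
       end_time,
       average_cpu_usage_percent
FROM   v_monitor.system_resource_usage
WHERE  end_time > SYSDATE() - INTERVAL '1 hour'
ORDER  BY node_name, end_time;
```

As noted above, this is instance-wide per node, not per query.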
You could also take the client_pid from the sessions table and use it to get the CPU usage of that Linux PID. Another way to estimate your CPU usage is to run the EXPLAIN command with the VERBOSE option.
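A sketch of looking up those client PIDs (note that client_pid is the PID of the client process, e.g. vsql, not of Vertica itself; current_statement is as documented for the sessions table, but verify on your version):

```sql
-- Map running statements to their client OS PIDs.
SELECT node_name,
       user_name,
       client_pid,
       current_statement
FROM   v_monitor.sessions
WHERE  current_statement IS NOT NULL;
```

You can then feed a PID to an OS tool such as pidstat to sample that process's CPU usage.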
As I understand it, the explain plan gives estimated resource usage. I need to know what is being used by a query that is currently executing, which, due to many different factors, may be completely different than the explain plan's estimate.
Is there any way to break down CURRENT cpu usage by query?
It's not easy to calculate CPU usage per query, so I'm afraid it's not really possible.
But you can do a synthetic test with pidstat and mpstat, passing the PID of the Vertica process (not the client). Take the CPU usage of Vertica for a minute while it is idle, then take the usage while the query runs and subtract the idle usage.
PS
But you can try PROFILE (it exposes counters). For background on how CPU usage is calculated in general, see:
http://www.design-reuse.com/articles/8289/how-to-calculate-cpu-utilization.html
https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/PROFILE.htm...
Also -- maybe you have background system tasks running, in addition to your regular queries? The MC should display these; you might also consider looking at, for example, the "tuple_mover_operations" system table, to see if there's a long-running mergeout or other ongoing operation.
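A sketch of checking for such background work (the is_executing flag and other column names here are assumptions based on the 7.x docs for tuple_mover_operations; adjust for your version):

```sql
-- Look for Tuple Mover work (mergeout, moveout, ...) currently running.
SELECT node_name,
       operation_name,
       operation_status,
       table_schema,
       table_name
FROM   v_monitor.tuple_mover_operations
WHERE  is_executing
ORDER  BY node_name;
```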
For what it's worth, to directly answer the question: The "execution_engine_profiles" system table contains various counters that attempt to speak to resource usage of running queries.
Many of "execution_engine_profiles"'s fields are undocumented. Because we don't document what they do, we make no guarantees about how well they do it :-) And it's certainly not easy to read. But you may be able to gain some insights by poking around in there.
(The table has columns for the session, transaction, and statement IDs. Those three columns form a composite key that can be joined with many other system tables; from there you can figure out which query a particular row belongs to, etc.)
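Putting that together, a sketch of ranking live statements by one of those counters ('execution time (us)' appears in execution_engine_profiles on the versions I've seen, but per the caveat above it's undocumented, so treat it as an assumption):

```sql
-- Sum the "execution time (us)" counter per statement and node,
-- largest consumers first.
SELECT node_name,
       session_id,
       transaction_id,
       statement_id,
       SUM(counter_value) AS exec_time_us
FROM   v_monitor.execution_engine_profiles
WHERE  counter_name = 'execution time (us)'
GROUP  BY 1, 2, 3, 4
ORDER  BY exec_time_us DESC;
```

This measures accumulated execution time per operator thread, which is related to but not the same thing as instantaneous CPU %.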
--Sharon
I want to dig deeper into this.
Please correct me if I'm wrong anywhere.
Suppose I have a 3-node cluster, and each node has 48 cores (CPUs).
So when I run a query and then try to check its CPU usage using this query, I get a resultset like this. The resultset continues for the next two nodes with the same operator names.
Going ahead, as you said, each instance of an operator represents a thread, so I tried this query for node 1.
Let's consider the Scan operator, whose value is 3. Does that mean only three CPUs from node 1 participated in the scanning process, and that for CPU usage to reach 100% it would have to involve all CPUs, i.e. 48, so the value for Scan should be 48?
Also, how can we calculate the CPU usage for the complete query on node 1 from this information?
Thanks in Advance.
Yes - your analysis is accurate regarding the number of threads used per operator.
It's hard to quantify the cpu usage for the entire query, since some portions of a query will fan out over all cores, some over a subset of cores, some will execute on only a single core. Some execute concurrently with other operators while some need to wait on data from other operators. The "execution time" counter can provide insight into where the cpu time is being spent, but I don't tend to try to draw conclusions about total cpu usage from that data.
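For reference, a sketch of the per-operator thread count being discussed, using execution_engine_profiles (the literal transaction_id/statement_id values are placeholders you would substitute for the statement of interest):

```sql
-- Count operator instances (threads) per operator on each node
-- for one statement.
SELECT node_name,
       operator_name,
       COUNT(DISTINCT operator_id) AS instances
FROM   v_monitor.execution_engine_profiles
WHERE  transaction_id = 45035996273705000  -- placeholder: your transaction_id
  AND  statement_id   = 1                  -- placeholder: your statement_id
GROUP  BY 1, 2
ORDER  BY 1, 2;
```

As Sharon notes, an operator running 3 threads does not map directly to a CPU %: threads may run concurrently, wait on other operators, or share cores.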
--Sharon
- I'm afraid accurate CPU usage can be calculated on OpenVMS only (and not even on all hardware there).
- You can find a lot of threads on the Internet saying that "old" utilities like top do not calculate CPU usage properly, let alone for the sub-tasks of Vertica.
- Since we don't know a few things about the scheduler (I believe Vertica also manages some kind of task priority heap, at least via resource pools), such as the execution "quantum" time, the priority queue behaviour, and so on, it's going to be impossible.
- You can find some examples on the Internet for MS SQL, but be careful: they are useful only as an indicator when comparing different queries that return the same result. A Vertica cost is the same kind of indicator, and it is accurate enough.
The inherent parallelism on a node can be accounted for by accumulating CPU usage across all CPUs (cores) where the query is running or has run in the past.
One query is stuck and takes longer than usual; its CPU usage is around 98% while memory usage is very low.
The same query runs normally every day. The problem I see is that it uses a table, logins, which has around 300 million rows.
Running a count or a SELECT ... LIMIT 10 on this table also takes too long; even close_session on the query is not working.
I believe the table got corrupted somehow, or something got really messed up after the last load into this table.
I had the same issue before, and truncating and reloading the table solved it then.
This feels like a similar issue.
It's not clear what you mean by "table logins"; please elaborate.
@Eli
Ismail means that querying system tables like sessions (logins) or calling meta-functions like "close_sessions()" can take a long time.
Odd bug. I'm also familiar with this issue; sometimes projection refreshes can cause it. But who cares about bugs (except Vertica's clients)? I have known about an open bug for more than 2 years and no one cares.
The logins table is a table in which we save customer logins, so it's one of our biggest tables, with ~300 million records.
I'm not 100% sure this is the correct solution, but I want to share how we solved our issue.
In our scripts we do a lot of delete-inserts, and we saw that this creates a lot of delete vectors.
We started running the following code daily at the end of our loads, and it has helped us a lot to get good performance; since then we haven't run into the same problem.
select make_ahm_now();
select purge();
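If you want to confirm that delete vectors are actually the problem before purging, a sketch against the delete_vectors system table may help (column names per the docs I recall; verify on your version):

```sql
-- Projections carrying the most deleted rows, i.e. the best purge candidates.
SELECT schema_name,
       projection_name,
       SUM(deleted_row_count) AS deleted_rows
FROM   v_monitor.delete_vectors
GROUP  BY 1, 2
ORDER  BY deleted_rows DESC
LIMIT  10;
```

One caution on the recipe above: MAKE_AHM_NOW() advances the ancient history mark, so you lose the ability to run historical queries against epochs before it.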