Vertica reads from disk too much

hi, all:

I have encountered a problem. I built a Vertica cluster with 3 nodes and have written some data (less than 1% of the license limit). Today I found one thread of the Vertica DB using too much IO (about 20 MB/s of reads).

When I check with the iostat command, [%util] is always between 90% and 100%. iotop shows that one Vertica DB thread is reading heavily, EVEN THOUGH THERE IS NO load or query running on the DB.

I have googled for a few days but haven't found a good explanation. Please help me: what is that thread doing? Why does it need to read so much from disk? And most importantly, **HOW do I reduce the disk utilization?**

Comments

  • Did I not describe it clearly?
    No one has answered me ......

  • emoreno Employee

    Hi lwmonster,

    The other Vertica operation that can be reading from and writing to disk is the TUPLE MOVER. Check the table tuple_mover_operations to see if there are mergeouts happening.
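
    For example, a query along these lines (a rough sketch; double-check the column names against v_monitor.tuple_mover_operations in your version) will show the most recent mergeouts:

    -- most recent mergeouts and the ROS bytes they touched
    SELECT operation_start_timestamp,
           node_name,
           operation_status,
           table_name,
           total_ros_used_bytes
    FROM v_monitor.tuple_mover_operations
    WHERE operation_name = 'Mergeout'
    ORDER BY operation_start_timestamp DESC
    LIMIT 20;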

    Eugenia

  • @emoreno

    hi, emoreno:

    From execution_engine_profiles, query_events, and query_requests, I found a lot of RESEGMENTED_MANY_ROWS events (the COUNTER_NAME is "bytes read from disk"), and RESEGMENTED_MANY_ROWS is described as a negative execution engine event in this post: https://my.vertica.com/kb/System-Tables-For-Performance/Content/BestPractices/System-Tables-For-Performance.htm.

    How can I reduce the number of RESEGMENTED_MANY_ROWS events?
    

    The following is my SQL:

    WITH EVENTS AS
      (SELECT session_id,
              transaction_id,
              STATEMENT_ID,
              event_type,
              event_description,
              event_timestamp
       FROM v_monitor.query_events
       WHERE event_timestamp > '2017-08-25 15:00:00+0800'), counter AS
      (SELECT session_id,
              transaction_id,
              STATEMENT_ID,
              operator_name,
              counter_name,
              sum(counter_value) AS counter_value
       FROM v_monitor.execution_engine_profiles
       WHERE counter_name LIKE '%disk%'
         AND (session_id,
              transaction_id,
              STATEMENT_ID) IN
           (SELECT session_id,
                   transaction_id,
                   STATEMENT_ID
            FROM EVENTS)
       GROUP BY session_id,
                transaction_id,
                STATEMENT_ID,
                operator_name,
                counter_name), qr AS
      (SELECT session_id,
              transaction_id,
              STATEMENT_ID,
              request,
              request_duration_ms
       FROM v_monitor.query_requests
       WHERE (session_id,
              transaction_id,
              STATEMENT_ID) IN
           (SELECT session_id,
                   transaction_id,
                   STATEMENT_ID
            FROM EVENTS) )
    SELECT counter.*,
           events.event_type,
           events.event_description,
           events.event_timestamp,
           qr.request,
           qr.request_duration_ms
    FROM counter,
         EVENTS,
         qr
    WHERE counter.session_id = events.session_id
      AND counter.transaction_id = events.transaction_id
      AND counter.statement_id = events.statement_id
      AND counter.session_id = qr.session_id
      AND counter.transaction_id = qr.transaction_id
      AND counter.statement_id = qr.statement_id
    ORDER BY counter.counter_value DESC;
    
  • @emoreno

    hi, emoreno:

    I have checked the table tuple_mover_operations and found that it does some MergeOut operations every 5 minutes (MergeOutInterval was changed to 300). But each time SUM(total_ros_used_bytes) is only about 24343148, which means each MergeOut reads roughly 24 MB from disk. That does not match the IO usage (about 20 MB/s of reads); the difference between them is large.
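
    Roughly how I summed it, grouping each run by its start minute (a sketch of the check I ran):

    -- ROS bytes touched per mergeout run
    SELECT date_trunc('minute', operation_start_timestamp) AS run_minute,
           SUM(total_ros_used_bytes) AS bytes_per_run
    FROM v_monitor.tuple_mover_operations
    WHERE operation_name = 'Mergeout'
    GROUP BY date_trunc('minute', operation_start_timestamp)
    ORDER BY run_minute DESC;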

  • @emoreno

    I have also checked total_ros_used_bytes over a whole day, without any PLAN_TYPE filter, and SUM(total_ros_used_bytes) is only about 7 GB. Compared with the IO usage (at 20 MB/s, one node reads about 1.7 TB from disk per day), the gap is far too large.

    Any other clue?

    What is the DB doing...

  • HELP......

  • emoreno Employee

    Check /var/log/messages or anything else in the OS that could indicate a problem with the disk. Vertica does not read just for the sake of reading; if you are not doing anything and the TM is not working, there is no reason you should have high IO.

  • The TM is working, but SUM(total_ros_used_bytes) from the table tuple_mover_operations and read_kbytes_per_sec from the table io_usage do not match, and the gap is too large (see the sketch below for how I read io_usage).

    So, apart from the TM, are there any other operations that read from disk?
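
    For reference, this is roughly how I read the per-node IO (a sketch against v_monitor.io_usage; adjust the time window as needed):

    -- per-node disk read rate as sampled by Vertica
    SELECT node_name,
           start_time,
           end_time,
           read_kbytes_per_sec
    FROM v_monitor.io_usage
    WHERE start_time > '2017-08-25 15:00:00+0800'
    ORDER BY start_time DESC;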

  • Would too many RESEGMENTED_MANY_ROWS events impact the IO usage?

  • Right now there are no queries, only load operations.

  • emoreno Employee

    The RESEGMENTED_MANY_ROWS event means the data has to be redistributed across the nodes: the query has a join, and the data is not segmented by the join key, so it must be resegmented. It reads the data, but it should not read more than it would without the redistribution. To avoid the event, segment a projection by the same key, or replicate the small projection (normally the inner side of the join). If you cannot avoid it, that should be OK too; it is normal, it just indicates that in the middle of the query you have to send data over the network.
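
    For example, a rough sketch of both fixes (the table and column names here are made up; replace them with your own):

    -- segment the large projection by the join key on every node ...
    CREATE PROJECTION fact_by_key AS
      SELECT * FROM fact_table
      SEGMENTED BY HASH(join_key) ALL NODES;

    -- ... or replicate the small (inner) projection on every node
    CREATE PROJECTION dim_replicated AS
      SELECT * FROM dim_table
      UNSEGMENTED ALL NODES;

    -- refresh so the new projections are populated and the optimizer can use them
    SELECT REFRESH('fact_table, dim_table');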

    Did you run vioperf? It simulates a Vertica workload, so do not run it while the database is working; it will tell you what IO you can get from the disks.

  • @emoreno
    I didn't run vioperf; should I try it?

    I used pstack to print the Vertica process stack while one thread was reading from disk. The result is as follows:

    Thread 35 (Thread 0x7f400ffcf700 (LWP 4231)):
    #0  0x00000030464db51d in read () from /lib64/libc.so.6
    #1  0x0000000002bba990 in SAL::LinuxFileOperator::read(void*, unsigned long) ()
    #2  0x0000000002bb3a70 in SAL::IndexReader::open() ()
    #3  0x0000000002b3d8f7 in SAL::FileColumnReaderT<SAL::ValIndexPidxEntry64>::readBlock(unsigned long long, unsigned int, Basics::Block*, bool, unsigned int*, unsigned long long*, bool) ()
    #4  0x0000000001b8570d in EE::ROSSALColumnAccess<EE::DecodeString, false, true>::goToPosition(unsigned long long, void const**, unsigned long long*) ()
    #5  0x0000000001ec4ce3 in CMD::collectSamples(std::vector<unsigned long long, std::allocator<unsigned long long> >, unsigned int, std::vector<CMD::StorageVitals, std::allocator<CMD::StorageVitals> >&, CMD::SamplingVitals&) ()
    #6  0x0000000001ecdc04 in CMD::sampleDatabase(CMD::SamplingRequest) ()
    #7  0x0000000001425471 in Dist::SampleNodeCellsMessage::invoke() ()
    #8  0x00000000011ad409 in Dist::DistCalls::dispatchIncomingCallMessage(Dist::CallWireFormat&, Basics::ByteBuffer&, bool*) ()
    #9  0x000000000121ce35 in Dist::distcallDispatchThread(Session::SessionID) ()
    #10 0x00000000013fd651 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, void (*)(Session::SessionID), boost::_bi::list1<boost::_bi::value<Session::SessionID> > >, void>::invoke(boost::detail::function::function_buffer&) ()
    #11 0x000000000142ef7b in boost::function0<void>::operator()() const ()
    #12 0x0000000003652aef in Session::ThreadManager::launchThread(boost::function0<void> const&) ()
    #13 0x0000000003d97830 in thread_proxy ()
    #14 0x0000003046807aa1 in start_thread () from /lib64/libpthread.so.0
    #15 0x00000030464e893d in clone () from /lib64/libc.so.6
    

    It shows that the thread is invoking these methods:
    Dist::SampleNodeCellsMessage, CMD::sampleDatabase, CMD::collectSamples, EE::ROSSALColumnAccess
    Do you know what these methods do? It seems to be doing some kind of sampling on the DB.

  • emoreno Employee

    I am wondering if it is our audit.

    To see when the audit is scheduled to run, you can use this function:
    select get_audit_time();

    and to see how long it took and when it ran:

    select audit_start_timestamp, audit_end_timestamp from license_audits order by 1 desc;

    Do the times of the audits match the IO increases that you see?

  • @emoreno

    hi, emoreno:
    Here are the two results:

    => select get_audit_time();
                       get_audit_time
    -----------------------------------------------------
     The audit is scheduled to run at 11:59 PM each day.
    
    => select audit_start_timestamp, audit_end_timestamp from license_audits order by 1 desc;
         audit_start_timestamp     |      audit_end_timestamp
    -------------------------------+-------------------------------
     2017-08-29 01:36:40.124765+00 | 2017-08-29 01:37:55.959363+00
     2017-08-29 01:35:55.940457+00 | 2017-08-29 01:37:29.714507+00
     2017-08-29 01:35:42.197343+00 | 2017-08-29 01:37:18.139218+00
     2017-08-29 01:33:07.752538+00 | 2017-08-29 01:34:52.271813+00
     2017-08-29 01:32:32.88725+00  | 2017-08-29 01:34:05.679389+00
     2017-08-29 01:32:19.110099+00 | 2017-08-29 01:34:17.169441+00
     2017-08-29 01:30:39.492889+00 | 2017-08-29 01:31:43.992281+00
     2017-08-29 01:29:57.388694+00 | 2017-08-29 01:31:22.804387+00
    
    

    It seems the DB runs the audit every minute.
    Isn't the audit running too frequently?
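
    To check, I also aggregated the audit windows (a quick sketch using only the license_audits columns shown above):

    -- audits per hour and the total time they spent running
    SELECT date_trunc('hour', audit_start_timestamp) AS audit_hour,
           COUNT(*) AS audits,
           SUM(DATEDIFF('second', audit_start_timestamp, audit_end_timestamp)) AS audit_seconds
    FROM license_audits
    GROUP BY date_trunc('hour', audit_start_timestamp)
    ORDER BY audit_hour DESC;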
