High CPU Utilization

We have a 34-node cluster, and we found that CPU utilization is high on one node. I am not able to find the exact reason why CPU utilization was high.

Comments

  • Nitin_singh Registered User

    Please suggest, if anyone has the answer.

  • TomM Employee, Registered User, VerticaExpert

    The diagnosis and fix may take a few attempts. Why don't you start with our validation scripts? Here are two links: an overview, and one for the CPU issue you're having.

    https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/InstallationGuide/scripts/ValidationScripts.htm

    https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/InstallationGuide/scripts/vcpuperf.htm

  • Jim_Knicely Employee, Registered User, VerticaExpert
    edited September 26

    In addition to Tom's suggestions, maybe check that you are not overloading that node with a disproportionate number of client sessions:

    select node_name, count(*) from user_sessions group by node_name order by 2 desc;
    

    Or ...

    select node_name, count(*) from user_sessions where session_start_timestamp >= 'DATE1' and session_end_timestamp <= 'DATE2' group by node_name order by 2 desc;
    

    Replace DATE1 and DATE2 with a date range where you saw the spike.

  • Nitin_singh Registered User

    Thanks, Guru. Let me check.

  • Nitin_singh Registered User

    I did not find a high number of client sessions on the problematic node. I also ran the vcpuperf command on two nodes. Below is the output.

    problematic node:

    Compiled with: 4.8.2 20140120 (Red Hat 4.8.2-15)
    Expected time on Core 2, 2.53GHz: ~9.5s
    Expected time on Nehalem, 2.67GHz: ~9.0s
    Expected time on Xeon 5670, 2.93GHz: ~8.0s

    This machine's time:
    CPU Time: 15.030000s
    Real Time:15.210000s

    Some machines automatically throttle the CPU to save power.
    This test can be done in <100 microseconds (60-70 on Xeon 5670, 2.93GHz).
    Low load times much larger than 100-200us or much larger than the corresponding high load time
    indicate low-load throttling, which can adversely affect small query / concurrent performance.

    This machine's high load time: 130 microseconds.
    This machine's low load time: 327 microseconds.

    Other node:

    Compiled with: 4.8.2 20140120 (Red Hat 4.8.2-15)
    Expected time on Core 2, 2.53GHz: ~9.5s
    Expected time on Nehalem, 2.67GHz: ~9.0s
    Expected time on Xeon 5670, 2.93GHz: ~8.0s

    This machine's time:
    CPU Time: 11.470000s
    Real Time:11.470000s

    Some machines automatically throttle the CPU to save power.
    This test can be done in <100 microseconds (60-70 on Xeon 5670, 2.93GHz).
    Low load times much larger than 100-200us or much larger than the corresponding high load time
    indicate low-load throttling, which can adversely affect small query / concurrent performance.

    This machine's high load time: 65 microseconds.
    This machine's low load time: 132 microseconds.

    Please check the above output and suggest whether there is any problem. Also, please tell me which values I should look at in the vcpuperf output.

  • Jim_Knicely Employee, Registered User, VerticaExpert
    edited September 27

    Hi,

    Do all of the nodes have the same processors? Check with this SQL:

    select host_name, processor_count, processor_core_count, processor_description from host_resources order by 1;
    

    Are there any other processes (besides Vertica) running on the node experiencing the issue? The following Linux command should list the top 10 processes by CPU usage:

    ps -Ao user,uid,comm,pid,pcpu,tty --sort=-pcpu | head -n 11

    From your vcpuperf output, it looks like CPU scaling might be enabled. You should disable it. See:

    https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/InstallationGuide/BeforeYouInstall/cpuscaling.htm
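
    As a rough way to see whether frequency scaling is active on a node, here is a minimal shell sketch. It assumes the kernel exposes the standard Linux cpufreq files under /sys/devices/system/cpu, which may be missing on virtual machines or when no cpufreq driver is loaded. The function name list_governors is made up for this example and is not part of Vertica.

    ```shell
    # Print "<cpu> <governor>" for every CPU found under a sysfs-style directory.
    # The default base path is the usual Linux location (an assumption here).
    list_governors() {
        base="${1:-/sys/devices/system/cpu}"
        found=0
        for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
            # An unmatched glob stays literal, so skip anything unreadable.
            [ -r "$f" ] || continue
            found=1
            cpu="${f#"$base"/}"   # drop the base directory prefix
            cpu="${cpu%%/*}"      # keep only the cpuN component
            printf '%s %s\n' "$cpu" "$(cat "$f")"
        done
        [ "$found" -eq 1 ] || echo "no cpufreq governor files under $base"
    }

    list_governors
    ```

    Called with no argument, as on the last line, it reads the live system. Running it on both the problematic node and a healthy node and comparing the results is probably the most useful check; a governor such as powersave or ondemand on the slow node would be consistent with scaling being enabled there.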

    Also, check the following thread to see if it can help you: https://forum.vertica.com/discussion/238751/high-cpu-usage
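
    On the question of which vcpuperf values to look at: the two "load time" lines at the end are the ones that hint at throttling. The sketch below pulls them out of saved vcpuperf output so nodes can be compared side by side. The function name throttle_summary is hypothetical, and the default 200 microsecond limit is only borrowed from the "100-200us" guidance that vcpuperf itself prints; treating it as a hard cut-off is an assumption.

    ```shell
    # Read vcpuperf output on stdin and print the high- and low-load times
    # (microseconds), their ratio, and a note when the low-load time is above
    # the limit. The limit defaults to 200 (an assumed cut-off, see above).
    throttle_summary() {
        awk -v limit="${1:-200}" '
            /high load time:/ { high = $(NF - 1) }
            /low load time:/  { low  = $(NF - 1) }
            END {
                if (high + 0 <= 0 || low == "") {
                    print "load times not found"
                    exit 1
                }
                printf "high=%dus low=%dus ratio=%.2f", high, low, low / high
                if (low + 0 > limit + 0) printf " (low-load time above %dus)", limit
                printf "\n"
            }'
    }
    ```

    Fed the figures posted earlier in this thread, it reports high=130us low=327us ratio=2.52 with the note for the problematic node, and high=65us low=132us ratio=2.03 for the other node. Both nodes take about twice as long under low load, but only the problematic node is above the 100-200us range that the tool mentions.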