BUG: soft lockup - CPU for Vertica — Vertica Forum

Hello Everyone,

We recently had another outage on our on-premises Vertica installation. While going through the logs collected by scrutinize, I noticed these entries in the dmesg log:

```
[2002085.759439] Code: c0 00 4d 89 ee 48 89 4d b0 41 89 c5 eb 1d 90 49 83 c7 01 48 83 c3 40 4d 39 fc 0f 86 07 01 00 00 41 83 c5 01 4d 85 f6 4c 0f 44 f3 <8b> 43 18 83 f8 80 75 dc 48 8b 45 b8 0f b6 55 c0 48 8d 75 c8 4c
[2002113.703071] BUG: soft lockup - CPU#6 stuck for 22s! [vertica:31668]
[2002113.703461] Modules linked in: ppdev vmw_balloon crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd serio_raw pcspkr vmw_vmci i2c_piix4 shpchp parport_pc parport binfmt_misc dm_multipath ext4 mbcache jbd2 sd_mod sr_mod cdrom crc_t10dif ata_generic crct10dif_common pata_acpi mptspi drm_kms_helper scsi_transport_spi ttm mptscsih ata_piix drm libata mptbase vmxnet3 i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod
[2002113.706248] CPU: 6 PID: 31668 Comm: vertica Not tainted 3.10.0-229.14.1.el7.x86_64 #1
[2002113.706849] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[2002113.707550] task: ffff880fe4c40b60 ti: ffff880b2cdb0000 task.ti: ffff880b2cdb0000
[2002113.708141] RIP: 0010:[] [] compaction_alloc+0xf8/0x240
[2002113.708831] RSP: 0000:ffff880b2cdb3908 EFLAGS: 00000202
[2002113.709232] RAX: ffff88103ff9a6a0 RBX: 00000000005ac000 RCX: 0000000000000000
[2002113.709924] RDX: ffff88103ff9a000 RSI: ffff880b2cdb38c0 RDI: ffff88103ff9d068
[2002113.710570] RBP: ffff880b2cdb3948 R08: ffff880b2cdb3aa8 R09: ffff88103ff9d000
[2002113.711164] R10: 0000000000103a00 R11: 0000000001040000 R12: 000000001c998000
[2002113.711749] R13: 0000000001040000 R14: ffff88103ff9e008 R15: ffffffff81179ce9
[2002113.712335] FS: 00007fb0cba64700(0000) GS:ffff880fff2c0000(0000) knlGS:0000000000000000
[2002113.712947] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2002113.713309] CR2: 00007fa66e81d210 CR3: 0000000e3a9ac000 CR4: 00000000000407e0
[2002113.713899] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2002113.714485] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2002113.715074] Stack:
[2002113.715324] 0000000000103a00 ffff88103ff9d000 0000000001040000 ffffea00040dfe40
[2002113.715933] ffff880b2cdb3a60 ffffea00040dfe00 ffffea00040dfe60 ffff880fe4c40b60
[2002113.716555] ffff880b2cdb39e8 ffffffff811b199e ffff880fe4c40b60 000000002cdb3aa8
```
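To line the kernel messages up with the Vertica logs during the RCA, it can help to pull every soft-lockup report out of the full dmesg output, together with the kernel function the CPU was stuck in (here `compaction_alloc`). The Python sketch below is a minimal, hypothetical helper, not part of scrutinize or any Vertica tool; `parse_soft_lockups`, `to_wall_clock` and `SoftLockup` are illustrative names, and the regular expressions assume the kernel 3.10 message format shown above. dmesg timestamps are seconds since boot, so the wall-clock conversion needs the boot time as an input and is only approximate.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed format (kernel 3.10, as in the log above):
# "[2002113.703071] BUG: soft lockup - CPU#6 stuck for 22s! [vertica:31668]"
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)
# "... RIP: 0010:[] [] compaction_alloc+0xf8/0x240" -> "compaction_alloc"
RIP_RE = re.compile(r"RIP: .*?\b(?P<func>[A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+")


@dataclass
class SoftLockup:
    uptime_s: float                 # kernel timestamp: seconds since boot
    cpu: int                        # CPU the watchdog reported as stuck
    stuck_s: int                    # how long it was stuck, in seconds
    comm: str                       # process running on that CPU, e.g. "vertica"
    pid: int
    rip_func: Optional[str] = None  # kernel function from the RIP line, if found


def parse_soft_lockups(lines: List[str]) -> List[SoftLockup]:
    """Collect soft-lockup reports; attach the first RIP function that follows each one."""
    events: List[SoftLockup] = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            events.append(SoftLockup(float(m["ts"]), int(m["cpu"]),
                                     int(m["secs"]), m["comm"], int(m["pid"])))
            continue
        r = RIP_RE.search(line)
        if r and events and events[-1].rip_func is None:
            events[-1].rip_func = r["func"]
    return events


def to_wall_clock(uptime_s: float, boot_time: datetime) -> datetime:
    """Approximate wall-clock time of a dmesg entry.

    boot_time must be supplied by the caller (for example from the btime
    field of /proc/stat); kernel timestamps can drift, so treat the result
    as an estimate.
    """
    return boot_time + timedelta(seconds=uptime_s)
```

Applied to the BUG, CPU and RIP lines of the excerpt above, `parse_soft_lockups` yields a single event: CPU 6 stuck for 22 s in `vertica` (PID 31668), with `rip_func` equal to `compaction_alloc`.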

System details

```
   Static hostname: localhost.localdomain
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 247cd847e16e4d5fa0b4fd08abe51193
           Boot ID: 89f515ffa1e5466096df4cbc241497ad
  Operating System: Red Hat Enterprise Linux Server 7.1 (Maipo)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:7.1:GA:server
            Kernel: Linux 3.10.0-229.14.1.el7.x86_64
      Architecture: x86_64
```

I'm not a Linux person and I'm trying to do the RCA (root cause analysis). I'm not sure whether the disk usage below can be correlated with the soft lockup ("CPU contention") above and the eventual Vertica outage. Are there any other areas worth looking into at this point?

```
Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda        0.00    0.25  0.01   0.49   0.47    4.29     19.08      0.00   2.69     3.01     2.69   0.58   0.03
sdb        0.00    7.20  0.01   6.64   0.58  169.66     51.17      0.02   3.73    10.28     3.72   0.29   0.19
dm-0       0.00    0.00  0.00   0.02   0.13    0.08     17.00      0.00   6.57     2.48     7.45   0.39   0.00
dm-1       0.00    0.00  0.00   0.37   0.22    2.45     14.38      0.00   3.25     3.48     3.25   0.41   0.02
dm-2       0.00    0.00  0.00   0.00   0.00    0.00      8.00      0.00   0.66     0.66     0.00   0.52   0.00
dm-3       0.00    0.00  0.01  13.84   0.58  169.66     24.58      0.05   3.49    10.13     3.48   0.14   0.19
dm-4       0.00    0.00  0.00   0.00   0.01    0.09     77.10      0.00   8.06     3.48     8.51   0.49   0.00
dm-5       0.00    0.00  0.00   0.25   0.00    1.24      9.80      0.00   0.91     1.13     0.91   0.51   0.01
dm-6       0.00    0.00  0.00   0.03   0.01    0.04      3.04      0.00   1.37     8.46     1.35   0.49   0.00
dm-7       0.00    0.00  0.00   0.00   0.00    0.00      7.99      0.00   1.69     2.04     0.80   0.57   0.00
dm-8       0.00    0.00  0.00   0.06   0.08    0.38     15.46      0.00   3.98     4.27     3.98   0.33   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda        0.00    2.33  0.00   2.00   0.00   17.33     17.33      0.00   0.33     0.00     0.33   0.33   0.07
sdb        0.00   10.67  0.00   1.00   0.00   48.00     96.00      0.00   1.67     0.00     1.67   1.67   0.17
dm-0       0.00    0.00  0.00   2.33   0.00    9.33      8.00      0.00   0.86     0.00     0.86   0.14   0.03
dm-1       0.00    0.00  0.00   0.00   0.00    0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-2       0.00    0.00  0.00   0.00   0.00    0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-3       0.00    0.00  0.00  11.67   0.00   48.00      8.23      0.03   2.89     0.00     2.89   0.14   0.17
dm-4       0.00    0.00  0.00   0.00   0.00    0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-5       0.00    0.00  0.00   1.67   0.00    6.67      8.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-6       0.00    0.00  0.00   0.00   0.00    0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-7       0.00    0.00  0.00   0.00   0.00    0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-8       0.00    0.00  0.00   0.33   0.00    1.33      8.00      0.00   1.00     0.00     1.00   1.00   0.03
```
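To check the disk side more systematically, the extended statistics can be parsed and screened for devices that look saturated. The Python sketch below assumes the sysstat column layout shown above (newer sysstat releases change this header, so the header handling would need adjusting); `parse_iostat_x`, `peak_util`, `flag_busy` and the 80% utilization / 20 ms await thresholds are illustrative assumptions, not Vertica or sysstat guidance. Keep in mind that the first report above covers the whole time since boot and the second covers one sampling interval, and neither is guaranteed to include the moment of the lockup.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, Dict[str, float]]  # (device name, {column name: value})


def parse_iostat_x(text: str) -> List[Row]:
    """Parse every extended 'Device:' report (one row per device per report)."""
    rows: List[Row] = []
    header: List[str] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = []  # a blank line ends the current report
            continue
        if fields[0] == "Device:":
            # Only extended reports carry %util; skip the plain tps/kB_read report.
            header = fields[1:] if "%util" in fields else []
            continue
        if header and len(fields) == len(header) + 1:
            rows.append((fields[0], {name: float(v) for name, v in zip(header, fields[1:])}))
    return rows


def peak_util(rows: List[Row]) -> Dict[str, float]:
    """Highest %util seen for each device across all parsed reports."""
    peaks: Dict[str, float] = {}
    for dev, stats in rows:
        peaks[dev] = max(peaks.get(dev, 0.0), stats["%util"])
    return peaks


def flag_busy(rows: List[Row], util_pct: float = 80.0,
              await_ms: float = 20.0) -> List[Tuple[str, float, float]]:
    """(device, %util, await) for rows at or above either illustrative threshold."""
    return [(dev, s["%util"], s["await"]) for dev, s in rows
            if s["%util"] >= util_pct or s["await"] >= await_ms]
```

Run on the two reports above, `peak_util` gives 0.19 as the highest value (for sdb and dm-3) and `flag_busy` returns an empty list with its default thresholds, so by this measure the disks were far from saturated during the sampled periods.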

```
$ iostat
Linux 3.10.0-229.14.1.el7.x86_64 (uspnsvulx162.test.ua3.eslabs.svcs.hpe.com) 06/28/2017 x86_64 (8 CPU)

avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            0.41    0.44    0.42    0.02    0.00   98.71

Device:    tps  kB_read/s  kB_wrtn/s  kB_read     kB_wrtn
sda       0.50       0.47       4.29  4560097    41696233
sdb       6.66       0.58     172.75  5664977  1679077828
dm-0      0.02       0.13       0.08  1229677      780460
dm-1      0.37       0.22       2.45  2132189    23845512
dm-2      0.00       0.00       0.00      888           0
dm-3     13.86       0.58     172.75  5664089  1679077828
dm-4      0.00       0.01       0.09   139505      921076
dm-5      0.25       0.00       1.24    29645    12086960
dm-6      0.03       0.01       0.04   126798      356285
dm-7      0.00       0.00       0.00     2253         884
dm-8      0.06       0.08       0.38   793169     3701944
```
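One caveat when reading this output: the first report iostat prints covers the whole period since boot, so a short burst of I/O wait during the outage would be averaged away in the 98.71% idle figure. Only interval reports (for example from `iostat 5` left running while the problem occurs) show what the system was doing at a given moment. The Python sketch below assumes the avg-cpu layout shown above; it parses every avg-cpu block, drops the since-boot one, and flags reports whose %iowait reaches a threshold. The helper names and the 10% threshold are illustrative assumptions.

```python
from typing import Dict, List

CpuReport = Dict[str, float]  # e.g. {"%user": 0.41, ..., "%idle": 98.71}


def parse_avg_cpu(text: str) -> List[CpuReport]:
    """One dict per 'avg-cpu:' block: header names zipped with the next line's values."""
    lines = text.splitlines()
    reports: List[CpuReport] = []
    for i, line in enumerate(lines[:-1]):
        fields = line.split()
        if fields and fields[0] == "avg-cpu:":
            values = [float(v) for v in lines[i + 1].split()]
            reports.append(dict(zip(fields[1:], values)))
    return reports


def interval_reports(reports: List[CpuReport]) -> List[CpuReport]:
    """Drop the first report, which iostat computes over the time since boot."""
    return reports[1:]


def high_iowait(reports: List[CpuReport], iowait_pct: float = 10.0) -> List[int]:
    """Positions of reports whose %iowait reaches the (illustrative) threshold."""
    return [i for i, r in enumerate(reports) if r.get("%iowait", 0.0) >= iowait_pct]
```

For the single since-boot report above, `interval_reports` returns an empty list, which is a reminder that this snapshot says little about the minutes around the outage.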
