Tuple Mover: moving out projections

roger_moore · October 2015

I found 30G of the below being repeated. Any idea what's going on and how it can be avoided?

2015-10-09 09:02:44.632 TM Moveout:0x7fcf30112620 [Txn] <INFO> Commit Complete: Txn: a00000005d876a at epoch 0x5fb22
2015-10-09 09:02:44.632 TM Moveout:0x7fcf30112620 [TM] <INFO> Tuple Mover: moved out projection qr_concept_hashtags_tweets_b0
2015-10-09 09:02:44.632 TM Moveout:0x7fcf30112620-a00000005d876b [Txn] <INFO> Begin Txn: a00000005d876b 'Moveout: Tuple Mover'
2015-10-09 09:02:44.632 TM Moveout:0x7fcf30112620-a00000005d876b [TM] <INFO> Tuple Mover: moving out projection qr_concept_hashtags_tweets_b0
2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) Moveout projection ecr.qr_concept_hashtags_tweets_b0
2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) TM Moveout: Moving out DV data in WOS up to epoch 391970, based on WOS data up to epoch 391969
2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) TM moveout: Wos row count = 0, Wos delete row count = 0
2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) Skipping replay delete due to lack of deletes.
2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) Executing DVWos moveout plans
2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) getMiniROSsForDVWosMoveout: Creating DVWos marker(s)
2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) getMiniROSsForDVWosMoveout: No unmarked DVWOSs to moveout.
2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) Deleting Wos and DV contents
2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) Dropping source WOSs - empty
2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) Moved out 0 bytes
2015-10-09 09:02:44.634 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> Moveout projection ecr.qr_concept_hashtags_tweets_b0 - done
2015-10-09 09:02:44.634 TM Moveout:0x7fcf30112620-a00000005d876b [Txn] <INFO> Starting Commit: Txn: a00000005d876b 'Moveout: (Table: ecr.qr_concept_hashtags_tweets) (Projection: ecr.qr_concept_hashtags_tweets_b0)'
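
To see what is filling the log, the excerpt above can be tallied per projection. This is a minimal sketch, assuming the line shapes shown in this excerpt (other Vertica versions may log differently). The function name `summarize_moveouts` is hypothetical and not part of any Vertica tool.

```python
import re
from collections import Counter

# Patterns are taken from the line shapes in the excerpt above (an assumption
# about the log format, not a documented contract).
START_RE = re.compile(r"\[TM\] <INFO> Tuple Mover: moving out projection (\S+)")
BYTES_RE = re.compile(r"Moved out (\d+) bytes")


def summarize_moveouts(lines):
    """Return (moveouts started per projection, number of moveouts that moved 0 bytes)."""
    started = Counter()
    empty = 0
    for line in lines:
        m = START_RE.search(line)
        if m:
            started[m.group(1)] += 1
            continue
        m = BYTES_RE.search(line)
        if m and int(m.group(1)) == 0:
            empty += 1
    return started, empty


# Three lines copied from the excerpt above.
sample = [
    "2015-10-09 09:02:44.632 TM Moveout:0x7fcf30112620-a00000005d876b [TM] <INFO> Tuple Mover: moving out projection qr_concept_hashtags_tweets_b0",
    "2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) TM moveout: Wos row count = 0, Wos delete row count = 0",
    "2015-10-09 09:02:44.633 TM Moveout:0x7fcf30112620-a00000005d876b [EE] <INFO> (a00000005d876b) Moved out 0 bytes",
]

started, empty = summarize_moveouts(sample)
print(started.most_common(5), "empty moveouts:", empty)
```

On a real file, the same function can be fed an open file handle, for example `open(path, errors="replace")`, so a 30G log is streamed line by line rather than loaded into memory.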

Adrian_Oprea_1 · October 2015

Why would you want to avoid this? That is the Tuple Mover doing its job, moving data from WOS into ROS. (Don't worry, there is nothing wrong going on.)

See this link for more details.

Or are you referring to the INFO messages in your vertica.log? (You don't want them anymore?)

roger_moore · October 2015

Is it normal that it would be performing this operation for many days and also that it would cause 30G of output?

roger_moore · October 2015

It looks like Vertica shut down because of it: Total memory accounted for by FileCache Allocator

2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 536870920: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 536870920: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 536870920: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 536870920: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 536870920: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.029 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/3916: Memory usage in FileCache:
   DETAIL: Size 1073741832: 0 on free list; 0 still in use (0 bytes)
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 <LOG> @v_vjanys_node0001: 00000/5071: Total memory accounted for by FileCache Allocator: 0
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Typical LRU usage: 0 free 0 in use
2015-10-10 02:37:06.030 unknown:0x7fcfd4d45700 [SAL] <INFO> Large LRU usage: 0 free 0 in use
2015-10-10 02:37:06.033 unknown:0x7fcfd4d45700 [Init] <INFO> Global pool memory usage: NewPool(0x4735320) 'GlobalPool': totalDtors 0 totalSize 132120576 (17364488 unused) totalChunks 6
2015-10-10 02:37:06.033 unknown:0x7fcfd4d45700 [Init] <INFO> SAL global pool memory usage: NewPool(0x4725380) 'SALGlobalPool': totalDtors 0 totalSize 3395289088 (2065753376 unused) totalChunks 47
2015-10-10 02:37:06.033 unknown:0x7fcfd4d45700 [Init] <INFO> SS::stopPoller()
2015-10-10 02:37:06.034 unknown:0x7fcfd4d45700 [Init] <INFO> DC::shutDown()
2015-10-10 02:37:06.034 unknown:0x7fcfd4d45700 [Init] <INFO> Shutdown complete. Exiting.
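
The two pool lines near the end of this excerpt are the only ones reporting non-zero memory. A minimal sketch for pulling those numbers out, assuming that "in use" can be read as `totalSize` minus `unused` (an interpretation of the field names, not something the log states). The name `pool_usage` is hypothetical.

```python
import re

# Matches e.g.: 'GlobalPool': totalDtors 0 totalSize 132120576 (17364488 unused)
POOL_RE = re.compile(r"'(\w+)': totalDtors \d+ totalSize (\d+) \((\d+) unused\)")


def pool_usage(lines):
    """Map pool name -> bytes in use, computed as totalSize - unused (assumed meaning)."""
    usage = {}
    for line in lines:
        m = POOL_RE.search(line)
        if m:
            name, total, unused = m.group(1), int(m.group(2)), int(m.group(3))
            usage[name] = total - unused
    return usage


# The two pool lines copied from the excerpt above.
pool_lines = [
    "2015-10-10 02:37:06.033 unknown:0x7fcfd4d45700 [Init] <INFO> Global pool memory usage: NewPool(0x4735320) 'GlobalPool': totalDtors 0 totalSize 132120576 (17364488 unused) totalChunks 6",
    "2015-10-10 02:37:06.033 unknown:0x7fcfd4d45700 [Init] <INFO> SAL global pool memory usage: NewPool(0x4725380) 'SALGlobalPool': totalDtors 0 totalSize 3395289088 (2065753376 unused) totalChunks 47",
]

usage = pool_usage(pool_lines)
for name, used in usage.items():
    print(f"{name}: {used} bytes in use ({used / 2**20:.1f} MiB)")
```

These lines are printed during shutdown, so on their own they do not show what triggered it.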

Adrian_Oprea_1 · October 2015

When you talk about 30G of output, are you referring to the size of your vertica.log file?

roger_moore · October 2015

Yes. Usually the vertica.log file (until it is rotated) is around 1M.

Adrian_Oprea_1 · October 2015

I don't know if the error in the log is related to the fact that your log is huge.

But what you can do is check your logrotate configuration and make sure it is set up for your use.

Example of how I set up mine:

/opt/vertica/bin/admintools -t logrotate -d <db_name> -r daily -k 15

-d - the database name

-r - rotation frequency (daily in this example)

-k - how many rotated logs should be kept

This should keep the disk space used by the logs under control.

roger_moore · October 2015

It does seem that Vertica crashed because of a Tuple Mover operation. See the above post for the error: "Total memory accounted for by FileCache Allocator"

Adrian_Oprea_1 · October 2015

You are mixing problems! You were facing a huge log file issue, and now you are looking at a Vertica crash error!

One at a time. Maybe one is the cause of the other, or they might have nothing to do with one another!

Restart your DB, set up your logrotate properly, and monitor your DB.

roger_moore · October 2015

Yes, I've had three crashes in the past week, and they have all had large logs as a warning sign (i.e. I can tell a crash is going to occur when the Vertica log starts expanding beyond what it should be). If it sounds like there are multiple issues in this thread, it's because I'm still investigating the cause of these crashes. Initially, I found that the crash was preceded by 30G of Tuple Mover operations (which is unusual). Ultimately, it seems like Vertica is hitting a memory issue:

Total memory accounted for by FileCache Allocator

It's a three-node cluster setup and one node is recovering. How can I (without reinstalling) erase all the data and all the history (all the epochs) and get back to square one WITH my schemas? I believe this would solve my problems, since Vertica is trying to get back to an epoch during recovery while data is constantly being added and deleted.
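
The warning sign described above (vertica.log growing far beyond its usual size of around 1M) could be checked automatically. A minimal sketch, assuming a hypothetical log path and an arbitrary 100 MiB threshold; neither value comes from this thread or from Vertica itself.

```python
import os

# Hypothetical values for illustration only; adjust to the real catalog path
# and to whatever size counts as abnormal on your system.
LOG_PATH = "/path/to/catalog/vertica.log"
LIMIT_BYTES = 100 * 1024 * 1024


def log_exceeds(path, limit_bytes):
    """Return True if the file at `path` exists and is larger than `limit_bytes`."""
    try:
        return os.path.getsize(path) > limit_bytes
    except OSError:
        # Missing or unreadable file: treat as "not exceeded" rather than failing.
        return False


if log_exceeds(LOG_PATH, LIMIT_BYTES):
    print(f"WARNING: {LOG_PATH} is larger than {LIMIT_BYTES} bytes")
```

Run from cron or a monitoring agent, a check like this would flag the growth before the disk fills, independently of how logrotate is configured.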

Adrian_Oprea_1 · October 2015

So you have had one node in recovery state all this time?

Can you share the output of this query:

select get_ahm_epoch(),get_last_good_epoch(),get_current_epoch();

Also:

select * from recovery_status where is_running;

Maybe the node is not coming back because there are projections stuck in recovery:

select * from vs_projection_recoveries  WHERE status not in ('finished', 'ignored');

What is the output of these queries?

roger_moore · October 2015

=> select get_ahm_epoch(),get_last_good_epoch(),get_current_epoch();
 get_ahm_epoch | get_last_good_epoch | get_current_epoch
---------------+---------------------+-------------------
        368004 |              398028 |            398029
(1 row)
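
For reference, the gaps between these three epochs can be worked out directly. A minimal sketch using the values reported above; the function name `epoch_gaps` is hypothetical.

```python
def epoch_gaps(ahm, lge, current):
    """Return how far the AHM trails the last good epoch, and the LGE trails the current epoch."""
    return {
        "lge_minus_ahm": lge - ahm,
        "current_minus_lge": current - lge,
    }


# Values from the query output above.
gaps = epoch_gaps(ahm=368004, lge=398028, current=398029)
print(gaps)  # {'lge_minus_ahm': 30024, 'current_minus_lge': 1}
```

The last good epoch is only one behind the current epoch, while the AHM trails it by 30024 epochs. As far as I know, the AHM does not normally advance while a node is down, which would be consistent with one node having been in recovery for a while.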

The query "select * from vs_projection_recoveries WHERE status not in ('finished', 'ignored') AND start_time > '2015-10-14';" results in 30 rows

=> select status, count(*) from vs_projection_recoveries WHERE status not in ('finished', 'ignored') AND start_time > '2015-10-14' group by status;
   status    | count
-------------+-------
error-fatal |    28
running     |     2
(2 rows)

Adrian_Oprea_1 · October 2015

What does the following return:

SELECT node_name, projection_name, method, status, progress, detail, start_time
FROM projection_recoveries where status = 'error-fatal';

And also:

SELECT node_name, projection_name, method, status, progress, detail, start_time
FROM projection_recoveries where status = 'running';

roger_moore · October 2015

Adrian_Oprea_1 · October 2015

Try running:

select make_ahm_now(true)

- to skip the replay part and do recovery. This will force the AHM to advance even with one of your nodes not in the UP state.

- This will allow the node to be rebuilt from scratch.

See more details about this function here.

Let me know the output.

roger_moore · October 2015

=> select make_ahm_now(true);
          make_ahm_now
---------------------------------
 AHM set (New AHM Epoch: 400056)
(1 row)

roger_moore · October 2015

Is there a way I can start back at epoch 0? Without reinstalling, is there a way I can delete all the data and all the history and get back to how the system was when I first installed it while keeping my current schemas and projections?

Adrian_Oprea_1 · October 2015

What is the recovery status now?

select * from vs_recovery_status;

Try restarting the bad node!
