Spark Vertica connector - Checking status and location of temp/reject tables

Hey Folks:

I am using the Spark Vertica connector and want to monitor the status of the application - rows processed, rows rejected, etc. Can you please let me know where I can find this information?

I see the following line in the Spark logs, but am not able to find this table:

COPYing data into Vertica via statement: COPY "schema"."temp_table" FROM 'webhdfs://ip-webhdfshost:/user/workspace/S2V_job_/*.orc' ON ANY NODE orc DIRECT REJECTED DATA AS table "tableName__S2V__REJECTS" NO COMMIT

Also, are there any other tables where I can monitor the status/errors?

There is an S2V_JOB_STATUS_USER_DBADMIN table, but it only contains the following columns:

target_table_schema varchar(128),
target_table_name varchar(128),
save_mode varchar(128),
job_name varchar(256),
start_time timestamptz,
all_done boolean NOT NULL,
success boolean NOT NULL,
percent_failed_rows float

Thanks

Mans

Comments

  • Jim_Knicely - Administrator

    Hi,

    Have you looked at the LOAD_STREAMS and LOAD_SOURCES system tables in Vertica?

    https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/AdministratorsGuide/BulkLoadCOPY/MonitoringCopyLoadsMetrics.htm

    The COPY command being issued should show up in these system tables.

  • Hi Jim: Thanks for your response.

    The application I was using to load data into Vertica failed.

    I took a look at those tables and did not find any information about the errors or their causes.

    LOAD_SOURCES is empty and LOAD_STREAMS contains the rows mentioned below.

    Please let me know if you have any other suggestions/recommendations on how to troubleshoot and get more diagnostic info on load-related issues.

    BTW - Does the input size look too big?

    Thanks again.

    column                  load 1                          load 2
    session_id              _p_node0001-45:0x2641           _p_node0001-45:0x148fa
    transaction_id          45035996273706115               45035996273709857
    statement_id            1                               1
    stream_name             (empty)                         (empty)
    schema_name             schema                          schema
    table_id                45035996273708148               45035996273708148
    table_name              table_2017_07_17                table_2017_07_17
    load_start              2017-07-19 21:16:47.461993+00   2017-07-20 18:36:33.732402+00
    load_duration_ms        6519376                         1715837
    is_executing            false                           false
    accepted_row_count      3833036380                      988540444
    rejected_row_count      0                               0
    read_bytes              1620098                         744452
    input_file_size_bytes   250732759904                    38971415394
    parse_complete_percent  0                               0
    unsorted_row_count      3833036380                      988540444
    sorted_row_count        3833036380                      988540444
    sort_complete_percent   100                             100
  • Jim_Knicely - Administrator
    edited July 2017

    Is "table_2017_07_17" the table you loaded? If so, it loaded 3,833,036,380 rows on 7/19 and 988,540,444 rows on 7/20 (per the load_start timestamps). Both loads had 0 rejected rows.

    There is a system table called DC_ERRORS that may show any errors that occurred around those loads. Also, the vertica.log file is a great place to look for errors.

  • The load actually failed and there were no rows in the table. Here is the error from the Spark job on 7/19 - it shows a failed rows percentage of 1.0. The status table did not have any error-related columns. Where can I find the failed rows and the reason for the failure?

    Let me know if you have any other pointers.

    Thanks

    17/07/19 23:05:26 INFO S2V: Checking number of rejected rows via statement: SELECT COUNT(*) as count FROM "table__S2V_job4227288210186083010_REJECTS"
    17/07/19 23:05:26 INFO S2V: Verifying rows saved to Vertica is within user tolerance...
    17/07/19 23:05:26 INFO S2V: Number of rows_rejected=0. rows_copied=-461930916. failedRowsPercent=1.0. user's failed_rows_percent_tolerance=0.00. passedFaultToleranceTest=false...FAILED. NOT OK to commit to database
    17/07/19 23:05:28 INFO S2VUtils: Attempting to delete HDFS path: hdfs://ip:8020/user/vertica/workspace/S2V_job4227288210186083010
    17/07/19 23:05:28 INFO S2VUtils: Successfully deleted HDFS path: hdfs://ip:8020/user/vertica/workspace/S2V_job4227288210186083010
    17/07/19 23:05:28 ERROR S2V: Failed to save DataFrame to Vertica table: schema.table_2017_07_17
    17/07/19 23:05:28 INFO S2V: Failure: Additional Job status info is available in Vertica table: schema.S2V_JOB_STATUS_USER_DBADMIN
    Exception in thread "main" java.lang.Exception: S2V: FATAL ERROR for job S2V_job4227288210186083010. Job status information is available in the Vertica table schema.S2V_JOB_STATUS_USER_DBADMIN. Unable to create/insert into target table schema.table_2017_07_17 with SaveMode: Append. ERROR MESSAGE: S2V: FATAL ERROR for job S2V_job4227288210186083010. Job status information is available in the Vertica table schema.S2V_JOB_STATUS_USER_DBADMIN. Failed rows percent was greater than user specified tolerance for table:Target table:schema.table_2017_07_17 with SaveMode: Append Number of rows_rejected=0. rows_copied=-461930916. failedRowsPercent=1.0. user's failed_rows_percent_tolerance=0.00. passedFaultToleranceTest=false
    at com.vertica.spark.s2v.S2V.do2Stage(S2V.scala:342)
    at com.vertica.spark.s2v.S2V.save(S2V.scala:392)
    at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:88)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
    at

  • Jim_Knicely - Administrator
    edited July 2017

    Looks like Vertica loaded the records it said it did (i.e., the values in the accepted_row_count column of the LOAD_STREAMS table), but the records weren't committed. If you look at the COPY command in your original post, it uses the NO COMMIT option, which prevents the COPY statement from committing its transaction automatically when it finishes copying data. That would explain why there is no data in the table afterward.

  • Jim_Knicely - Administrator

    This is the most interesting entry in your log:

    17/07/19 23:05:26 INFO S2V: Number of rows_rejected=0. rows_copied=-461930916. failedRowsPercent=1.0. user's failed_rows_percent_tolerance=0.00. passedFaultToleranceTest=false...FAILED. NOT OK to commit to database

    Why did it fail if the number of rejected rows = 0? And why is it showing a negative value for rows copied: -461930916?

    That could be a problem.

  • That is what I am struggling with - I am not sure where to look for more details in the logs.

    Let me know if you have any other thoughts.

  • Jim_Knicely - Administrator
    edited July 2017

    What version of Vertica are you running? Make sure the Vertica, Scala and Spark versions you are using are compatible.

    See:

    https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/SupportedPlatforms/SparkIntegration.htm

    So you ran a query like this against the DC_ERRORS table?

    select * from dc_errors where transaction_id in (45035996273706115, 45035996273709857);

    It is possible that the errors (if any) have aged out of the DC_ERRORS table. You can check the earliest entry there to see if it is earlier than 2017-07-19 21:16:47.461993+00:

    select min(time) from dc_errors;

    You may still be able to see all of the SQL executed against Vertica for the transactions:

    select transaction_id, query_start, query from query_profiles where transaction_id in (45035996273706115, 45035996273709857) order by transaction_id, query_start;

    I found a document that listed a limitation of the S2V to be:

    Two simultaneous S2V jobs overwriting the same target table will interfere with each other. One job will issue a DROP TABLE DDL statement, which will cause an error in the other in-progress job's COPY statement, since the table is gone!

    The SELECT from the QUERY_PROFILES table above may show overlapping SQL, indicating that perhaps two jobs were running against the target table.

    But I think you are using the Append mode, so the above limitation most likely does not apply.

  • Jim_Knicely - Administrator

    Can you post the values that are in the S2V_JOB_STATUS_USER_DBADMIN table?

    Also, how many nodes are in your cluster? Take a look at the vertica log on each node as they may contain different info around the 17/07/19 23:05:26 time frame.

  • Hey Jim - Thanks for the pointers. I apologize, I got busy with another issue, but I will try to get the info for you as soon as I can. Thanks

  • Hey Folks:

    I wanted to find out where the following tables, mentioned in the logs and the discussion above, are located. I've looked in the public schema and in the schema I am using, but I don't see these tables. The only table I can find is S2V_JOB_STATUS_USER_DBADMIN.

    Here are the tables I am looking for:

    • DC_ERRORS
    • table__S2V_job6849397757185092870_REJECTS

    Also, is there a guide to which tables can be used for troubleshooting and error tracing? The S2V_JOB_STATUS_USER_DBADMIN table has very limited information.

    Thanks

  • Jim_Knicely - Administrator

    Hi,

    The DC_ERRORS table is a "Data Collector" table and is in the V_INTERNAL schema.

    dbadmin=> select table_schema, table_name, table_description from vs_system_tables where table_name = 'dc_errors';
     table_schema | table_name |             table_description
    --------------+------------+--------------------------------------------
     v_internal   | dc_errors  | History of all errors+warnings encountered
    (1 row)
    

    You can search for the rejects table in the ALL_TABLES system table:

    dbadmin=> select schema_name, table_name from all_tables where table_name ilike '%job6849397757185092870%';
     schema_name | table_name
    -------------+------------
    (0 rows)
    
  • I am wondering if the negative rows_copied in the error above is due to an overflow of the rows_copied field. Here is the error again -
    17/07/19 23:05:26 INFO S2V: Number of rows_rejected=0. rows_copied=-461930916. failedRowsPercent=1.0. user's failed_rows_percent_tolerance=0.00. passedFaultToleranceTest=false...FAILED. NOT OK to commit to database
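
    The arithmetic is consistent with that hypothesis: the accepted_row_count of 3,833,036,380 in LOAD_STREAMS is larger than a signed 32-bit integer can hold (max 2,147,483,647), and truncating it to 32 bits yields exactly the -461930916 seen in the log. A minimal sketch of that check (my own arithmetic, not connector code):

```python
def to_int32(n: int) -> int:
    """Wrap an arbitrary integer to a signed 32-bit value (two's complement)."""
    return ((n + 2**31) % 2**32) - 2**31

accepted_row_count = 3_833_036_380   # accepted_row_count from LOAD_STREAMS (7/19 load)
print(to_int32(accepted_row_count))  # -461930916, the rows_copied value in the log
```

    If the connector keeps the copied-row count in a 32-bit field, any load over ~2.1 billion rows would wrap negative and trip the failed-rows-percent check.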
