FAILED. NOT OK to commit rows to database. Too many rows were rejected

I'm trying to insert rows into a Vertica table concurrently, using parallel threads in a Java Spark application. The logs show multiple exceptions like the one pasted below, but all rows are inserted successfully into the target table with no data loss. I've tried this multiple times, and every time I get these exceptions yet lose no data. Should I be concerned about these exceptions, or can I safely ignore them? I'm worried they will cause problems later in production.

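For context, each loader thread invokes the Vertica Spark connector roughly like this. This is a simplified sketch, not my actual code: the class, method, and DataFrame names are placeholders, but the format class comes from the stack trace below, and the save mode and option keys are the ones printed in the connector's options log line (Spark 1.6-style DataFrame API, which the stack trace suggests):

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

public class VerticaAppendSketch {
    // Invoked concurrently from several worker threads, one DataFrame per thread.
    static void appendToVertica(DataFrame df) {
        Map<String, String> options = new HashMap<>();
        options.put("host", "192.168.100.3");
        options.put("port", "5433");
        options.put("db", "conuretsprod");
        options.put("dbschema", "public");
        options.put("table", "sometable");
        options.put("user", "dbadmin");
        options.put("password", "*****"); // masked, as in the log

        df.write()
          .format("com.vertica.spark.datasource.DefaultSource") // from the stack trace
          .options(options)
          .mode(SaveMode.Append) // save_mode=Append, as in the log
          .save();
    }
}
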
Environment:

Vertica cluster of 3 nodes running Vertica Analytic Database v7.2.3-0

Exception:

 INFO | 2016-08-22 02:37:06,121 | All user options and S2V options validated ok: temp_table=S2V_TEMP_TABLE_sometable_S2V_job5364016263210767025, strlen=1024, save_mode=Append, last_committer_table=S2V_TEMP_TABLE_sometable_S2V_job5364016263210767025_last_committer, host=192.168.100.3, num_partitions=2, tmpdir=/tmp, db=conuretsprod, empty_count=-1, job_name=S2V_job5364016263210767025, all_done_table=S2V_JOB_STATUS, commit_tables_prefix=S2V_TEMP_TABLE_sometable_S2V_job5364016263210767025_commits_partition, dbschema=public, port=5433, autocommit=false, log_rejected_rows_sample_size=10, primary_ip=192.168.100.3, failed_rows_percent_tolerance=0.0, batch_size=100000, user=dbadmin, table=sometable, password=MASKED
INFO | 2016-08-22 02:37:06,121 | Unique job name:S2V_job5364016263210767025 will appear in Vertica table: public.S2V_JOB_STATUS
INFO | 2016-08-22 02:37:16,511 | createTempTables(): created temp table public.S2V_TEMP_TABLE_sometable_S2V_job5364016263210767025
INFO | 2016-08-22 02:37:31,510 | createTempTables(): created temp table public.S2V_TEMP_TABLE_sometable_S2V_job5364016263210767025_last_committer
INFO | 2016-08-22 02:38:18,581 | createTempTables(): created temp tables public.S2V_TEMP_TABLE_sometable_S2V_job5364016263210767025_commits_partition_0...1
INFO | 2016-08-22 02:40:11,805 | Starting task 1.0 in stage 744.0 (TID 1237, localhost, partition 1,PROCESS_LOCAL, 2602 bytes)
INFO | 2016-08-22 02:40:11,805 | Running task 1.0 in stage 744.0 (TID 1237)
INFO | 2016-08-22 02:40:18,631 | dropTempTables(): successfully dropped temp table public.S2V_TEMP_TABLE_sometable_S2V_job5364016263210767025
INFO | 2016-08-22 02:40:18,708 | dropTempTables(): successfully dropped temp table public.S2V_TEMP_TABLE_sometable_S2V_job5364016263210767025_last_committer
INFO | 2016-08-22 02:40:18,901 | dropTempTables(): successfully dropped temp tables public.S2V_TEMP_TABLE_sometable_S2V_job5364016263210767025_commits_partition_0...1
ERROR | 2016-08-22 02:40:18,908 | Exception in task 1.0 in stage 744.0 (TID 1237)
java.lang.Exception: Partition[1]: FATAL ERROR for job S2V_job5364016263210767025. Job status information is available in the Vertica table public.S2V_JOB_STATUS. . Failed rows summary: FailedRowsPercent=1.0; failedRowsPercentTolerance=0.0: FAILED. NOT OK to commit rows to database. Too many rows were rejected. . Unable to create/insert into target table public.sometable
at com.vertica.spark.s2v.S2V.tryTofinalizeSaveToVertica(S2V.scala:746)
at com.vertica.spark.s2v.S2V$$anonfun$2.apply(S2V.scala:226)
at com.vertica.spark.s2v.S2V$$anonfun$2.apply(S2V.scala:128)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
INFO | 2016-08-22 02:40:18,911 | Partition[1]: finished as the last committer and just performed the final save to Vertica. Dropping all temp tables now.
WARN | 2016-08-22 02:40:18,926 | Lost task 1.0 in stage 744.0 (TID 1237, localhost): java.lang.Exception: Partition[1]: FATAL ERROR for job S2V_job5364016263210767025. Job status information is available in the Vertica table public.S2V_JOB_STATUS. . Failed rows summary: FailedRowsPercent=1.0; failedRowsPercentTolerance=0.0: FAILED. NOT OK to commit rows to database. Too many rows were rejected. . Unable to create/insert into target table public.sometable
at com.vertica.spark.s2v.S2V.tryTofinalizeSaveToVertica(S2V.scala:746)
at com.vertica.spark.s2v.S2V$$anonfun$2.apply(S2V.scala:226)
at com.vertica.spark.s2v.S2V$$anonfun$2.apply(S2V.scala:128)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

ERROR | 2016-08-22 02:40:18,928 | Task 1 in stage 744.0 failed 1 times; aborting job
INFO | 2016-08-22 02:40:18,929 | Removed TaskSet 744.0, whose tasks have all completed, from pool
INFO | 2016-08-22 02:40:18,935 | Cancelling stage 744
INFO | 2016-08-22 02:40:18,936 | ResultStage 744 (count at S2V.scala:245) failed in 120.342 s
INFO | 2016-08-22 02:40:18,938 | Job 744 failed: count at S2V.scala:245, took 120.349084 s
ERROR | 2016-08-22 02:40:18,942 | Exception while loading via VerticaSparkConnectorLoader
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 744.0 failed 1 times, most recent failure: Lost task 1.0 in stage 744.0 (TID 1237, localhost): java.lang.Exception: Partition[1]: FATAL ERROR for job S2V_job5364016263210767025. Job status information is available in the Vertica table public.S2V_JOB_STATUS. . Failed rows summary: FailedRowsPercent=1.0; failedRowsPercentTolerance=0.0: FAILED. NOT OK to commit rows to database. Too many rows were rejected. . Unable to create/insert into target table public.sometable
at com.vertica.spark.s2v.S2V.tryTofinalizeSaveToVertica(S2V.scala:746)
at com.vertica.spark.s2v.S2V$$anonfun$2.apply(S2V.scala:226)
at com.vertica.spark.s2v.S2V$$anonfun$2.apply(S2V.scala:128)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at com.vertica.spark.s2v.S2V.save(S2V.scala:245)
at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:77)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at com.conurets.etl.loader.VerticaSparkConnectorLoader.postValidationLoad(VerticaSparkConnectorLoader.java:42)
at com.conurets.etl.loader.AbstractEtlLoader.load(AbstractEtlLoader.java:32)
at com.conurets.etl.parser.AbstractJdbcVerticaDestinationParser$2.run(AbstractJdbcVerticaDestinationParser.java:95)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Partition[1]: FATAL ERROR for job S2V_job5364016263210767025. Job status information is available in the Vertica table public.S2V_JOB_STATUS. . Failed rows summary: FailedRowsPercent=1.0; failedRowsPercentTolerance=0.0: FAILED. NOT OK to commit rows to database. Too many rows were rejected. . Unable to create/insert into target table public.sometable
at com.vertica.spark.s2v.S2V.tryTofinalizeSaveToVertica(S2V.scala:746)
at com.vertica.spark.s2v.S2V$$anonfun$2.apply(S2V.scala:226)
at com.vertica.spark.s2v.S2V$$anonfun$2.apply(S2V.scala:128)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
... 3 more

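A side note in case it helps: the error message says job status information is available in public.S2V_JOB_STATUS. Below is a minimal JDBC sketch for dumping that table; the host, port, database, and user come from the options line in the log above, the password is a placeholder, and since I don't know the exact schema of S2V_JOB_STATUS, it prints every column generically. It needs the Vertica JDBC driver on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class DumpS2VJobStatus {
    public static void main(String[] args) throws SQLException {
        // Connection details as shown in the connector's options log line.
        String url = "jdbc:vertica://192.168.100.3:5433/conuretsprod";
        try (Connection conn = DriverManager.getConnection(url, "dbadmin", "*****");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM public.S2V_JOB_STATUS")) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                // Print each row as column=value pairs, since the table's
                // exact columns aren't documented in this post.
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    row.append(md.getColumnName(i)).append('=')
                       .append(rs.getString(i)).append("  ");
                }
                System.out.println(row);
            }
        }
    }
}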