Spark job running on a YARN cluster fails with java.io.FileNotFoundException: File does not exist, even though the file exists on the master node

I am fairly new to Spark. I tried searching but could not find a proper solution. I have installed Hadoop 2.7.2 on two boxes (one master node and one worker node). I set up the cluster by following the link below: http://javadev.org/docs/hadoop/centos/6/installation/multi-node-installation-on-centos-6-non-sucure-mode/ I was running the Hadoop and Spark applications as the root user to test the cluster.

I have installed Spark on the master node, and Spark starts without any errors. However, when I submit the job using spark-submit, I get a File Not Found exception even though the file is present on the master node at the same location given in the error. I am running the spark-submit command below; the log output follows the command.

/bin/spark-submit --class com.test.Engine --master yarn --deploy-mode  cluster /app/spark-test.jar

 
16/04/21 19:16:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/04/21 19:16:13 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
16/04/21 19:16:14 INFO Client: Requesting a new application from cluster with 1 NodeManagers 
16/04/21 19:16:14 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container) 
16/04/21 19:16:14 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead 
16/04/21 19:16:14 INFO Client: Setting up container launch context for our AM 
16/04/21 19:16:14 INFO Client: Setting up the launch environment for our AM container 
16/04/21 19:16:14 INFO Client: Preparing resources for our AM container 
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar 
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/app/spark-test.jar 
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-120aeddc-0f87-4411-9400-22ba01096249/__spark_conf__5619348744221830008.zip 
16/04/21 19:16:14 INFO SecurityManager: Changing view acls to: root 
16/04/21 19:16:14 INFO SecurityManager: Changing modify acls to: root 
16/04/21 19:16:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
16/04/21 19:16:15 INFO Client: Submitting application 1 to ResourceManager 
16/04/21 19:16:15 INFO YarnClientImpl: Submitted application application_1461246306015_0001 
16/04/21 19:16:16 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED) 
16/04/21 19:16:16 INFO Client: 
    client token: N/A 
    diagnostics: N/A 
    ApplicationMaster host: N/A 
    ApplicationMaster RPC port: -1 
    queue: default 
    start time: 1461246375622 
    final status: UNDEFINED 
    tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461246306015_0001/ 
    user: root 
16/04/21 19:16:17 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED) 
16/04/21 19:16:18 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED) 
16/04/21 19:16:19 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED) 
16/04/21 19:16:20 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED) 
16/04/21 19:16:21 INFO Client: Application report for application_1461246306015_0001 (state: FAILED) 
16/04/21 19:16:21 INFO Client: 
    client token: N/A 
    diagnostics: Application application_1461246306015_0001 failed 2 times due to AM Container for appattempt_1461246306015_0001_000002 exited with exitCode: -1000 
For more detailed output, check application tracking page:http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001Then, click on links to logs of each attempt. 
Diagnostics: java.io.FileNotFoundException: File file:/app/spark-test.jar does not exist 
Failing this attempt. Failing the application. 
    ApplicationMaster host: N/A 
    ApplicationMaster RPC port: -1 
    queue: default 
    start time: 1461246375622 
    final status: FAILED 
    tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001 
    user: root 
Exception in thread "main" org.apache.spark.SparkException: Application application_1461246306015_0001 finished with failed status 
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034) 
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081) 
    at org.apache.spark.deploy.yarn.Client.main(Client.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I also tried running Spark against HDFS by putting my application on HDFS and giving the HDFS path in the spark-submit command. Even then it throws a File Not Found exception, this time on a Spark conf file. I am running the spark-submit command below; the log output follows the command.

./bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar

 
16/04/21 18:11:45 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
16/04/21 18:11:46 INFO Client: Requesting a new application from cluster with 1 NodeManagers 
16/04/21 18:11:46 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container) 
16/04/21 18:11:46 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead 
16/04/21 18:11:46 INFO Client: Setting up container launch context for our AM 
16/04/21 18:11:46 INFO Client: Setting up the launch environment for our AM container 
16/04/21 18:11:46 INFO Client: Preparing resources for our AM container 
16/04/21 18:11:46 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar 
16/04/21 18:11:47 INFO Client: Uploading resource hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar -> file:/root/.sparkStaging/application_1461234217994_0017/spark-test.jar 
16/04/21 18:11:49 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip 
16/04/21 18:11:50 INFO SecurityManager: Changing view acls to: root 
16/04/21 18:11:50 INFO SecurityManager: Changing modify acls to: root 
16/04/21 18:11:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 
16/04/21 18:11:50 INFO Client: Submitting application 17 to ResourceManager 
16/04/21 18:11:50 INFO YarnClientImpl: Submitted application application_1461234217994_0017 
16/04/21 18:11:51 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED) 
16/04/21 18:11:51 INFO Client: 
    client token: N/A 
    diagnostics: N/A 
    ApplicationMaster host: N/A 
    ApplicationMaster RPC port: -1 
    queue: default 
    start time: 1461242510849 
    final status: UNDEFINED 
    tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461234217994_0017/ 
    user: root 
16/04/21 18:11:52 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED) 
16/04/21 18:11:53 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED) 
16/04/21 18:11:54 INFO Client: Application report for application_1461234217994_0017 (state: FAILED) 
16/04/21 18:11:54 INFO Client: 
    client token: N/A 
    diagnostics: Application application_1461234217994_0017 failed 2 times due to AM Container for appattempt_1461234217994_0017_000002 exited with exitCode: -1000 
For more detailed output, check application tracking page:http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017Then, click on links to logs of each attempt. 
Diagnostics: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist 
java.io.FileNotFoundException: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist 
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609) 
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822) 
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599) 
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421) 
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253) 
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) 
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) 
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358) 
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 

Failing this attempt. Failing the application. 
    ApplicationMaster host: N/A 
    ApplicationMaster RPC port: -1 
    queue: default 
    start time: 1461242510849 
    final status: FAILED 
    tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017 
    user: root 
Exception in thread "main" org.apache.spark.SparkException: Application application_1461234217994_0017 finished with failed status 
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034) 
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081) 
    at org.apache.spark.deploy.yarn.Client.main(Client.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) 
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) 
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
16/04/21 18:11:55 INFO ShutdownHookManager: Shutdown hook called 
16/04/21 18:11:55 INFO ShutdownHookManager: Deleting directory /tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21


2016-04-20 Ajay

It's great that you included all this information, but wouldn't the Spark code have been useful for diagnosing the problem?

@cricket_007 The problem is not caused by the Spark code, because even if I run spark-shell or other Spark examples on YARN, I get the same error. For example: spark-shell --master yarn-client – Ajay

Could you add your YARN logs, please? You can get them by running '$ yarn logs -applicationId application_1461246306015_0001' – user1314742

The Spark configuration was not pointing to the right Hadoop configuration directory. The Hadoop configuration for 2.7.2 resides under hadoop2.7.2/etc/hadoop/ rather than /root/hadoop2.7.2/conf. When I set HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop/ in spark-env.sh, spark-submit started working and the File Not Found exception disappeared. Previously it was pointing to /root/hadoop2.7.2/conf (which does not exist). If Spark does not point to the correct Hadoop configuration directory, a similar error may occur. I think this is probably a bug in Spark; it should handle this gracefully rather than throwing ambiguous error messages.
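A minimal sketch of how one might check a candidate HADOOP_CONF_DIR before putting it in spark-env.sh. The helper name check_hadoop_conf_dir is hypothetical (it is not part of Spark or Hadoop), and the paths are the ones from this answer, so adjust them to your own installation. For what it's worth, the "Source and destination file systems are the same. Not copying file:..." lines in the question's logs would fit this diagnosis: without a core-site.xml, Hadoop's default file system is the local one (file:///), so nothing gets uploaded for the other node to fetch.

```shell
#!/bin/sh
# Sketch: verify that a directory looks like a Hadoop configuration directory.
# check_hadoop_conf_dir is a hypothetical helper, not a Spark or Hadoop command.
check_hadoop_conf_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    # core-site.xml is where fs.defaultFS is normally defined.
    if [ ! -f "$dir/core-site.xml" ]; then
        echo "no core-site.xml in: $dir"
        return 1
    fi
    echo "ok: $dir"
    return 0
}

# Paths from the answer above (assumptions for any other setup).
# "|| true" keeps the script going when a check fails.
check_hadoop_conf_dir /root/hadoop2.7.2/conf || true
check_hadoop_conf_dir /root/hadoop2.7.2/etc/hadoop || true

# Once a directory passes, the line to add to conf/spark-env.sh would be:
#   export HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop/
```

Running the check on the node where spark-submit is launched should be enough, since that is where the client reads the Hadoop configuration.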


2016-04-21 17:14:54 Ajay

I had a similar error with Spark running on EMR. I had written my Spark code in Java 8, but on the EMR cluster Spark was running on an older Java version by default. So I had to recreate the cluster with JAVA_HOME pointing to the Java 8 version. That resolved my problem. Please check along similar lines.
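A rough sketch of how one might check which Java major version a node actually runs, assuming `java` is on the PATH and prints the usual `... version "x.y.z"` line. The helper name java_major_version is hypothetical, and the parsing is only meant for common version strings such as 1.8.0_91 or 11.0.2.

```shell
#!/bin/sh
# Sketch: turn a Java version string into its major version number.
# java_major_version is a hypothetical helper, not a standard tool.
java_major_version() {
    v="$1"
    # Old-style strings look like 1.8.0_91; drop the leading "1.".
    case "$v" in
        1.*) v="${v#1.}" ;;
    esac
    # Keep everything before the first '.', '_' or '-'.
    echo "${v%%[._-]*}"
}

# Report the version of whatever "java" resolves to on this node, if any.
if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr; extract the quoted version string.
    raw=$(java -version 2>&1 | sed -n 's/.* version "\(.*\)".*/\1/p' | head -n 1)
    echo "java major version: $(java_major_version "$raw")"
fi
```

Comparing this number on the submitting machine and on the cluster nodes is one way to spot the kind of mismatch described above.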


2016-05-11 17:46:04

I had a similar problem, but mine was caused by having two core-site.xml files, one in $HADOOP_CONF_DIR and another in $SPARK_HOME/conf. The problem disappeared when I removed the one under $SPARK_HOME/conf.
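A small sketch for spotting this situation, assuming HADOOP_CONF_DIR and SPARK_HOME are set in the current shell. The helper name find_duplicate_site_files is hypothetical, and checking hdfs-site.xml and yarn-site.xml in addition to core-site.xml is my own extension of the answer, not something it reports.

```shell
#!/bin/sh
# Sketch: list Hadoop site files that exist in both configuration directories.
# find_duplicate_site_files is a hypothetical helper name.
find_duplicate_site_files() {
    hadoop_conf="$1"
    spark_conf="$2"
    # Only core-site.xml comes from the answer; the other two are assumptions.
    for name in core-site.xml hdfs-site.xml yarn-site.xml; do
        if [ -f "$hadoop_conf/$name" ] && [ -f "$spark_conf/$name" ]; then
            echo "duplicate: $name"
        fi
    done
}

# Run the check only when both variables are set in this shell.
if [ -n "${HADOOP_CONF_DIR:-}" ] && [ -n "${SPARK_HOME:-}" ]; then
    find_duplicate_site_files "$HADOOP_CONF_DIR" "$SPARK_HOME/conf"
fi
```

The sketch only reports duplicates; whether to delete the copy under $SPARK_HOME/conf, as done above, is left as a manual decision.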


2017-08-08 22:21:35 smishra
