Apache Spark YARN mode startup takes too long (10+ seconds)
I am running a Spark application in YARN-client or YARN-cluster mode, but it seems to take too long to start up.
It takes more than 10 seconds to initialize the SparkContext.
Is this normal? Or can it be optimized?
The environment is as follows:
- Hadoop: Hortonworks HDP 2.2 (Hadoop 2.6) (a very small test cluster with 3 data nodes)
- Spark: 1.3.1
- Client: Windows 7, but similar results on CentOS 6.6
The following is the startup part of the application log (some private information has been edited).
'Main: Initializing context' on the first line and 'MainProcessor: Deleting previous output files' on the last line are logs from the application; everything in between is from Spark itself. The application logic runs after this log is printed.
15/05/07 09:18:31 INFO Main: Initializing context
15/05/07 09:18:31 INFO SparkContext: Running Spark version 1.3.1
15/05/07 09:18:31 INFO SecurityManager: Changing view acls to: myuser,myapp
15/05/07 09:18:31 INFO SecurityManager: Changing modify acls to: myuser,myapp
15/05/07 09:18:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(myuser, myapp); users with modify permissions: Set(myuser, myapp)
15/05/07 09:18:31 INFO Slf4jLogger: Slf4jLogger started
15/05/07 09:18:31 INFO Remoting: Starting remoting
15/05/07 09:18:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:54449]
15/05/07 09:18:31 INFO Utils: Successfully started service 'sparkDriver' on port 54449.
15/05/07 09:18:31 INFO SparkEnv: Registering MapOutputTracker
15/05/07 09:18:32 INFO SparkEnv: Registering BlockManagerMaster
15/05/07 09:18:32 INFO DiskBlockManager: Created local directory at C:\Users\myuser\AppData\Local\Temp\spark-2d3db9d6-ea78-438e-956f-be9c1dcf3a9d\blockmgr-e9ade223-a4b8-4d9f-b038-efd66adf9772
15/05/07 09:18:32 INFO MemoryStore: MemoryStore started with capacity 1956.7 MB
15/05/07 09:18:32 INFO HttpFileServer: HTTP File server directory is C:\Users\myuser\AppData\Local\Temp\spark-ff40d73b-e8ab-433e-88c4-35da27fb6278\httpd-def9220f-ac3a-4dd2-9ac1-2c593b94b2d9
15/05/07 09:18:32 INFO HttpServer: Starting HTTP Server
15/05/07 09:18:32 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/07 09:18:32 INFO AbstractConnector: Started [email protected]:54450
15/05/07 09:18:32 INFO Utils: Successfully started service 'HTTP file server' on port 54450.
15/05/07 09:18:32 INFO SparkEnv: Registering OutputCommitCoordinator
15/05/07 09:18:32 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/07 09:18:32 INFO AbstractConnector: Started [email protected]:4040
15/05/07 09:18:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/05/07 09:18:32 INFO SparkUI: Started SparkUI at http://mymachine:4040
15/05/07 09:18:32 INFO SparkContext: Added JAR file:/D:/Projects/MyApp/MyApp.jar at http://10.111.111.199:54450/jars/MyApp.jar with timestamp 1430957912240
15/05/07 09:18:32 INFO RMProxy: Connecting to ResourceManager at cluster01/10.111.111.11:8050
15/05/07 09:18:32 INFO Client: Requesting a new application from cluster with 3 NodeManagers
15/05/07 09:18:32 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (23040 MB per container)
15/05/07 09:18:32 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/05/07 09:18:32 INFO Client: Setting up container launch context for our AM
15/05/07 09:18:32 INFO Client: Preparing resources for our AM container
15/05/07 09:18:32 INFO Client: Source and destination file systems are the same. Not copying hdfs://cluster01/apps/spark/spark-assembly-1.3.1-hadoop2.6.0.jar
15/05/07 09:18:32 INFO Client: Setting up the launch environment for our AM container
15/05/07 09:18:33 INFO SecurityManager: Changing view acls to: myuser,myapp
15/05/07 09:18:33 INFO SecurityManager: Changing modify acls to: myuser,myapp
15/05/07 09:18:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(myuser, myapp); users with modify permissions: Set(myuser, myapp)
15/05/07 09:18:33 INFO Client: Submitting application 2 to ResourceManager
15/05/07 09:18:33 INFO YarnClientImpl: Submitted application application_1430956687773_0002
15/05/07 09:18:34 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:34 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1430957906540
final status: UNDEFINED
tracking URL: http://cluster01:8088/proxy/application_1430956687773_0002/
user: myapp
15/05/07 09:18:35 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:36 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:37 INFO Client: Application report for application_1430956687773_0002 (state: ACCEPTED)
15/05/07 09:18:37 INFO YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://[email protected]:39698/user/YarnAM#-1579648782]
15/05/07 09:18:37 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> cluster01, PROXY_URI_BASES -> http://cluster01:8088/proxy/application_1430956687773_0002), /proxy/application_1430956687773_0002
15/05/07 09:18:37 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/05/07 09:18:38 INFO Client: Application report for application_1430956687773_0002 (state: RUNNING)
15/05/07 09:18:38 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: cluster02
ApplicationMaster RPC port: 0
queue: default
start time: 1430957906540
final status: UNDEFINED
tracking URL: http://cluster01:8088/proxy/application_1430956687773_0002/
user: myapp
15/05/07 09:18:38 INFO YarnClientSchedulerBackend: Application application_1430956687773_0002 has started running.
15/05/07 09:18:38 INFO NettyBlockTransferService: Server created on 54491
15/05/07 09:18:38 INFO BlockManagerMaster: Trying to register BlockManager
15/05/07 09:18:38 INFO BlockManagerMasterActor: Registering block manager mymachine:54491 with 1956.7 MB RAM, BlockManagerId(<driver>, mymachine, 54491)
15/05/07 09:18:38 INFO BlockManagerMaster: Registered BlockManager
15/05/07 09:18:43 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:44996/user/Executor#-786778979] with ID 1
15/05/07 09:18:43 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
15/05/07 09:18:43 INFO MainProcessor: Deleting previous output files
Thanks.
UPDATE
I think I have found the reason (perhaps a partial one, but an important one).
It is between the following lines:
15/05/08 11:36:32 INFO BlockManagerMaster: Registered BlockManager
15/05/08 11:36:38 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:55237/user/Executor#-149550753] with ID 1
When I read the logs on the cluster side, I found the following lines (the exact timestamps differ from the lines above, but only because of the clock difference between the machines):
15/05/08 11:36:23 INFO yarn.ApplicationMaster: Started progress reporter thread - sleep time : 5000
15/05/08 11:36:28 INFO impl.AMRMClientImpl: Received new token for : cluster04:45454
It seems that Spark deliberately sleeps for 5 seconds.
I read the Spark source code, and in org.apache.spark.deploy.yarn.ApplicationMaster.scala, launchReporterThread() has the code for this. It loops, calling allocator.allocateResources() and then Thread.sleep(). For the sleep, it reads the configuration variable spark.yarn.scheduler.heartbeat.interval-ms (the default value is 5000, i.e. 5 seconds). According to the comment, "we want to be reasonably responsive without causing too many requests to RM". So unless YARN fulfills the allocation request immediately, it seems that 5 seconds will be wasted.
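To make the mechanism concrete, here is a minimal, self-contained sketch of the polling pattern described above (this is not the actual Spark source; allocateResources below is only a stand-in for the allocation call the ApplicationMaster makes to the ResourceManager):

object ReporterLoopSketch {
  @volatile var finished = false

  // Stand-in for the AM's heartbeat/allocation request to the ResourceManager.
  def allocateResources(): Unit =
    println("heartbeat: asking the RM for containers")

  def main(args: Array[String]): Unit = {
    // Same default as spark.yarn.scheduler.heartbeat.interval-ms in Spark 1.3.1.
    val intervalMs =
      sys.props.getOrElse("spark.yarn.scheduler.heartbeat.interval-ms", "5000").toLong

    val reporter = new Thread {
      override def run(): Unit = {
        while (!finished) {
          allocateResources()      // poll the RM for containers
          Thread.sleep(intervalMs) // then wait a full interval before polling again
        }
      }
    }
    reporter.setDaemon(true)
    reporter.setName("Reporter")
    reporter.start()

    Thread.sleep(12000) // let the loop run a few iterations for demonstration
    finished = true
  }
}

With the default of 5000 ms, an iteration that does not get its containers right away burns up to a full interval before the next allocation attempt, which matches the roughly 5-second gap in my logs.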
When I changed the configuration variable to 1000, it only had to wait 1 second.
Here are the log lines after the change:
15/05/08 11:47:21 INFO yarn.ApplicationMaster: Started progress reporter thread - sleep time : 1000
15/05/08 11:47:22 INFO impl.AMRMClientImpl: Received new token for : cluster04:45454
4 seconds saved.
So, if you do not want to wait the full 5 seconds, you can change spark.yarn.scheduler.heartbeat.interval-ms.
I hope the additional overhead it incurs is negligible.
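For reference, here is a minimal sketch of how the value can be set from the driver (the object name MyAppSketch is made up for illustration; setting the key on SparkConf is just one option):

import org.apache.spark.{SparkConf, SparkContext}

object MyAppSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MyApp")
      // The key discussed above; 1000 ms is the value I tested with.
      .set("spark.yarn.scheduler.heartbeat.interval-ms", "1000")
    val sc = new SparkContext(conf)
    // ... application logic ...
    sc.stop()
  }
}

The same value can also be passed on the command line with spark-submit --conf spark.yarn.scheduler.heartbeat.interval-ms=1000, or placed in spark-defaults.conf.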
UPDATE
A related JIRA issue was opened and resolved. See https://issues.apache.org/jira/browse/SPARK-7533
I had the same problem you had on pyspark, did you solve it? – newBike