
spark-submit fails to import SparkContext

I'm using Spark 1.4.1 on my local Mac laptop, and I can use pyspark interactively with no problems. Spark was installed via Homebrew, and I'm using Anaconda Python. However, as soon as I try to use spark-submit, I get the following error:

15/09/04 08:51:09 ERROR SparkContext: Error initializing SparkContext. 
java.io.FileNotFoundException: Added file file:test.py does not exist. 
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329) 
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305) 
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458) 
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458) 
    at scala.collection.immutable.List.foreach(List.scala:318) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458) 
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) 
    at py4j.Gateway.invoke(Gateway.java:214) 
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) 
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) 
    at py4j.GatewayConnection.run(GatewayConnection.java:207) 
    at java.lang.Thread.run(Thread.java:745) 
15/09/04 08:51:09 ERROR SparkContext: Error stopping SparkContext after init error. 
java.lang.NullPointerException 
    at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152) 
    at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1216) 
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96) 
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1659) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565) 
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) 
    at py4j.Gateway.invoke(Gateway.java:214) 
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) 
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) 
    at py4j.GatewayConnection.run(GatewayConnection.java:207) 
    at java.lang.Thread.run(Thread.java:745) 
Traceback (most recent call last): 
    File "test.py", line 35, in <module> sc = SparkContext("local","test") 
    File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 113, in __init__ 
    File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 165, in _do_init 
    File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 219, in _initialize_context 
    File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__ 
    File "/usr/local/Cellar/apache-spark/1.4.1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. 
: java.io.FileNotFoundException: Added file file:test.py does not exist. 
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1329) 
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1305) 
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458) 
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:458) 
    at scala.collection.immutable.List.foreach(List.scala:318) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:458) 
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) 
    at py4j.Gateway.invoke(Gateway.java:214) 
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) 
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) 
    at py4j.GatewayConnection.run(GatewayConnection.java:207) 
    at java.lang.Thread.run(Thread.java:745) 

Here is my code:

from pyspark import SparkContext 

if __name__ == "__main__": 
    # Create a local SparkContext with the application name "test"
    sc = SparkContext("local", "test") 
    # Distribute a small list as an RDD
    sc.parallelize([1, 2, 3, 4]) 
    # Shut the context down cleanly
    sc.stop() 

If I move the file anywhere within the /usr/local/Cellar/apache-spark/1.4.1/ directory, spark-submit works fine. My environment variables are set as follows:

export SPARK_HOME="/usr/local/Cellar/apache-spark/1.4.1" 
export PATH=$SPARK_HOME/bin:$PATH 
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/lib/py4j-0.8.2.1-src.zip 
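
For what it's worth, the values the shell actually sees can be double-checked with standard commands:

echo $SPARK_HOME 
echo $PYTHONPATH 
which spark-submit 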

I'm sure something isn't set up correctly in my environment, but I can't track it down.
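
For reference, I'm launching the job by script name from my working directory, roughly like this (the project path below is illustrative, not my actual one):

cd ~/myproject         # hypothetical directory containing test.py 
spark-submit test.py   # fails with the FileNotFoundException above 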

Try using 'spark-submit' with the full path to 'test.py'; it seems 'spark-submit' can't find your Python script. –

I tried the full path and still get the same error. I also checked the permissions on the directory, and that doesn't seem to be the problem. – caleboverman

Try adding the directory that contains 'test.py' to your PYTHONPATH. –

Answer

Python files run through spark-submit need to be on the PYTHONPATH. Either add the full path of the directory:

export PYTHONPATH=full/path/to/dir:$PYTHONPATH 

or, if you are already inside the directory where the Python script lives, add '.' to the PYTHONPATH:

export PYTHONPATH='.':$PYTHONPATH 
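
Putting it together, a minimal session along these lines should work (the cd target is whichever directory actually contains test.py; the path shown is a placeholder):

cd /path/to/project    # placeholder for the directory holding test.py 
export PYTHONPATH='.':$PYTHONPATH 
spark-submit test.py 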

Thanks to @Def_Os for pointing this out!