
SparkSQL MissingRequirementError when registering a table

I am a beginner with Scala and Apache Spark, and I am trying to use Spark SQL. After cloning the repo, I started the Spark shell by typing bin/spark-shell and ran the following:

val sqlContext = new org.apache.spark.sql.SQLContext(sc) 
import sqlContext.createSchemaRDD 
val pathUsers = "users.txt" 
case class User(uid: String, name: String, surname: String) 
val users = sc.textFile(pathUsers).map(_.split(" ")).map(u => User(u(0), u(1), u(2))) 
users.registerTempTable("users") 
val res = sqlContext.sql("SELECT * FROM users") 
res.collect().foreach(println) 

and everything worked as expected. The users.txt file looks something like this:

uid-1 name1 surname1 
uid-2 name2 surname2 
... 

After that, I tried to create a standalone project and set up its dependencies using sbt. The dependencies listed in build.sbt are the following:

"org.apache.spark" % "spark-streaming_2.10" % "1.2.0", 
"org.apache.spark" % "spark-streaming-kafka_2.10" % "1.2.0", 
"org.apache.spark" % "spark-sql_2.10" % "1.2.0", 
"org.apache.spark" % "spark-catalyst_2.10" % "1.2.0" 

If I run the same statements, it fails on this line:

users.registerTempTable("users") 

with this error:

scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with [email protected] of type class java.net.URLClassLoader with classpath [file:/Users/se7entyse7en/.sbt/boot/scala-2.10.4/lib/jansi.jar,file:/Users/se7entyse7en/.sbt/boot/scala-2.10.4/lib/jline.jar,file:/Users/se7entyse7en/.sbt/boot/scala-2.10.4/lib/scala-compiler.jar,file:/Users/se7entyse7en/.sbt/boot/scala-2.10.4/lib/scala-library.jar,file:/Users/se7entyse7en/.sbt/boot/scala-2.10.4/lib/scala-reflect.jar] and parent being [email protected] of type class xsbt.boot.BootFilteredLoader with classpath [<unknown>] and parent being [email protected] of type class sun.misc.Launcher$AppClassLoader with classpath [file:/usr/local/Cellar/sbt/0.13.5/libexec/sbt-launch.jar] and parent being [email protected] of type class sun.misc.Launcher$ExtClassLoader with classpath [file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/dnsns.jar,file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/localedata.jar,file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/sunec.jar,file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar,file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar,file:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/ext/zipfs.jar,file:/System/Library/Java/Extensions/MRJToolkit.jar] and parent being primordial classloader with boot classpath [/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/sunrsasign.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/lib/JObjC.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/classes] not found. 
at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) 
at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) 
at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) 
at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) 
at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) 
at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) 
at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) 
at org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115) 
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) 
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) 
at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335) 
at scala.reflect.api.Universe.typeOf(Universe.scala:59) 
at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115) 
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33) 
at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100) 
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33) 
at org.apache.spark.sql.catalyst.ScalaReflection$class.attributesFor(ScalaReflection.scala:94) 
at org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:33) 
at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:111) 
at .<init>(<console>:20) 
at .<clinit>(<console>) 
at .<init>(<console>:7) 
at .<clinit>(<console>) 
at $print(<console>) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734) 
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983) 
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573) 
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604) 
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568) 
at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760) 
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805) 
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717) 
at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581) 
at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588) 
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591) 
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882) 
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837) 
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837) 
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) 
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837) 
at scala.tools.nsc.interpreter.ILoop.main(ILoop.scala:904) 
at xsbt.ConsoleInterface.run(ConsoleInterface.scala:69) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102) 
at sbt.compiler.AnalyzingCompiler.console(AnalyzingCompiler.scala:77) 
at sbt.Console.sbt$Console$$console0$1(Console.scala:23) 
at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply$mcV$sp(Console.scala:24) 
at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:24) 
at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:24) 
at sbt.Logger$$anon$4.apply(Logger.scala:90) 
at sbt.TrapExit$App.run(TrapExit.scala:244) 
at java.lang.Thread.run(Thread.java:744) 

What is the problem?

UPDATE:

OK, I don't think the problem is Spark SQL but Spark itself, since I'm not even able to run users.collect(). If it is run in the Spark shell instead, the result is:

res5: Array[User] = Array(User(uid-1,name1,surname1), User(uid-2,name2,surname2)) 

as expected. The error is the following:

15/01/08 09:47:02 INFO FileInputFormat: Total input paths to process : 1 
15/01/08 09:47:02 INFO SparkContext: Starting job: collect at <console>:19 
15/01/08 09:47:02 INFO DAGScheduler: Got job 0 (collect at <console>:19) with 2 output partitions (allowLocal=false) 
15/01/08 09:47:02 INFO DAGScheduler: Final stage: Stage 0(collect at <console>:19) 
15/01/08 09:47:02 INFO DAGScheduler: Parents of final stage: List() 
15/01/08 09:47:02 INFO DAGScheduler: Missing parents: List() 
15/01/08 09:47:02 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[3] at map at <console>:17), which has no missing parents 
15/01/08 09:47:02 INFO MemoryStore: ensureFreeSpace(2840) called with curMem=157187, maxMem=556038881 
15/01/08 09:47:02 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.8 KB, free 530.1 MB) 
15/01/08 09:47:02 INFO MemoryStore: ensureFreeSpace(2002) called with curMem=160027, maxMem=556038881 
15/01/08 09:47:02 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2002.0 B, free 530.1 MB) 
15/01/08 09:47:02 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.100.195:63917 (size: 2002.0 B, free: 530.3 MB) 
15/01/08 09:47:02 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0 
15/01/08 09:47:02 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838 
15/01/08 09:47:02 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[3] at map at <console>:17) 
15/01/08 09:47:02 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 
15/01/08 09:47:02 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.100.195, PROCESS_LOCAL, 1326 bytes) 
15/01/08 09:47:02 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.100.195, PROCESS_LOCAL, 1326 bytes) 
15/01/08 09:47:02 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 192.168.100.195): java.io.EOFException 
at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2744) 
at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1032) 
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63) 
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101) 
at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216) 
at org.apache.hadoop.io.UTF8.readString(UTF8.java:208) 
at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87) 
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237) 
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66) 
at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43) 
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985) 
at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) 
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) 
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) 
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) 
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) 
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) 
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:744) 

I also found this: java.io.EOFException on Spark EC2 Cluster when submitting job programatically, but I don't know which version of hadoop-client might be needed.
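
If the EOFException does come from a Hadoop client/server version mismatch (which the org.apache.hadoop.io.UTF8.readChars frames in the trace suggest), the usual fix discussed in that thread is to depend on the hadoop-client version matching the Hadoop build your Spark distribution uses. A sketch for build.sbt; the version 2.4.0 is only an assumption and must be replaced with the version matching your setup:

// Assumption: 2.4.0 stands in for whatever Hadoop version your
// Spark build was compiled against.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.4.0"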

Answer


Try adding "org.apache.spark" % "spark-catalyst_2.10" % "1.2.0" (although I think this should already be pulled in as a dependency).
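
As a side note, spark-sql_2.10 1.2.0 already declares spark-catalyst as a dependency, so adding it explicitly is usually redundant; one way to confirm that it is on the classpath is to inspect the resolved dependencies from the sbt prompt:

> show compile:dependencyClasspath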


Tried it, but nothing changed. Please read the updated question. – se7entyse7en


How are you running the Spark program? Can you share more code with us? – tgpfeiffer
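
For reference, the xsbt.ConsoleInterface frames in the first stack trace suggest the snippet is being run inside sbt console, where Scala reflection can fail to find classes on the project classpath. A minimal standalone sketch that can be launched with sbt run instead is shown below; the object name, master URL, and file path are assumptions, and the case class is deliberately defined at the top level so that Spark SQL's reflection can locate it:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Top-level case class so ScalaReflection can resolve it.
case class User(uid: String, name: String, surname: String)

object UsersApp {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name and local master for testing.
    val conf = new SparkConf().setAppName("UsersApp").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Implicit conversion from RDD[User] to SchemaRDD (Spark 1.2 API).
    import sqlContext.createSchemaRDD

    val users = sc.textFile("users.txt")
      .map(_.split(" "))
      .map(u => User(u(0), u(1), u(2)))

    users.registerTempTable("users")
    sqlContext.sql("SELECT * FROM users").collect().foreach(println)

    sc.stop()
  }
}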