PySpark: create a DataFrame from a key/value RDD

If I have an RDD of key/value pairs (where the key is the column index), is it possible to load it into a DataFrame? For example:

(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)

and have the resulting DataFrame look like this:

1,2,18 
1,10,18 
2,20,18


2015-05-02 theMadKing

Yes, it is possible (tested with Spark 1.3.1):

>>> rdd = sc.parallelize([(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)]) 
>>> sqlContext.createDataFrame(rdd, ["id", "score"]) 
Out[2]: DataFrame[id: bigint, score: bigint]


2015-05-02 20:43:11

Is this equivalent to 'rdd.toDF(["id", "score"])'?

'RDD' object has no attribute 'toDF'. I am getting this error.

I am using Spark 1.6 with PySpark. I cannot load sql.SQLContext and create a DataFrame from it.
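
On the "no attribute 'toDF'" error reported in the comments: in PySpark, toDF is attached to the RDD class only when an SQLContext is constructed (on Spark 2.0 and later, creating a SparkSession has the same effect). A minimal sketch of that ordering follows, assuming an existing SparkContext named sc as in the answer above; the helper name pairs_to_two_column_df is hypothetical and used only for illustration.

```python
def pairs_to_two_column_df(sc, pairs, names=("id", "score")):
    """Hypothetical helper: build a two-column DataFrame from (key, value) pairs."""
    # Imported inside the function so the sketch only needs PySpark when called.
    from pyspark.sql import SQLContext

    # Constructing the SQLContext is the step that attaches toDF to RDD.
    # If no SQLContext (or SparkSession) has been created yet, rdd.toDF
    # raises AttributeError: 'RDD' object has no attribute 'toDF'.
    sql_context = SQLContext(sc)

    rdd = sc.parallelize(list(pairs))
    return rdd.toDF(list(names))
```

In an interactive pyspark shell a sqlContext is normally created at startup, which is probably why the snippets in the answers work there without this step.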

rdd = sc.parallelize([(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)])
df = rdd.toDF(['id', 'score'])
df.show()

The output is:

+---+-----+
| id|score|
+---+-----+
|  0|    1|
|  0|    1|
|  0|    2|
|  1|    2|
|  1|   10|
|  1|   20|
|  3|   18|
|  3|   18|
|  3|   18|
+---+-----+


2017-02-10 04:39:39 srinivasu
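
Both answers above yield a two-column (id, score) DataFrame, while the question shows a wide layout with one column per key. One possible way to get that shape is sketched below. It is an assumption-laden illustration, not a method from either answer: it collects the pairs to the driver (so it only suits small data), relies on the values for each key arriving in the order shown in the question, and uses hypothetical names (pairs_to_rows, pairs_rdd_to_wide_df, and col_N column names).

```python
def pairs_to_rows(pairs):
    """Group (column_index, value) pairs by key and zip the groups into rows."""
    columns = {}
    for key, value in pairs:
        # Values keep their arrival order within each key.
        columns.setdefault(key, []).append(value)

    keys = sorted(columns)
    # zip stops at the shortest column, so ragged input is silently truncated.
    rows = list(zip(*(columns[k] for k in keys)))
    names = ["col_{}".format(k) for k in keys]  # hypothetical column names
    return names, rows


def pairs_rdd_to_wide_df(sql_context, rdd):
    """Build a wide DataFrame from a small key/value RDD (driver-side reshape)."""
    # Assumption: the RDD is small enough to collect to the driver.
    names, rows = pairs_to_rows(rdd.collect())
    return sql_context.createDataFrame(rows, names)
```

With the pairs from the question, pairs_to_rows returns the column names col_0, col_1, col_3 and the rows (1, 2, 18), (1, 10, 18), (2, 20, 18), which matches the layout asked for. Calling pairs_rdd_to_wide_df(sqlContext, rdd) would then pass those rows to createDataFrame.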
