Apache Spark, aggiungi una colonna "CASE WHEN ... ELSE ..." ad un DataFrame esistente

Sto provando ad aggiungere una colonna "CASE WHEN ... ELSE ..." ad un DataFrame esistente, usando le API di Scala. dataframe inizio:Apache Spark, aggiungi una colonna "CASE WHEN ... ELSE ..." ad un DataFrame esistente

color 
Red 
Green 
Blue

dataframe desiderata (sintassi SQL: CASO QUANDO colore verde == allora 1 altrimenti 0 END AS bool):

color bool 
Red 0 
Green 1 
Blue 0

Come devo implementare questa logica?

fonte

2015-06-11 Leonardo Biagioli

Eventuali duplicati di [SQL SPARK - caso in cui poi] (https://stackoverflow.com/questions/25157451/spark-sql-case-when-then) –

Nella prossima versione di SPARK 1.4.0 (dovrebbe essere rilasciato nei prossimi giorni). È possibile utilizzare il quando/altrimenti sintassi:

// Create the dataframe 
val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color") 

// Use when/otherwise syntax 
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))

Se si utilizza SPARK 1.3.0 si può scegliere di utilizzare una funzione definita dall'utente:

// Define the UDF 
val isGreen = udf((color: String) => { 
    if (color == "Green") 1 
    else 0 
}) 
val df2 = df.withColumn("Green_Ind", isGreen($"color"))

fonte

2015-06-11 15:18:46 Herman

La ringrazio molto Herman, funziona! –

In Spark 1.5.0: è anche possibile utilizzare il funzione expr sintassi SQL

val df3 = df.withColumn("Green_Ind", expr("case when color = 'green' then 1 else 0 end"))

o semplice scintilla SQL

df.registerTempTable("data") 
val df4 = sql(""" select *, case when color = 'green' then 1 else 0 end as Green_ind from data """)

fonte

2015-10-28 12:46:07

Anche questo funziona in Python –

ho trovato questo:

https://issues.apache.org/jira/browse/SPARK-3813

ha funzionato per me su scintilla 2.1.0:

import sqlContext._ 
val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"val_$i"))) 
rdd.registerTempTable("records") 
println("Result of SELECT *:") 
sql("SELECT case key when '93' then 'ravi' else key end FROM records").collect()

fonte

2017-02-25 20:10:44 ozma

ero alla ricerca di quel lungo periodo di tempo: ecco esempio di SPARK 2.1 JAVA con il gruppo by- per altri utenti java.

import static org.apache.spark.sql.functions.*; 
//... 
    Column uniqTrue = col("uniq").equalTo(true); 
    Column uniqFalse = col("uniq").equalTo(false); 

    Column testModeFalse = col("testMode").equalTo(false); 
    Column testModeTrue = col("testMode").equalTo(true); 

    Dataset<Row> x = basicEventDataset 
      .groupBy(col(group_field)) 
      .agg(
        sum(when((testModeTrue).and(uniqTrue), 1).otherwise(0)).as("tt"), 
        sum(when((testModeFalse).and(uniqTrue), 1).otherwise(0)).as("ft"), 
        sum(when((testModeTrue).and(uniqFalse), 1).otherwise(0)).as("tf"), 
        sum(when((testModeFalse).and(uniqFalse), 1).otherwise(0)).as("ff") 
      );

fonte

2017-12-13 14:03:04

Apache Spark, aggiungi una colonna "CASE WHEN ... ELSE ..." ad un DataFrame esistente

risposta

Problemi correlati