Hidden Secrets While Trying SQL DataFrames

There are a few points to keep in mind when using DataFrames with Spark SQL, such as when to import sqlContext.implicits._ and how to deal with the "type not found" error.

Let’s try to understand these with an example.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
// @author amithora
object SqlContextJsonRDD {
case class Employee(name: String, rank: Int) // note where this case class is declared: if you declare it inside the method, you will get a "type not found" error
def main(args: Array[String]): Unit = {
val sparkConf=new SparkConf().setMaster("local[2]").setAppName("ImplicitExampleSQL")
val sparkContext=new SparkContext(sparkConf)
val sqlContext=new SQLContext(sparkContext)
import sqlContext.implicits._ // note this import: it must come after the SQLContext is created, not at the top of the file with the other imports, or you will get a compilation error. It brings toDF() into scope; without it, toDF() will not be found
val textRdd=sparkContext.textFile("Person.txt", 2)

val personDF = textRdd.map { line => line.split(",") }
  .map { p => Employee(p(0), p(1).trim().toInt) }
  .toDF() // toDF() works here only because of the sqlContext.implicits._ import above
personDF.show()
}
}
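Once the DataFrame is built, it can also be queried with plain SQL by first registering it as a temporary table. A minimal sketch, assuming the personDF built above and the Spark 1.x SQLContext API; the table name "employees" and the rank filter are just illustrative choices:

```scala
// Register the DataFrame so SQL statements can refer to it by name.
personDF.registerTempTable("employees")

// Run plain SQL against the registered table; the result is another DataFrame.
val topRanked = sqlContext.sql(
  "SELECT name, rank FROM employees WHERE rank <= 10 ORDER BY rank")
topRanked.show()
```

This is the main payoff of going through toDF(): the same data can now be processed either with the DataFrame API or with SQL, interchangeably.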

