大数据进阶之路——Spark SQL 之 DataFrame&&Dataset
【摘要】
文章目录
dataframe 和 rddAPI常用操作DataFrame和RDD案例DataSet
DataFrame它不是Spark SQL提出的,而是早起在R、Pandas语言就...
-
A Dataset is a distributed collection of data:分布式的数据集
-
A DataFrame is a Dataset organized into named columns.
以列(列名、列的类型、列值)的形式构成的分布式数据集,按照列赋予不同的名称
dataframe 和 rdd
DataFrame是一种以RDD为基础的分布式数据集,类似于传统数据库中的二维表格。DataFrame引入了schema。
RDD | DataFrame |
---|---|
java/scala ==> jvm | java/scala/python ==> Logic Plan |
python ==> python runtime | |
Spark框架本身不了解Person类的内部结构 | Spark SQL可以清楚地知道该数据集中包含哪些列,每列的名称和类型各是什么 |
API常用操作
package org.example
import org.apache.spark.sql.SparkSession
object DataFrameApp {
def main(args: Array[String]) {
val spark = SparkSession.builder().appName("DataFrameApp").master("local[2]").getOrCreate()
// 将json文件加载成一个dataframe
val peopleDF = spark.read.format("json").load("people.json")
// 输出dataframe对应的schema信息
peopleDF.printSchema()
// 输出数据集的前20条记录
peopleDF.show()
//查询某列所有的数据: select name from table
peopleDF.select("name").show()
// 查询某几列所有的数据,并对列进行计算: select name, age+10 as age2 from table
peopleDF.select(peopleDF.col("name"), (peopleDF.col("age") + 10).as("age2")).show()
//根据某一列的值进行过滤: select * from table where age>19
peopleDF.filter(peopleDF.col("age") > 19).show()
//根据某一列进行分组,然后再进行聚合操作: select age,count(1) from table group by age
peopleDF.groupBy("age").count().show()
spark.stop()
}
}
root
|-- age: long (nullable = true)
|-- name: string (nullable = true)
+----+-------+
| age| name|
+----+-------+
|null|Michael|
| 30| Andy|
| 19| Justin|
+----+-------+
+-------+
| name|
+-------+
|Michael|
| Andy|
| Justin|
+-------+
+-------+----+
| name|age2|
+-------+----+
|Michael|null|
| Andy| 40|
| Justin| 29|
+-------+----+
+---+----+
|age|name|
+---+----+
| 30|Andy|
+---+----+
+----+-----+
| age|count|
+----+-----+
| 19| 1|
|null| 1|
| 30| 1|
+----+-----+
DataFrame和RDD
package org.example
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SparkSession}
object DataFrameRDDApp {
def main(args: Array[String]) {
val spark = SparkSession.builder().appName("DataFrameRDDApp").master("local[2]").getOrCreate()
//inferReflection(spark)
program(spark)
spark.stop()
}
def program(spark: SparkSession): Unit = {
// RDD ==> DataFrame
val rdd = spark.sparkContext.textFile("infos.txt")
val infoRDD = rdd.map(_.split(",")).map(line => Row(line(0).toInt, line(1), line(2).toInt))
val structType = StructType(Array(StructField("id", IntegerType, true),
StructField("name", StringType, true),
StructField("age", IntegerType, true)))
val infoDF = spark.createDataFrame(infoRDD, structType)
infoDF.printSchema()
infoDF.show()
//通过df的api进行操作
infoDF.filter(infoDF.col("age") > 70).show
//通过sql的方式进行操作
infoDF.createOrReplaceTempView("infos")
spark.sql("select * from infos where age > 70").show()
}
def inferReflection(spark: SparkSession) {
// RDD ==> DataFrame
val rdd = spark.sparkContext.textFile("infos.txt")
//注意:需要导入隐式转换
import spark.implicits._
val infoDF = rdd.map(_.split(",")).map(line => Info(line(0).toInt, line(1), line(2).toInt)).toDF()
infoDF.show()
infoDF.filter(infoDF.col("age") > 70).show
infoDF.createOrReplaceTempView("infos")
spark.sql("select * from infos where age > 70").show()
}
case class Info(id: Int, name: String, age: Int)
}
root
|-- id: integer (nullable = true)
|-- name: string (nullable = true)
|-- age: integer (nullable = true)
+---+-----+---+
| id| name|age|
+---+-----+---+
| 1|hello| 66|
| 2|world| 89|
| 3|spark| 88|
+---+-----+---+
+---+-----+---+
| id| name|age|
+---+-----+---+
| 2|world| 89|
| 3|spark| 88|
+---+-----+---+
+---+-----+---+
| id| name|age|
+---+-----+---+
| 2|world| 89|
| 3|spark| 88|
+---+-----+---+
案例
package org.example
import org.apache.spark.sql.SparkSession
/**
* DataFrame中的操作操作
*/
object DataFrameProjectApp {
def main(args: Array[String]) {
val spark = SparkSession.builder().appName("DataFrameProjectApp").master("local[2]").getOrCreate()
// RDD ==> DataFrame
val rdd = spark.sparkContext.textFile("Student.data")
//注意:需要导入隐式转换
import spark.implicits._
val studentDF = rdd.map(_.split("\\|")).map(line => Student(line(0).toInt, line(1), line(2), line(3))).toDF()
//show默认只显示前20条
studentDF.show
studentDF.show(30)
studentDF.show(30, false)
studentDF.take(10)
studentDF.first()
studentDF.head(3)
studentDF.select("email").show(30,false)
studentDF.filter("name=''").show
studentDF.filter("name='' OR name='NULL'").show
//name以M开头的人
studentDF.filter("SUBSTR(name,0,1)='M'").show
studentDF.sort(studentDF("name")).show
studentDF.sort(studentDF("name").desc).show
studentDF.sort("name","id").show
studentDF.sort(studentDF("name").asc, studentDF("id").desc).show
studentDF.select(studentDF("name").as("student_name")).show
val studentDF2 = rdd.map(_.split("\\|")).map(line => Student(line(0).toInt, line(1), line(2), line(3))).toDF()
studentDF.join(studentDF2, studentDF.col("id") === studentDF2.col("id")).show
spark.stop()
}
case class Student(id: Int, name: String, phone: String, email: String)
}
+---+--------+--------------+--------------------+
| id| name| phone| email|
+---+--------+--------------+--------------------+
| 1| Burke|1-300-746-8446|ullamcorper.velit...|
| 2| Kamal|1-668-571-5046|pede.Suspendisse@...|
| 3| Olga|1-956-311-1686|Aenean.eget.metus...|
| 4| Belle|1-246-894-6340|vitae.aliquet.nec...|
| 5| Trevor|1-300-527-4967|dapibus.id@acturp...|
| 6| Laurel|1-691-379-9921|adipiscing@consec...|
| 7| Sara|1-608-140-1995|Donec.nibh@enimEt...|
| 8| Kaseem|1-881-586-2689|cursus.et.magna@e...|
| 9| Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 10| Maya|1-271-683-2698|accumsan.convalli...|
| 11| Emi|1-467-270-1337| est@nunc.com|
| 12| Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 13|Florence|1-603-575-2444|sit.amet.dapibus@...|
| 14| Anika|1-856-828-7883|euismod@ligulaeli...|
| 15| Tarik|1-398-171-2268|turpis@felisorci.com|
| 16| Amena|1-878-250-3129|lorem.luctus.ut@s...|
| 17| Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 18| Guy|1-869-521-3230|senectus.et.netus...|
| 19| Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 20| Edward|1-711-710-6552|lectus@aliquetlib...|
+---+--------+--------------+--------------------+
only showing top 20 rows
| id| name| phone| email|
+---+--------+--------------+--------------------+
| 1| Burke|1-300-746-8446|ullamcorper.velit...|
| 2| Kamal|1-668-571-5046|pede.Suspendisse@...|
| 3| Olga|1-956-311-1686|Aenean.eget.metus...|
| 4| Belle|1-246-894-6340|vitae.aliquet.nec...|
| 5| Trevor|1-300-527-4967|dapibus.id@acturp...|
| 6| Laurel|1-691-379-9921|adipiscing@consec...|
| 7| Sara|1-608-140-1995|Donec.nibh@enimEt...|
| 8| Kaseem|1-881-586-2689|cursus.et.magna@e...|
| 9| Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 10| Maya|1-271-683-2698|accumsan.convalli...|
| 11| Emi|1-467-270-1337| est@nunc.com|
| 12| Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 13|Florence|1-603-575-2444|sit.amet.dapibus@...|
| 14| Anika|1-856-828-7883|euismod@ligulaeli...|
| 15| Tarik|1-398-171-2268|turpis@felisorci.com|
| 16| Amena|1-878-250-3129|lorem.luctus.ut@s...|
| 17| Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 18| Guy|1-869-521-3230|senectus.et.netus...|
| 19| Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 20| Edward|1-711-710-6552|lectus@aliquetlib...|
| 21| |1-711-710-6552|lectus@aliquetlib...|
| 22| |1-711-710-6552|lectus@aliquetlib...|
| 23| NULL|1-711-710-6552|lectus@aliquetlib...|
+---+--------+--------------+--------------------+
+---+--------+--------------+-----------------------------------------+
|id |name |phone |email |
+---+--------+--------------+-----------------------------------------+
|1 |Burke |1-300-746-8446|ullamcorper.velit.in@ametnullaDonec.co.uk|
|2 |Kamal |1-668-571-5046|pede.Suspendisse@interdumenim.edu |
|3 |Olga |1-956-311-1686|Aenean.eget.metus@dictumcursusNunc.edu |
|4 |Belle |1-246-894-6340|vitae.aliquet.nec@neque.co.uk |
|5 |Trevor |1-300-527-4967|dapibus.id@acturpisegestas.net |
|6 |Laurel |1-691-379-9921|adipiscing@consectetueripsum.edu |
|7 |Sara |1-608-140-1995|Donec.nibh@enimEtiamimperdiet.edu |
|8 |Kaseem |1-881-586-2689|cursus.et.magna@euismod.org |
|9 |Lev |1-916-367-5608|Vivamus.nisi@ipsumdolor.com |
|10 |Maya |1-271-683-2698|accumsan.convallis@ornarelectusjusto.edu |
|11 |Emi |1-467-270-1337|est@nunc.com |
|12 |Caleb |1-683-212-0896|Suspendisse@Quisque.edu |
|13 |Florence|1-603-575-2444|sit.amet.dapibus@lacusAliquamrutrum.ca |
|14 |Anika |1-856-828-7883|euismod@ligulaelit.co.uk |
|15 |Tarik |1-398-171-2268|turpis@felisorci.com |
|16 |Amena |1-878-250-3129|lorem.luctus.ut@scelerisque.com |
|17 |Blossom |1-154-406-9596|Nunc.commodo.auctor@eratSed.co.uk |
|18 |Guy |1-869-521-3230|senectus.et.netus@lectusrutrum.com |
|19 |Malachi |1-608-637-2772|Proin.mi.Aliquam@estarcu.net |
|20 |Edward |1-711-710-6552|lectus@aliquetlibero.co.uk |
|21 | |1-711-710-6552|lectus@aliquetlibero.co.uk |
|22 | |1-711-710-6552|lectus@aliquetlibero.co.uk |
|23 |NULL |1-711-710-6552|lectus@aliquetlibero.co.uk |
+---+--------+--------------+-----------------------------------------+
+-----------------------------------------+
|email |
+-----------------------------------------+
|ullamcorper.velit.in@ametnullaDonec.co.uk|
|pede.Suspendisse@interdumenim.edu |
|Aenean.eget.metus@dictumcursusNunc.edu |
|vitae.aliquet.nec@neque.co.uk |
|dapibus.id@acturpisegestas.net |
|adipiscing@consectetueripsum.edu |
|Donec.nibh@enimEtiamimperdiet.edu |
|cursus.et.magna@euismod.org |
|Vivamus.nisi@ipsumdolor.com |
|accumsan.convallis@ornarelectusjusto.edu |
|est@nunc.com |
|Suspendisse@Quisque.edu |
|sit.amet.dapibus@lacusAliquamrutrum.ca |
|euismod@ligulaelit.co.uk |
|turpis@felisorci.com |
|lorem.luctus.ut@scelerisque.com |
|Nunc.commodo.auctor@eratSed.co.uk |
|senectus.et.netus@lectusrutrum.com |
|Proin.mi.Aliquam@estarcu.net |
|lectus@aliquetlibero.co.uk |
|lectus@aliquetlibero.co.uk |
|lectus@aliquetlibero.co.uk |
|lectus@aliquetlibero.co.uk |
+-----------------------------------------+
+---+----+--------------+--------------------+
| id|name| phone| email|
+---+----+--------------+--------------------+
| 21| |1-711-710-6552|lectus@aliquetlib...|
| 22| |1-711-710-6552|lectus@aliquetlib...|
+---+----+--------------+--------------------+
+---+----+--------------+--------------------+
| id|name| phone| email|
+---+----+--------------+--------------------+
| 21| |1-711-710-6552|lectus@aliquetlib...|
| 22| |1-711-710-6552|lectus@aliquetlib...|
| 23|NULL|1-711-710-6552|lectus@aliquetlib...|
+---+----+--------------+--------------------+
+---+-------+--------------+--------------------+
| id| name| phone| email|
+---+-------+--------------+--------------------+
| 10| Maya|1-271-683-2698|accumsan.convalli...|
| 19|Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
+---+-------+--------------+--------------------+
+---+--------+--------------+--------------------+
| id| name| phone| email|
+---+--------+--------------+--------------------+
| 21| |1-711-710-6552|lectus@aliquetlib...|
| 22| |1-711-710-6552|lectus@aliquetlib...|
| 16| Amena|1-878-250-3129|lorem.luctus.ut@s...|
| 14| Anika|1-856-828-7883|euismod@ligulaeli...|
| 4| Belle|1-246-894-6340|vitae.aliquet.nec...|
| 17| Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 1| Burke|1-300-746-8446|ullamcorper.velit...|
| 12| Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 20| Edward|1-711-710-6552|lectus@aliquetlib...|
| 11| Emi|1-467-270-1337| est@nunc.com|
| 13|Florence|1-603-575-2444|sit.amet.dapibus@...|
| 18| Guy|1-869-521-3230|senectus.et.netus...|
| 2| Kamal|1-668-571-5046|pede.Suspendisse@...|
| 8| Kaseem|1-881-586-2689|cursus.et.magna@e...|
| 6| Laurel|1-691-379-9921|adipiscing@consec...|
| 9| Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 19| Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 10| Maya|1-271-683-2698|accumsan.convalli...|
| 23| NULL|1-711-710-6552|lectus@aliquetlib...|
| 3| Olga|1-956-311-1686|Aenean.eget.metus...|
+---+--------+--------------+--------------------+
+---+--------+--------------+--------------------+
| id| name| phone| email|
+---+--------+--------------+--------------------+
| 5| Trevor|1-300-527-4967|dapibus.id@acturp...|
| 15| Tarik|1-398-171-2268|turpis@felisorci.com|
| 7| Sara|1-608-140-1995|Donec.nibh@enimEt...|
| 3| Olga|1-956-311-1686|Aenean.eget.metus...|
| 23| NULL|1-711-710-6552|lectus@aliquetlib...|
| 10| Maya|1-271-683-2698|accumsan.convalli...|
| 19| Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 9| Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 6| Laurel|1-691-379-9921|adipiscing@consec...|
| 8| Kaseem|1-881-586-2689|cursus.et.magna@e...|
| 2| Kamal|1-668-571-5046|pede.Suspendisse@...|
| 18| Guy|1-869-521-3230|senectus.et.netus...|
| 13|Florence|1-603-575-2444|sit.amet.dapibus@...|
| 11| Emi|1-467-270-1337| est@nunc.com|
| 20| Edward|1-711-710-6552|lectus@aliquetlib...|
| 12| Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 1| Burke|1-300-746-8446|ullamcorper.velit...|
| 17| Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 4| Belle|1-246-894-6340|vitae.aliquet.nec...|
| 14| Anika|1-856-828-7883|euismod@ligulaeli...|
+---+--------+--------------+--------------------+
+---+--------+--------------+--------------------+
| id| name| phone| email|
+---+--------+--------------+--------------------+
| 21| |1-711-710-6552|lectus@aliquetlib...|
| 22| |1-711-710-6552|lectus@aliquetlib...|
| 16| Amena|1-878-250-3129|lorem.luctus.ut@s...|
| 14| Anika|1-856-828-7883|euismod@ligulaeli...|
| 4| Belle|1-246-894-6340|vitae.aliquet.nec...|
| 17| Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 1| Burke|1-300-746-8446|ullamcorper.velit...|
| 12| Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 20| Edward|1-711-710-6552|lectus@aliquetlib...|
| 11| Emi|1-467-270-1337| est@nunc.com|
| 13|Florence|1-603-575-2444|sit.amet.dapibus@...|
| 18| Guy|1-869-521-3230|senectus.et.netus...|
| 2| Kamal|1-668-571-5046|pede.Suspendisse@...|
| 8| Kaseem|1-881-586-2689|cursus.et.magna@e...|
| 6| Laurel|1-691-379-9921|adipiscing@consec...|
| 9| Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 19| Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 10| Maya|1-271-683-2698|accumsan.convalli...|
| 23| NULL|1-711-710-6552|lectus@aliquetlib...|
| 3| Olga|1-956-311-1686|Aenean.eget.metus...|
+---+--------+--------------+--------------------+
+---+--------+--------------+--------------------+
| id| name| phone| email|
+---+--------+--------------+--------------------+
| 22| |1-711-710-6552|lectus@aliquetlib...|
| 21| |1-711-710-6552|lectus@aliquetlib...|
| 16| Amena|1-878-250-3129|lorem.luctus.ut@s...|
| 14| Anika|1-856-828-7883|euismod@ligulaeli...|
| 4| Belle|1-246-894-6340|vitae.aliquet.nec...|
| 17| Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 1| Burke|1-300-746-8446|ullamcorper.velit...|
| 12| Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 20| Edward|1-711-710-6552|lectus@aliquetlib...|
| 11| Emi|1-467-270-1337| est@nunc.com|
| 13|Florence|1-603-575-2444|sit.amet.dapibus@...|
| 18| Guy|1-869-521-3230|senectus.et.netus...|
| 2| Kamal|1-668-571-5046|pede.Suspendisse@...|
| 8| Kaseem|1-881-586-2689|cursus.et.magna@e...|
| 6| Laurel|1-691-379-9921|adipiscing@consec...|
| 9| Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 19| Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 10| Maya|1-271-683-2698|accumsan.convalli...|
| 23| NULL|1-711-710-6552|lectus@aliquetlib...|
| 3| Olga|1-956-311-1686|Aenean.eget.metus...|
+---+--------+--------------+--------------------+
+------------+
|student_name|
+------------+
| Burke|
| Kamal|
| Olga|
| Belle|
| Trevor|
| Laurel|
| Sara|
| Kaseem|
| Lev|
| Maya|
| Emi|
| Caleb|
| Florence|
| Anika|
| Tarik|
| Amena|
| Blossom|
| Guy|
| Malachi|
| Edward|
+------------+
+---+--------+--------------+--------------------+---+--------+--------------+--------------------+
| id| name| phone| email| id| name| phone| email|
+---+--------+--------------+--------------------+---+--------+--------------+--------------------+
| 12| Caleb|1-683-212-0896|Suspendisse@Quisq...| 12| Caleb|1-683-212-0896|Suspendisse@Quisq...|
| 22| |1-711-710-6552|lectus@aliquetlib...| 22| |1-711-710-6552|lectus@aliquetlib...|
| 1| Burke|1-300-746-8446|ullamcorper.velit...| 1| Burke|1-300-746-8446|ullamcorper.velit...|
| 13|Florence|1-603-575-2444|sit.amet.dapibus@...| 13|Florence|1-603-575-2444|sit.amet.dapibus@...|
| 6| Laurel|1-691-379-9921|adipiscing@consec...| 6| Laurel|1-691-379-9921|adipiscing@consec...|
| 16| Amena|1-878-250-3129|lorem.luctus.ut@s...| 16| Amena|1-878-250-3129|lorem.luctus.ut@s...|
| 3| Olga|1-956-311-1686|Aenean.eget.metus...| 3| Olga|1-956-311-1686|Aenean.eget.metus...|
| 20| Edward|1-711-710-6552|lectus@aliquetlib...| 20| Edward|1-711-710-6552|lectus@aliquetlib...|
| 5| Trevor|1-300-527-4967|dapibus.id@acturp...| 5| Trevor|1-300-527-4967|dapibus.id@acturp...|
| 19| Malachi|1-608-637-2772|Proin.mi.Aliquam@...| 19| Malachi|1-608-637-2772|Proin.mi.Aliquam@...|
| 15| Tarik|1-398-171-2268|turpis@felisorci.com| 15| Tarik|1-398-171-2268|turpis@felisorci.com|
| 9| Lev|1-916-367-5608|Vivamus.nisi@ipsu...| 9| Lev|1-916-367-5608|Vivamus.nisi@ipsu...|
| 17| Blossom|1-154-406-9596|Nunc.commodo.auct...| 17| Blossom|1-154-406-9596|Nunc.commodo.auct...|
| 4| Belle|1-246-894-6340|vitae.aliquet.nec...| 4| Belle|1-246-894-6340|vitae.aliquet.nec...|
| 8| Kaseem|1-881-586-2689|cursus.et.magna@e...| 8| Kaseem|1-881-586-2689|cursus.et.magna@e...|
| 23| NULL|1-711-710-6552|lectus@aliquetlib...| 23| NULL|1-711-710-6552|lectus@aliquetlib...|
| 7| Sara|1-608-140-1995|Donec.nibh@enimEt...| 7| Sara|1-608-140-1995|Donec.nibh@enimEt...|
| 10| Maya|1-271-683-2698|accumsan.convalli...| 10| Maya|1-271-683-2698|accumsan.convalli...|
| 21| |1-711-710-6552|lectus@aliquetlib...| 21| |1-711-710-6552|lectus@aliquetlib...|
| 11| Emi|1-467-270-1337| est@nunc.com| 11| Emi|1-467-270-1337| est@nunc.com|
+---+--------+--------------+--------------------+---+--------+--------------+--------------------+
only showing top 20 rows
DataSet
DataFrame = Dataset[Row]
Dataset:强类型 typed case class
DataFrame:弱类型 Row
-
SQL: seletc name from person; compile ok, result no
-
DF:
- df.select(“name”) compile no
- df.select(“nname”) compile ok
-
DS:ds.map(line => line.itemid) compile no
文章来源: hiszm.blog.csdn.net,作者:孙中明,版权归原作者所有,如需转载,请联系作者。
原文链接:hiszm.blog.csdn.net/article/details/120570484
【版权声明】本文为华为云社区用户转载文章,如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱:
cloudbbs@huaweicloud.com
- 点赞
- 收藏
- 关注作者
评论(0)