1019. Job Posting Data Analysis (Spark)

托马斯-酷涛, posted 2022/08/03 00:35:58

Contents

Table creation statement

Raw data

Data analysis

Complete code


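Analyze the job data according to the following requirements:

Average salary by education level: for each education level, the average estimate, i.e. the mean of the average maximum salary and the average minimum salary

Average salary by job title: for each job title, the average estimate, i.e. the mean of the average maximum salary and the average minimum salary

Jobs offered by each company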
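The "average estimate" in the first two requirements can be written as a formula. Here g is a group (an education level or a job title) containing n_g postings, and m_i and M_i are the minimum and maximum salary of posting i; this notation is introduced for illustration and is not used in the original post:

```latex
\bar{S}_g = \frac{1}{2}\left(\frac{1}{n_g}\sum_{i \in g} m_i + \frac{1}{n_g}\sum_{i \in g} M_i\right)
          = \frac{1}{n_g}\sum_{i \in g}\frac{m_i + M_i}{2}
```

The two forms agree whenever every posting in the group has both bounds. SQL avg skips NULL values, so if some bounds were missing the two forms could give different results.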

Table creation statement


  
  DROP TABLE IF EXISTS `job`;
  CREATE TABLE `job` (
    `address` varchar(255) DEFAULT NULL,
    `company` varchar(255) DEFAULT NULL,
    `edu` varchar(255) DEFAULT NULL,
    `jobName` varchar(255) DEFAULT NULL,
    `salary` varchar(255) DEFAULT NULL,
    `size` varchar(255) DEFAULT NULL
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Raw data

Data analysis

1. Load the raw data


  
  // Read the job table from the local MySQL database crawler over JDBC
  val source: RDD[Row] = spark.read
    .format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/crawler")
    .option("dbtable", "job")
    .option("user", "root")
    .option("password", "123456")
    .load()
    .rdd
  source.foreach(println)
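Hard-coding the user name and password is convenient for a local demo but easy to leak. A minimal sketch, assuming two hypothetical environment variables, `JOB_DB_USER` and `JOB_DB_PASSWORD` (not part of the original post), builds a `java.util.Properties` object that can be passed to Spark's `DataFrameReader.jdbc(url, table, properties)` instead of the separate user and password options:

```scala
import java.util.Properties

object JdbcSettings {
  // JDBC URL taken from the post (database: crawler)
  val url: String = "jdbc:mysql://localhost:3306/crawler"

  // Reads credentials from hypothetical environment variables and
  // falls back to the demo values used in the post when they are unset
  def connectionProperties(env: Map[String, String] = sys.env): Properties = {
    val props = new Properties()
    props.setProperty("user", env.getOrElse("JOB_DB_USER", "root"))
    props.setProperty("password", env.getOrElse("JOB_DB_PASSWORD", "123456"))
    props
  }

  def main(args: Array[String]): Unit =
    println(s"connecting to $url as ${connectionProperties().getProperty("user")}")
}
```

In the job itself this would look something like `spark.read.jdbc(JdbcSettings.url, "job", JdbcSettings.connectionProperties()).rdd`, which yields the same RDD[Row] as the option-based reader.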

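The task asks for the average minimum salary and the average maximum salary, but salary is stored as a single text range (varchar), so the data is first split into numeric minimum and maximum values.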

2. Format the data


  
  // Split the salary string (column 4) on "-" into a minimum and a maximum, then convert both to Float
  val data: RDD[(String, String, String, String, Float, Float, String)] = source.map(item => {
    val parts: Array[String] = item(4).toString.split("-")
    // Minimum: the part before "-"; when it contains 面议 (negotiable), its first four characters are dropped
    val minsalary: String =
      if (parts(0).contains("面议")) parts(0).substring(4)
      else parts(0)
    // Maximum: the part after "-" without its last character (presumably a unit suffix)
    val maxsalary: String = parts(1).substring(0, parts(1).length - 1)
    (item(0).toString, item(1).toString, item(2).toString, item(3).toString,
      minsalary.toFloat, maxsalary.toFloat, item(5).toString)
  })
  data.foreach(println)

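The processed data is shown in the figure. Both salary bounds are converted to Float so that the averages can be computed in the following steps.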
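The parsing rule above throws on any row that does not match the expected shape, which fails the whole Spark job. A minimal sketch in plain Scala isolates the same rule and returns `None` for malformed rows instead. It assumes, from the original code, that a salary looks like `<min>-<max><one trailing character>` and that a minimum containing 面议 has four leading characters to drop; the sample strings are made up:

```scala
import scala.util.Try

object SalaryParsing {
  // Turns "<min>-<max><unit>" into (min, max); anything that does not fit yields None
  def parseSalary(raw: String): Option[(Float, Float)] =
    Option(raw).map(_.split("-")).flatMap {
      case Array(lo, hi) =>
        Try {
          // Same rule as the post: if the minimum contains 面议, drop its first four characters
          val minText = if (lo.contains("面议")) lo.substring(4) else lo
          // Same rule as the post: drop the last character of the maximum
          val maxText = hi.dropRight(1)
          (minText.toFloat, maxText.toFloat)
        }.toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical inputs for illustration only, not rows from the post's data
    Seq("8-12K", "薪资面议8-12K", "面议").foreach(s => println(s"$s -> ${parseSalary(s)}"))
  }
}
```

Inside the job, something like `source.flatMap(row => parseSalary(row.getString(4)).map { case (mn, mx) => ... })` would skip unparseable rows; whether to drop or log such rows is a design choice the original does not address.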

3. Convert to a DataFrame and register a temporary view


  
  import spark.implicits._
  // Name the columns and register the DataFrame as the temporary view salary for Spark SQL
  val dataDF: DataFrame = data.toDF("address", "company", "edu", "jobName", "minsalary", "maxsalary", "size")
  dataDF.createOrReplaceTempView("salary")

4. Average salary by education level (for each education level, the mean of the average maximum salary and the average minimum salary)


  
  // Inner query: average minimum and maximum per education level; outer query: their midpoint
  val sql1 =
    """
      |select s.edu, (s.avg_minsalary + s.avg_maxsalary) / 2 as avgSalary from
      |(select edu, avg(minsalary) as avg_minsalary, avg(maxsalary) as avg_maxsalary from salary group by edu) s
      |""".stripMargin
  spark.sql(sql1).show()
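To check what sql1 computes without a database, the same aggregation can be mirrored on an in-memory collection. A minimal sketch; the `Posting` case class and the sample rows are hypothetical, not data from the post:

```scala
object AvgByEdu {
  // Hypothetical record holding only the columns sql1 uses
  final case class Posting(edu: String, minSalary: Double, maxSalary: Double)

  // Per education level: (mean of minimums + mean of maximums) / 2,
  // the same expression as the outer query of sql1
  def avgSalaryByEdu(rows: Seq[Posting]): Map[String, Double] =
    rows.groupBy(_.edu).map { case (edu, group) =>
      val avgMin = group.map(_.minSalary).sum / group.size
      val avgMax = group.map(_.maxSalary).sum / group.size
      edu -> ((avgMin + avgMax) / 2)
    }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      Posting("本科", 8, 12),
      Posting("本科", 10, 14),
      Posting("大专", 6, 8)
    )
    avgSalaryByEdu(sample).foreach(println)
  }
}
```

Grouping by a job-title field instead of `_.edu` gives the in-memory counterpart of sql2.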

5. Average salary by job title (for each job title, the mean of the average maximum salary and the average minimum salary)


  
  // Same calculation as sql1, grouped by job title instead of education level
  val sql2 =
    """
      |select s.jobName, (s.avg_minsalary + s.avg_maxsalary) / 2 as avgSalary from
      |(select jobName, avg(minsalary) as avg_minsalary, avg(maxsalary) as avg_maxsalary from salary group by jobName) s
      |""".stripMargin
  spark.sql(sql2).show()

6. Jobs offered by each company


  
  // Count the job postings (non-NULL jobName values) offered by each company
  val sql3 =
    """
      |select company, count(jobName) as jobNum from salary group by company
      |""".stripMargin
  spark.sql(sql3).show()
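Note that `count(jobName)` counts only rows whose jobName is not NULL, whereas `count(*)` would count every row of the company. A minimal in-memory sketch of the same grouping, with hypothetical company names and job titles:

```scala
object JobsPerCompany {
  // Mirrors "count(jobName) ... group by company": null job names are not counted
  def jobNumByCompany(rows: Seq[(String, String)]): Map[String, Int] =
    rows.groupBy(_._1).map { case (company, group) =>
      company -> group.count { case (_, jobName) => jobName != null }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical (company, jobName) pairs, including one missing job name
    val sample: Seq[(String, String)] = Seq(
      ("CompanyA", "Data Engineer"),
      ("CompanyA", null),
      ("CompanyB", "Analyst")
    )
    jobNumByCompany(sample).foreach(println)
  }
}
```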

Complete code


  
  package example.spark.test

  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql.{DataFrame, Row, SparkSession}

  object WordCount {
    def main(args: Array[String]): Unit = {
      // Local Spark session with 6 worker threads
      val spark: SparkSession = SparkSession.builder().master("local[6]").appName("job")
        .config("spark.sql.warehouse.dir", "E:/")
        .getOrCreate()
      spark.sparkContext.setLogLevel("ERROR")

      // 1. Load the raw job table from MySQL over JDBC
      val source: RDD[Row] = spark.read
        .format("jdbc")
        .option("url", "jdbc:mysql://localhost:3306/crawler")
        .option("dbtable", "job")
        .option("user", "root")
        .option("password", "123456")
        .load()
        .rdd
      source.foreach(println)
      println("------------------------------------------------------------------------------------")

      // 2. Split the salary range into Float minimum and maximum values
      val data: RDD[(String, String, String, String, Float, Float, String)] = source.map(item => {
        val parts: Array[String] = item(4).toString.split("-")
        val minsalary: String =
          if (parts(0).contains("面议")) parts(0).substring(4)
          else parts(0)
        val maxsalary: String = parts(1).substring(0, parts(1).length - 1)
        (item(0).toString, item(1).toString, item(2).toString, item(3).toString,
          minsalary.toFloat, maxsalary.toFloat, item(5).toString)
      })
      data.foreach(println)
      println("------------------------------------------------------------------------------------")

      // 3. Register the formatted data as the temporary view salary
      import spark.implicits._
      val dataDF: DataFrame = data.toDF("address", "company", "edu", "jobName", "minsalary", "maxsalary", "size")
      dataDF.createOrReplaceTempView("salary")

      // 4. Average salary by education level
      val sql1 =
        """
          |select s.edu, (s.avg_minsalary + s.avg_maxsalary) / 2 as avgSalary from
          |(select edu, avg(minsalary) as avg_minsalary, avg(maxsalary) as avg_maxsalary from salary group by edu) s
          |""".stripMargin
      spark.sql(sql1).show()
      println("------------------------------------------------------------------------------------")

      // 5. Average salary by job title
      val sql2 =
        """
          |select s.jobName, (s.avg_minsalary + s.avg_maxsalary) / 2 as avgSalary from
          |(select jobName, avg(minsalary) as avg_minsalary, avg(maxsalary) as avg_maxsalary from salary group by jobName) s
          |""".stripMargin
      spark.sql(sql2).show()
      println("------------------------------------------------------------------------------------")

      // 6. Number of jobs offered by each company
      val sql3 =
        """
          |select company, count(jobName) as jobNum from salary group by company
          |""".stripMargin
      spark.sql(sql3).show()

      spark.stop()
    }
  }

Source: tuomasi.blog.csdn.net, author: 托马斯-酷涛. Copyright belongs to the original author; please contact the author before reposting.

Original link: tuomasi.blog.csdn.net/article/details/126039041

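[Copyright notice] This article was reposted by a Huawei Cloud community user. If you find content in this community that you suspect is plagiarized, please report it by email with supporting evidence; once verified, the community will remove the infringing content immediately. Report email: cloudbbs@huaweicloud.com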