67. Spark: Two Ways to Run (Locally, or Submitted to a Cluster)

托马斯-酷涛, published 2022/05/26 00:33:28
[Abstract] Local run: run the program directly in IDEA and read the result from the console. Cluster run: package the program locally as a jar and submit it to the cluster (results are written to HDFS).

Local run: run the program directly in IDEA and read the result from the console.

Cluster run: package the program locally as a jar and submit it to the cluster (results are written to HDFS).

Contents

I. Running a Spark program locally

II. Running a Spark program on a cluster


I. Running a Spark program locally

        1. POM dependencies

        Note: the dependencies and their versions must match the cluster environment.


  
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.itcast</groupId>
    <artifactId>SparkDemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <repositories>
        <repository>
            <id>aliyun</id>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        </repository>
        <repository>
            <id>apache</id>
            <url>https://repository.apache.org/content/repositories/snapshots/</url>
        </repository>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>
    <properties>
        <encoding>UTF-8</encoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <scala.version>2.12.11</scala.version>
        <spark.version>3.0.1</spark.version>
        <hadoop.version>2.7.5</hadoop.version>
    </properties>
    <dependencies>
        <!-- Scala language dependency -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <!-- Spark Core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Spark Streaming -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Spark Streaming + Kafka -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Spark SQL -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Spark SQL + Hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive-thriftserver_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Structured Streaming + Kafka -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Spark MLlib machine learning module (includes the ALS recommendation algorithm) -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>com.hankcs</groupId>
            <artifactId>hanlp</artifactId>
            <version>portable-1.7.7</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.23</version>
        </dependency>
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>2.9.0</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.47</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.2</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <!-- Plugin for compiling Java sources -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
            </plugin>
            <!-- Plugin for compiling Scala sources -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.18.1</version>
                <configuration>
                    <useFile>false</useFile>
                    <disableXmlReport>true</disableXmlReport>
                    <includes>
                        <include>**/*Test.*</include>
                        <include>**/*Suite.*</include>
                    </includes>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <!-- optionally fill in a fully qualified main class to make the jar directly runnable -->
                                    <mainClass></mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

        2. Sample data

 
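The original screenshot of the input data is not reproduced here. A minimal words.txt in the shape the code expects (one or more space-separated words per line, at the path data/input/words.txt) might look like this; the exact contents are an assumption:

```text
hello spark
hello hadoop
hello scala spark
```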

        3. Writing the code


  
package org.example.spark

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object word {
  def main(args: Array[String]): Unit = {
    // Set up the environment
    val conf = new SparkConf().setMaster("local[*]").setAppName("wordcount")
    val sc = new SparkContext(conf)
    // Load the file
    val rdd1: RDD[String] = sc.textFile("data/input/words.txt")
    // Process the data
    val rdd2: RDD[String] = rdd1.flatMap(line => line.split(" "))
    val rdd3: RDD[(String, Int)] = rdd2.map(word => (word, 1))
    val rdd4: RDD[(String, Int)] = rdd3.reduceByKey((agg, curr) => agg + curr)
    val result: Array[(String, Int)] = rdd4.collect()
    result.foreach(println)
  }
}
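The flatMap, map, and reduceByKey steps above can be sketched on a plain Scala collection, with no Spark cluster, to see what each transformation contributes. WordCountSketch and its sample input are illustrative only, not part of the original project:

```scala
// A Spark-free sketch of the same pipeline: groupBy + sum plays the
// role that reduceByKey plays on an RDD.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))           // split each line into words
      .map(word => (word, 1))          // pair each word with a count of 1
      .groupBy(_._1)                   // group the pairs by word
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // sum the 1s per word

  def main(args: Array[String]): Unit =
    wordCount(Seq("hello spark", "hello hadoop")).foreach(println)
}
```

On the RDD version, reduceByKey does the grouping and summing in one distributed step; the local sketch just makes the per-word aggregation visible.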

        4. Run locally

        Note: running the word-count example locally prints each (word, count) pair to the IDEA console.

II. Running a Spark program on a cluster

        1. Modify the code

val rdd1: RDD[String] = sc.textFile("hdfs:///input/wordcount.txt")

rdd4.saveAsTextFile("hdfs://192.168.231.247:8020/output/output1")
 

        Note: for cluster runs, the file paths point at HDFS, so each run reads its input from HDFS and writes its results back to HDFS.
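Putting the two changes together, the cluster version of the program might look like the sketch below. The HDFS paths and NameNode address follow the snippet above and must match your own cluster; dropping setMaster so that the master URL comes from spark-submit is a common convention, not something stated in the original:

```scala
package org.example.spark

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object word {
  def main(args: Array[String]): Unit = {
    // No setMaster here: on the cluster, the master is supplied by spark-submit
    val conf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)
    // Read the input from HDFS instead of the local filesystem
    val rdd1: RDD[String] = sc.textFile("hdfs:///input/wordcount.txt")
    val rdd2: RDD[String] = rdd1.flatMap(_.split(" "))
    val rdd3: RDD[(String, Int)] = rdd2.map(word => (word, 1))
    val rdd4: RDD[(String, Int)] = rdd3.reduceByKey(_ + _)
    // Write the result back to HDFS (the address must match your NameNode)
    rdd4.saveAsTextFile("hdfs://192.168.231.247:8020/output/output1")
  }
}
```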

        2. Package the jar

        Note: double-click package in IDEA's Maven panel; Maven will automatically clean the old build output, run the tests, and package the project into jars.
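The same packaging can be done from a terminal instead of the IDEA Maven panel; these are standard Maven invocations run from the project root:

```shell
# clean removes old build output; package compiles, runs tests, and builds the jars
mvn clean package

# optionally skip tests, e.g. if they require a cluster that is not available locally
mvn clean package -DskipTests
```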

        3. Locate the jar in the project directory

        Note: the smallest jar is the one without bundled dependencies. The cluster already provides the Spark dependencies, so uploading this minimal jar is sufficient.

        4. Upload to Linux

        Note: here Xftp is used to transfer the jar.

        5. Start the Hadoop and Spark clusters
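Assuming a standard Hadoop install with its sbin scripts on the PATH and a Spark standalone cluster started from $SPARK_HOME (script names and locations vary by installation), starting the two clusters might look like this:

```shell
# start HDFS (NameNode and DataNodes)
start-dfs.sh

# start the Spark standalone master and workers, from the Spark install directory
sbin/start-all.sh

# sanity check: list the running Java daemons on this node
jps
```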

        6. From the Spark installation directory, run:

bin/spark-submit --class org.example.spark.word --master spark://master:7077 /input/original-SparkDemo-1.0-SNAPSHOT.jar
 

         Note: on a successful cluster run, the word counts are written to HDFS rather than printed to the console.

        7. View the result in the HDFS web UI
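Besides the web UI, the result can also be inspected from the command line; the paths follow the saveAsTextFile call above:

```shell
# list the output directory; Spark writes one part-XXXXX file per partition,
# plus a _SUCCESS marker when the job completed
hdfs dfs -ls /output/output1

# print the word counts from all partition files
hdfs dfs -cat /output/output1/part-*
```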

 

This completes Spark: two ways to run (locally, or submitted to a cluster).


Source: tuomasi.blog.csdn.net, author: 托马斯-酷涛. Copyright belongs to the original author; contact the author for reprint permission.

Original link: tuomasi.blog.csdn.net/article/details/122919106
