Big Data Technology in Practice: Installing and Configuring a Spark Cluster
Installing and Configuring a Spark Cluster
1. Upload and extract the Spark installation package
1 Use the Xftp file-transfer tool (bundled with Xshell) to upload the Spark package spark-3.2.1-bin-hadoop2.7.tgz into the software folder under /opt (/opt/software)
2 Extract the Spark package into the /opt/module directory
[syf@hadoop102 ~]$ tar -zxvf /opt/software/spark-3.2.1-bin-hadoop2.7.tgz -C /opt/module/
3 Rename the Spark installation directory:
[syf@hadoop102 ~]$ cd /opt/module/
[syf@hadoop102 module]$ mv spark-3.2.1-bin-hadoop2.7/ spark-3.2.1/
2. Configure environment variables
[syf@hadoop102 module]$ sudo vim /etc/profile.d/my_env.sh
Add the following at the end of the file to set the Spark environment variables:
#SPARK_HOME
export SPARK_HOME=/opt/module/spark-3.2.1
export PATH=$PATH:$SPARK_HOME/bin
Sync the updated environment variable file to the other servers in the cluster
[syf@hadoop102 module]$ sudo xsync /etc/profile.d/my_env.sh
Make the new PATH environment variable take effect
[syf@hadoop102 module]$ source /etc/profile
[syf@hadoop103 ~]$ source /etc/profile
[syf@hadoop104 ~]$ source /etc/profile
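To confirm the variables are in effect, you can optionally print the Spark version on any node; spark-submit is available through the PATH configured above:
[syf@hadoop102 module]$ spark-submit --version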
3. Parameter configuration
1 Edit the workers file
[syf@hadoop102 module]$ cd /opt/module/spark-3.2.1/conf/
[syf@hadoop102 conf]$ cp workers.template workers
[syf@hadoop102 conf]$ vim workers
Delete the original localhost entry, then add the hostnames of the worker nodes:
hadoop103
hadoop104
2 Edit spark-env.sh
[syf@hadoop102 conf]$ cp spark-env.sh.template spark-env.sh
[syf@hadoop102 conf]$ vim spark-env.sh
After entering the vim editor, press Shift+G (capital G) to jump to the end of the file, then append the following settings:
# JDK installation path
JAVA_HOME=/opt/module/jdk1.8.0_212
# Hadoop configuration directory, so Spark can locate the HDFS and YARN configs
HADOOP_CONF_DIR=/opt/module/hadoop-3.1.3/etc/hadoop
# Host running the standalone Master (newer Spark releases prefer the name SPARK_MASTER_HOST)
SPARK_MASTER_IP=hadoop102
# Master web UI port (the default is 8080)
SPARK_MASTER_WEBUI_PORT=8085
# Master RPC port
SPARK_MASTER_PORT=7077
# Resources offered by each Worker
SPARK_WORKER_MEMORY=512m
SPARK_WORKER_CORES=1
# Default resources for each executor
SPARK_EXECUTOR_MEMORY=512m
SPARK_EXECUTOR_CORES=1
# Number of Worker instances started on each worker node
SPARK_WORKER_INSTANCES=1
3 Edit spark-defaults.conf
[syf@hadoop102 conf]$ cp spark-defaults.conf.template spark-defaults.conf
[syf@hadoop102 conf]$ vim spark-defaults.conf
Append the following settings at the end of the file:
# Default master URL for applications: the standalone Master on hadoop102
spark.master                     spark://hadoop102:7077
# Write event logs to HDFS so finished applications can be viewed in the history server
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://hadoop102:8020/spark-logs
# Directory the history server reads event logs from
spark.history.fs.logDirectory    hdfs://hadoop102:8020/spark-logs
4 Create the spark-logs directory in HDFS
1) Start the Hadoop cluster and the history server
[syf@hadoop102 conf]$ start_hadoop.sh start
Check the jps processes to confirm the cluster started successfully
2) Create the spark-logs directory
[syf@hadoop102 conf]$ hdfs dfs -mkdir /spark-logs
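Optionally, confirm the directory was created:
[syf@hadoop102 conf]$ hdfs dfs -ls /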
5 Sync the Spark directory to the other nodes in the cluster
[syf@hadoop102 conf]$ xsync /opt/module/spark-3.2.1/
4. Start and stop Spark
Start the cluster
[syf@hadoop102 module]$ $SPARK_HOME/sbin/start-all.sh
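Once started, a Master process should appear on hadoop102 and Worker processes on hadoop103/hadoop104, and the Master web UI should be reachable at http://hadoop102:8085 (the port set in spark-env.sh). A quick check on the master node, for example:
[syf@hadoop102 module]$ /opt/module/jdk1.8.0_212/bin/jps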
Stop the cluster
[syf@hadoop102 module]$ $SPARK_HOME/sbin/stop-all.sh
5. Start spark-shell
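Because spark.master is set in spark-defaults.conf, running spark-shell on hadoop102 connects to the standalone cluster; the master URL can also be passed explicitly. A minimal sketch:
[syf@hadoop102 module]$ spark-shell --master spark://hadoop102:7077
At the scala> prompt, a quick smoke test, for example:
scala> sc.parallelize(1 to 100).sum()
which should return 5050.0.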
6. Exit spark-shell
scala> :q
Supplementary content: a few one-command start/stop script files
1. View the jps processes on every server in the cluster with one command: jpsall
(1) Create and edit the jpsall file
[syf@hadoop102 ~]$ vim /home/syf/bin/jpsall
(2) Copy in the following content:
#!/bin/bash
for i in hadoop102 hadoop103 hadoop104
do
echo
echo =================$i jps processes=============================
ssh $i "/opt/module/jdk1.8.0_212/bin/jps"
done
(3) Press ESC, then type :wq to save and exit
(4) Make the script executable
[syf@hadoop102 ~]$ chmod +x /home/syf/bin/jpsall
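Assuming /home/syf/bin is on the user's PATH (the usual case for a user's ~/bin directory), the script can then be run from any directory:
[syf@hadoop102 ~]$ jpsall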
2. Start Hadoop & Spark with one command: start_spark
#!/bin/bash
echo "=================启动Hadoop集群======================="
echo "=================启动hdfs============================="
ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
echo "=================启动yarn============================="
ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
echo "=================启动历史服务器============================="
ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
echo "=================启动spark============================="
ssh hadoop102 "/opt/module/spark-3.2.1/sbin/start-all.sh"
ssh hadoop102 "/opt/module/spark-3.2.1/sbin/start-history-server.sh"
3. Stop Hadoop & Spark with one command: stop_spark
#!/bin/bash
echo "=================关闭spark============================="
ssh hadoop102 "/opt/module/spark-3.2.1/sbin/stop-all.sh"
ssh hadoop102 "/opt/module/spark-3.2.1/sbin/stop-history-server.sh"
echo "=================关闭历史服务器============================="
ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
echo "=================关闭yarn============================="
ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
echo "=================关闭hdfs============================="
ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh“