Hadoop Distributed Installation (Part 1)
1. Create the directories and unpack the software
mkdir -p /hadoop/{soft,nn,dn,tmp,zookeeper,jn}
tar xvf hadoop-2.8.3.tar.gz -C /hadoop/soft
tar xvf zookeeper-3.4.12.tar.gz -C /hadoop/soft
tar -zxvf jdk-8u161-linux-x64.gz -C /hadoop/soft
chown -R root:root /hadoop/soft
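The JDK above is only unpacked and never referenced again; on every node it usually also has to be exported. A minimal sketch, assuming the tarball unpacked to /hadoop/soft/jdk1.8.0_161 (adjust if the directory name differs):
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/hadoop/soft/jdk1.8.0_161
export HADOOP_HOME=/hadoop/soft/hadoop-2.8.3
export ZOOKEEPER_HOME=/hadoop/soft/zookeeper-3.4.12
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
EOF
source /etc/profile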
2. Set up SSH trust between the nodes and test passwordless login
ssh ceph1 date
ssh ceph2 date
ssh ceph3 date
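If passwordless login is not yet in place, one common way to set it up (run on each node; assumes root and that the hostnames ceph1-3 resolve):
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa                   # key pair without a passphrase
for h in ceph1 ceph2 ceph3; do ssh-copy-id root@$h; done   # push the public key to every node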
3. Configure ZooKeeper
[root@ceph1 ~]# cd /hadoop/soft/zookeeper-3.4.12/conf/
[root@ceph1 conf]# cp zoo_sample.cfg zoo.cfg
Edit zoo.cfg and set:
dataDir=/hadoop/zookeeper
server.1=192.168.0.231:2888:3888
server.2=192.168.0.232:2888:3888
server.3=192.168.0.233:2888:3888
Configure a different myid on each node.
Node 1:
echo 1 > /hadoop/zookeeper/myid
Node 2:
echo 2 > /hadoop/zookeeper/myid
Node 3:
echo 3 > /hadoop/zookeeper/myid
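The zoo.cfg edited on ceph1 must also be present on ceph2 and ceph3; assuming the same install path everywhere, it can simply be copied over:
scp /hadoop/soft/zookeeper-3.4.12/conf/zoo.cfg ceph2:/hadoop/soft/zookeeper-3.4.12/conf/
scp /hadoop/soft/zookeeper-3.4.12/conf/zoo.cfg ceph3:/hadoop/soft/zookeeper-3.4.12/conf/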
4. Start ZooKeeper and check its status (run on every node)
/hadoop/soft/zookeeper-3.4.12/bin/zkServer.sh stop
/hadoop/soft/zookeeper-3.4.12/bin/zkServer.sh start
/hadoop/soft/zookeeper-3.4.12/bin/zkServer.sh status
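A quick sanity check: on a healthy three-node ensemble, status reports one leader and two followers, and jps shows the server process on every node.
jps | grep QuorumPeerMain    # the ZooKeeper server process
# zkServer.sh status should print "Mode: leader" on one node and "Mode: follower" on the other two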
5. Edit the Hadoop configuration files
All of the following live under /hadoop/soft/hadoop-2.8.3/etc/hadoop:
core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves
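Besides the XML files, hadoop-env.sh in the same directory usually needs JAVA_HOME set explicitly, because the environment from /etc/profile is not always picked up when daemons are started over SSH. A sketch, assuming the JDK path from step 1:
echo 'export JAVA_HOME=/hadoop/soft/jdk1.8.0_161' >> /hadoop/soft/hadoop-2.8.3/etc/hadoop/hadoop-env.sh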
======core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hahadoop</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp/</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>ceph1:2181,ceph2:2181,ceph3:2181</value>
</property>
</configuration>
========hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/dfs/name</value>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>${dfs.namenode.name.dir}</value>
<description>Local directory where the NameNode stores its transaction files (edits); change as needed</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/dfs/data</value>
<description>Local directory where the DataNode stores blocks; change as needed</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<!--======================================================================= -->
<!-- HDFS high-availability settings -->
<!-- HDFS nameservice ID; must match fs.defaultFS in core-site.xml (hdfs://hahadoop above) -->
<property>
<name>dfs.nameservices</name>
<value>hahadoop</value>
</property>
<property>
<!-- NameNode IDs; this version supports at most two NameNodes -->
<name>dfs.ha.namenodes.hahadoop</name>
<value>nn1,nn2</value>
</property>
<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID] - RPC address -->
<property>
<name>dfs.namenode.rpc-address.hahadoop.nn1</name>
<value>ceph1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hahadoop.nn2</name>
<value>ceph2:8020</value>
</property>
<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID] - HTTP address -->
<property>
<name>dfs.namenode.http-address.hahadoop.nn1</name>
<value>ceph1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.hahadoop.nn2</name>
<value>ceph2:50070</value>
</property>
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; the port matches dfs.journalnode.rpc-address -->
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://ceph1:8485;ceph2:8485;ceph3:8485/hahadoop</value>
</property>
<property>
<!-- Local directory where the JournalNodes store edits -->
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/data/dfs/jn</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.hahadoop</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
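Hadoop 2.x ships only a template for the next file, so it typically has to be created first:
cd /hadoop/soft/hadoop-2.8.3/etc/hadoop
cp mapred-site.xml.template mapred-site.xml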
==============mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>ceph1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>ceph1:19888</value>
</property>
</configuration>
==============yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>30000</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-rm-cluster</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>ceph1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>ceph2</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>ceph1:2181,ceph2:2181,ceph3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>ceph1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>ceph1:8034</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>ceph1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>ceph2:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>ceph2:8034</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>ceph2:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
==============slaves
ceph1
ceph2
ceph3
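Every node needs the same configuration; one way to push it out from ceph1 (assuming the same install path on all nodes):
for h in ceph2 ceph3; do
  scp /hadoop/soft/hadoop-2.8.3/etc/hadoop/* $h:/hadoop/soft/hadoop-2.8.3/etc/hadoop/
done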
6. Initialize the environment:
# Start the JournalNode on all three machines (HTTP port 8480)
/hadoop/soft/hadoop-2.8.3/sbin/hadoop-daemon.sh start journalnode
# Format HDFS (run on the primary node)
/hadoop/soft/hadoop-2.8.3/bin/hdfs namenode -format
##### Success message: INFO common.Storage: Storage directory /home/hadoop/data/dfs/name has been successfully formatted.
# Initialize the HA state in ZooKeeper (run on the primary node)
/hadoop/soft/hadoop-2.8.3/bin/hdfs zkfc -formatZK
##### Success message: INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/hahadoop in ZK.
7. Start the NameNode on the primary node (port 50070)
/hadoop/soft/hadoop-2.8.3/sbin/hadoop-daemon.sh start namenode
8. Start the NameNode on the standby node (port 50070)
Sync the primary NameNode's metadata to the standby, then start it:
/hadoop/soft/hadoop-2.8.3/bin/hdfs namenode -bootstrapStandby
/hadoop/soft/hadoop-2.8.3/sbin/hadoop-daemon.sh start namenode
9. Start DFS (hdfs-site.xml || http://ip:50070/)
Run on the primary node:
/hadoop/soft/hadoop-2.8.3/sbin/start-dfs.sh
# Starts the NameNodes, DataNodes, JournalNodes, and ZK failover controllers
# Check HDFS:
/hadoop/soft/hadoop-2.8.3/bin/hdfs dfs -put /etc/passwd /
/hadoop/soft/hadoop-2.8.3/bin/hdfs dfs -ls /
# The primary NameNode should now be in the active state.
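Optionally, an admin report confirms that all three DataNodes registered:
/hadoop/soft/hadoop-2.8.3/bin/hdfs dfsadmin -report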
10. Start YARN
(yarn-site.xml || ResourceManager (active): http://ip:8088; ResourceManager (standby): http://ip:8088/cluster/cluster)
# Primary node:
/hadoop/soft/hadoop-2.8.3/sbin/start-yarn.sh
# Starts the NodeManager on all nodes and the ResourceManager on the primary node
# Standby node:
/hadoop/soft/hadoop-2.8.3/sbin/yarn-daemon.sh start resourcemanager
## Starts the ResourceManager on the standby node
### Alternatively, /hadoop/soft/hadoop-2.8.3/sbin/start-yarn.sh also works here
11. Start the JobHistoryServer on the primary node (mapred-site.xml || port 19888)
/hadoop/soft/hadoop-2.8.3/sbin/mr-jobhistory-daemon.sh start historyserver
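A simple end-to-end test is to run one of the bundled example jobs; it exercises YARN and should then show up in the JobHistory UI on port 19888 (the jar path assumes the stock 2.8.3 layout):
/hadoop/soft/hadoop-2.8.3/bin/yarn jar \
  /hadoop/soft/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar pi 2 10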
12. Check the active/standby state:
/hadoop/soft/hadoop-2.8.3/bin/hdfs haadmin -getServiceState nn1
/hadoop/soft/hadoop-2.8.3/bin/hdfs haadmin -getServiceState nn2
/hadoop/soft/hadoop-2.8.3/bin/yarn rmadmin -getServiceState rm1
/hadoop/soft/hadoop-2.8.3/bin/yarn rmadmin -getServiceState rm2
13. Start and stop commands
======================
Stop:
Shut down Hadoop (YARN first, then HDFS)
# Standby node: (ResourceManager)
/hadoop/soft/hadoop-2.8.3/sbin/yarn-daemon.sh stop resourcemanager
# Primary node: (ResourceManager, NodeManager)
/hadoop/soft/hadoop-2.8.3/sbin/stop-yarn.sh
/hadoop/soft/hadoop-2.8.3/sbin/stop-dfs.sh
Start:
# Primary node: (namenode, datanode, journalnode, ZK failover controllers) (resourcemanager, nodemanager)
/hadoop/soft/hadoop-2.8.3/sbin/start-dfs.sh
/hadoop/soft/hadoop-2.8.3/sbin/start-yarn.sh
# Standby node: (resourcemanager)
/hadoop/soft/hadoop-2.8.3/sbin/yarn-daemon.sh start resourcemanager
// Difference between start-dfs.sh and start-all.sh:
// start-all.sh and stop-all.sh report themselves as deprecated; use start-dfs.sh and start-yarn.sh instead.
// start-dfs.sh starts only the HDFS daemons (here: NameNode, DataNode, JournalNode, and ZKFC), while start-all.sh additionally starts YARN's ResourceManager and NodeManager.
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Reference for manually starting HDFS:
zkServer.sh start
hadoop-daemon.sh start journalnode
hadoop-daemon.sh start namenode
hadoop-daemon.sh start zkfc
// Whichever node starts zkfc first becomes the active NameNode.
hadoop-daemon.sh start datanode
Reference for manually starting YARN:
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
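After everything is running, jps gives a quick per-node sanity check; with the role layout above, roughly these processes are expected:
jps
# ceph1: NameNode, DataNode, JournalNode, DFSZKFailoverController, QuorumPeerMain, ResourceManager, NodeManager, JobHistoryServer
# ceph2: NameNode, DataNode, JournalNode, DFSZKFailoverController, QuorumPeerMain, ResourceManager, NodeManager
# ceph3: DataNode, JournalNode, QuorumPeerMain, NodeManager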
==============