Hadoop Distributed Installation (Part 1)

snowofsummer, posted 2020/08/20 13:55:20

1. Create directories and extract the software packages

mkdir -p /hadoop/{soft,nn,dn,tmp,zookeeper,jn}

tar xvf hadoop-2.8.3.tar.gz -C /hadoop/soft

tar xvf zookeeper-3.4.12.tar.gz -C /hadoop/soft

tar -zxvf jdk-8u161-linux-x64.gz -C /hadoop/soft

chown -R root:root /hadoop/soft
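Environment variables are not shown in the original; a minimal sketch, assuming the JDK unpacks to jdk1.8.0_161 and appending to /etc/profile on every node:

cat >> /etc/profile <<'EOF'
export JAVA_HOME=/hadoop/soft/jdk1.8.0_161
export HADOOP_HOME=/hadoop/soft/hadoop-2.8.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
source /etc/profile
# hadoop-env.sh under etc/hadoop may also need JAVA_HOME set explicitly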

 

2. Configure SSH equivalence and test passwordless login

ssh ceph1 date

ssh ceph2 date

ssh ceph3 date
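The original only shows the verification step; a minimal sketch of the key setup it relies on (run as root on each of the three nodes ceph1/ceph2/ceph3):

ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa

# Push the public key to every node, including the local one
for h in ceph1 ceph2 ceph3; do ssh-copy-id root@$h; done

The same /root/.ssh/id_rsa key is referenced later by dfs.ha.fencing.ssh.private-key-files in hdfs-site.xml.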

 

3. Configure ZooKeeper

[root@ceph1 ~]# cd /hadoop/soft/zookeeper-3.4.12/conf/

[root@ceph1 conf]# cp zoo_sample.cfg zoo.cfg

Edit zoo.cfg and set:

dataDir=/hadoop/zookeeper

server.1=192.168.0.231:2888:3888

server.2=192.168.0.232:2888:3888

server.3=192.168.0.233:2888:3888

Configure a different ID on each node, matching the server.N entries above.

Node 1:

echo 1 >  /hadoop/zookeeper/myid

Node 2:

echo 2 >  /hadoop/zookeeper/myid

Node 3:

echo 3 >  /hadoop/zookeeper/myid
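The same zoo.cfg must exist on every node; assuming the identical /hadoop/soft layout on ceph2 and ceph3, it can be copied out like this:

for h in ceph2 ceph3; do scp /hadoop/soft/zookeeper-3.4.12/conf/zoo.cfg root@$h:/hadoop/soft/zookeeper-3.4.12/conf/; done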

 

4. Start and check ZooKeeper

/hadoop/soft/zookeeper-3.4.12/bin/zkServer.sh stop

/hadoop/soft/zookeeper-3.4.12/bin/zkServer.sh start

/hadoop/soft/zookeeper-3.4.12/bin/zkServer.sh status
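If the quorum formed correctly, zkServer.sh status typically reports "Mode: follower" on two nodes and "Mode: leader" on exactly one, roughly like:

ZooKeeper JMX enabled by default
Using config: /hadoop/soft/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: follower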

 

 

5. Edit the Hadoop configuration files

/hadoop/soft/hadoop-2.8.3/etc/hadoop

core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves

 

======core-site.xml

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://hahadoop</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/hadoop/tmp/</value>

</property>

<property>

<name>ha.zookeeper.quorum</name>

<value>ceph1:2181,ceph2:2181,ceph3:2181</value>

</property>

</configuration>

========hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>dfs.permissions.superusergroup</name>

<value>hadoop</value>

</property>

<property>

<name>dfs.webhdfs.enabled</name>

<value>true</value>

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>/home/hadoop/data/dfs/name</value>

</property>

<property>

<name>dfs.namenode.edits.dir</name>

<value>${dfs.namenode.name.dir}</value>

<description>Local directory where the NameNode stores its transaction files (edits); change as needed</description>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>/home/hadoop/data/dfs/data</value>

<description>Local directory where the DataNode stores blocks; change as needed</description>

</property>

<property>

<name>dfs.replication</name>

<value>3</value>

</property>

<property>

<name>dfs.blocksize</name>

<value>134217728</value>

</property>

<!--======================================================================= -->

<!-- HDFS high-availability configuration -->

<!-- The logical HDFS nameservice; must match fs.defaultFS in core-site.xml (hahadoop here) -->

<property>

<name>dfs.nameservices</name>

<value>hahadoop</value>

</property>

<property>

<!-- NameNode IDs; this version supports at most two NameNodes -->

<name>dfs.ha.namenodes.hahadoop</name>

<value>nn1,nn2</value>

</property>

<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID], the RPC address -->

<property>

<name>dfs.namenode.rpc-address.hahadoop.nn1</name>

<value>ceph1:8020</value>

</property>

<property>

<name>dfs.namenode.rpc-address.hahadoop.nn2</name>

<value>ceph2:8020</value>

</property>

<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID], the HTTP address -->

<property>

<name>dfs.namenode.http-address.hahadoop.nn1</name>

<value>ceph1:50070</value>

</property>

<property>

<name>dfs.namenode.http-address.hahadoop.nn2</name>

<value>ceph2:50070</value>

</property>

<property>

<name>dfs.journalnode.http-address</name>

<value>0.0.0.0:8480</value>

</property>

<property>

<name>dfs.journalnode.rpc-address</name>

<value>0.0.0.0:8485</value>

</property>

<property>

<!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; the port matches dfs.journalnode.rpc-address -->

<name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://ceph1:8485;ceph2:8485;ceph3:8485/hahadoop</value>

</property>

<property>

<!-- Local directory where the JournalNode stores its data -->

<name>dfs.journalnode.edits.dir</name>

<value>/home/hadoop/data/dfs/jn</value>

</property>

<property>

<name>dfs.client.failover.proxy.provider.hahadoop</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/root/.ssh/id_rsa</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.connect-timeout</name>

<value>30000</value>

</property>

<property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

</configuration>

==============mapred-site.xml

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<property>

<name>mapreduce.jobhistory.address</name>

<value>ceph1:10020</value>

</property>

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>ceph1:19888</value>

</property>

</configuration>

 

==============yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->

<property>

<name>yarn.resourcemanager.connect.retry-interval.ms</name>

<value>30000</value>

</property>

<property>

<name>yarn.resourcemanager.ha.enabled</name>

<value>true</value>

</property>

<property>

<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

<property>

<name>yarn.resourcemanager.cluster-id</name>

<value>yarn-rm-cluster</value>

</property>

<property>

<name>yarn.resourcemanager.ha.rm-ids</name>

<value>rm1,rm2</value>

</property>

<property>

<name>yarn.resourcemanager.hostname.rm1</name>

<value>ceph1</value>

</property>

<property>

<name>yarn.resourcemanager.hostname.rm2</name>

<value>ceph2</value>

</property>

<property>

<name>yarn.resourcemanager.recovery.enabled</name>

<value>true</value>

</property>

<property>

<name>yarn.resourcemanager.zk-address</name>

<value>ceph1:2181,ceph2:2181,ceph3:2181</value>

</property>

<property>

<name>yarn.resourcemanager.address.rm1</name>

<value>ceph1:8032</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address.rm1</name>

<value>ceph1:8034</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address.rm1</name>

<value>ceph1:8088</value>

</property>

<property>

<name>yarn.resourcemanager.address.rm2</name>

<value>ceph2:8032</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address.rm2</name>

<value>ceph2:8034</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address.rm2</name>

<value>ceph2:8088</value>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

</configuration>

 

 

==============slaves

ceph1

ceph2

ceph3
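The edited files must be identical on all three nodes; the post does not show the copy step, but assuming the same /hadoop/soft layout everywhere it can be done like this:

for h in ceph2 ceph3; do scp /hadoop/soft/hadoop-2.8.3/etc/hadoop/{core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml,slaves} root@$h:/hadoop/soft/hadoop-2.8.3/etc/hadoop/; done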

 

6. Initialize the environment:

# Start the JournalNode on all three machines (JournalNode web UI: 8480)

/hadoop/soft/hadoop-2.8.3/sbin/hadoop-daemon.sh start journalnode

# Format HDFS (run on the primary NameNode only)

/hadoop/soft/hadoop-2.8.3/bin/hdfs namenode -format

##### Success message: INFO common.Storage: Storage directory /home/hadoop/data/dfs/name has been successfully formatted.

# Initialize the HA state znode in ZooKeeper (run on the primary NameNode only)

/hadoop/soft/hadoop-2.8.3/bin/hdfs zkfc -formatZK

##### Success message: INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/hahadoop in ZK.
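For reference (not in the original), jps on each node at the end of this step should show at least the ZooKeeper and JournalNode processes; the PIDs below are illustrative:

jps
# 2345 QuorumPeerMain   (ZooKeeper)
# 2389 JournalNode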

 

 

7. Start the NameNode on the primary node (web UI: 50070)

/hadoop/soft/hadoop-2.8.3/sbin/hadoop-daemon.sh start namenode

 

8. Start the NameNode on the standby node (web UI: 50070)

The standby node first syncs the NameNode metadata from the primary:

/hadoop/soft/hadoop-2.8.3/bin/hdfs namenode -bootstrapStandby

/hadoop/soft/hadoop-2.8.3/sbin/hadoop-daemon.sh start namenode

 

 

9. Start DFS (hdfs-site.xml || http://ip:50070/)

Run on the primary node:

/hadoop/soft/hadoop-2.8.3/sbin/start-dfs.sh 

# Starts the NameNodes, DataNodes, and JournalNodes

# Check HDFS:

/hadoop/soft/hadoop-2.8.3/bin/hdfs dfs -put /etc/passwd /

/hadoop/soft/hadoop-2.8.3/bin/hdfs dfs -ls /

# The primary node's state should be active.
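One extra check not shown in the original is to confirm that all three DataNodes have registered with the active NameNode:

/hadoop/soft/hadoop-2.8.3/bin/hdfs dfsadmin -report | grep -i "live datanodes"
# Expect something like: Live datanodes (3):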

 

 

10. Start YARN

(yarn-site.xml || ResourceManager (active): http://ip:8088; ResourceManager (standby): http://ip:8088/cluster/cluster)

# Primary node:

/hadoop/soft/hadoop-2.8.3/sbin/start-yarn.sh

# Starts the NodeManager on every node and the ResourceManager on the primary node

# Standby node:

/hadoop/soft/hadoop-2.8.3/sbin/yarn-daemon.sh start resourcemanager

## Starts the ResourceManager on the standby node

### Alternatively, /hadoop/soft/hadoop-2.8.3/sbin/start-yarn.sh can also be run here

 

11. Start the JobHistoryServer on the primary node (mapred-site.xml || 19888)

/hadoop/soft/hadoop-2.8.3/sbin/mr-jobhistory-daemon.sh start historyserver

 

12. Check the active/standby states:

/hadoop/soft/hadoop-2.8.3/bin/hdfs haadmin -getServiceState nn1

/hadoop/soft/hadoop-2.8.3/bin/hdfs haadmin -getServiceState nn2

/hadoop/soft/hadoop-2.8.3/bin/yarn rmadmin -getServiceState rm1

/hadoop/soft/hadoop-2.8.3/bin/yarn rmadmin -getServiceState rm2
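For reference, each command simply prints the role of that instance; which one is active depends on startup order, so typical output looks like:

[root@ceph1 ~]# /hadoop/soft/hadoop-2.8.3/bin/hdfs haadmin -getServiceState nn1
active
[root@ceph1 ~]# /hadoop/soft/hadoop-2.8.3/bin/hdfs haadmin -getServiceState nn2
standby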

 

 

13. Start and stop commands

======================

 

Stop:

Shut down Hadoop (YARN first, then HDFS)

# Standby node (ResourceManager):

/hadoop/soft/hadoop-2.8.3/sbin/yarn-daemon.sh stop resourcemanager

# Primary node (ResourceManager, NodeManager):

/hadoop/soft/hadoop-2.8.3/sbin/stop-yarn.sh

/hadoop/soft/hadoop-2.8.3/sbin/stop-dfs.sh

 

Start:

# Primary node (namenode, datanode, journalnode, ZK failover controllers) (resourcemanager, nodemanager):

/hadoop/soft/hadoop-2.8.3/sbin/start-dfs.sh

/hadoop/soft/hadoop-2.8.3/sbin/start-yarn.sh

# Standby node (resourcemanager):

/hadoop/soft/hadoop-2.8.3/sbin/yarn-daemon.sh start resourcemanager

 

// Difference between start-dfs.sh and start-all.sh:

start-all.sh and stop-all.sh report that they are deprecated; start-dfs.sh and start-yarn.sh are recommended instead.

start-dfs.sh only starts the HDFS daemons (NameNode and DataNode); start-all.sh additionally starts YARN's ResourceManager and NodeManagers.

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh

 

Reference for starting HDFS manually:

zkServer.sh start

hadoop-daemon.sh start journalnode

hadoop-daemon.sh start namenode

hadoop-daemon.sh start zkfc

// Whichever node starts zkfc first becomes the active NameNode.

hadoop-daemon.sh start datanode

Reference for starting YARN manually:

yarn-daemon.sh start resourcemanager

yarn-daemon.sh start nodemanager
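A final sanity check of automatic failover (not part of the original post) is to kill the active NameNode and confirm the standby takes over; the sketch below assumes nn1 (ceph1) is currently active:

# On ceph1: kill the active NameNode process (in this HA setup there is no SecondaryNameNode to confuse the match)
kill -9 $(jps | awk '/NameNode/{print $1}')

# nn2 should now report active
/hadoop/soft/hadoop-2.8.3/bin/hdfs haadmin -getServiceState nn2

# Restart the killed NameNode on ceph1; it rejoins as standby
/hadoop/soft/hadoop-2.8.3/sbin/hadoop-daemon.sh start namenode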

 

 

==============

 
