MRS: Integrating Spark with Open-Source Elasticsearch (from Installation to Code)
Keywords: Kerberos authentication, Spark, Elasticsearch
Abstract: Integrate a Kerberos-enabled MRS cluster with open-source Elasticsearch.
Prerequisites:
1. Create an MRS 2.0.0 analysis cluster whose big-data components include at least Hadoop, Spark, and Hive, with Kerberos authentication enabled.
2. Log in to the cluster's master node and check the /etc/hosts file to confirm the hostname of each node.
Installing Elasticsearch 6.2.3:
1. Download the installation package from https://repo.huaweicloud.com/elasticsearch/6.2.3/. As the root user, upload it to the /opt directory on the master node, extract it, and edit the configuration:
cd /opt
tar -zxvf elasticsearch-6.2.3.tar.gz
vim elasticsearch-6.2.3/config/elasticsearch.yml (strictly follow YAML formatting) and add the following:
cluster.name: itismyes
cluster.routing.allocation.disk.threshold_enabled: false
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ["node-ana-coremwut0001", "node-ana-coremwut0002", "node-ana-coremwut0003"]
bootstrap.system_call_filter: false
path.data: /srv/BigData/hadoop/data1/es
path.logs: /var/log/Bigdata/es
2. Distribute the elasticsearch-6.2.3 directory to the other analysis nodes:
scp -r elasticsearch-6.2.3 node-ana-coremwut0001:/opt
scp -r elasticsearch-6.2.3 node-ana-coremwut0002:/opt
scp -r elasticsearch-6.2.3 node-ana-coremwut0003:/opt
3. Log in to node-ana-coremwut0001, node-ana-coremwut0002, and node-ana-coremwut0003 in turn and perform the following:
vim /opt/elasticsearch-6.2.3/config/elasticsearch.yml
Change the value of node.name to node-1, node-2, and node-3 respectively (one name per node), then:
chown -R omm:wheel /opt/elasticsearch-6.2.3/
su omm
mkdir /var/log/Bigdata/es
mkdir /srv/BigData/hadoop/data1/es
cd /opt/elasticsearch-6.2.3/bin
./elasticsearch -d
4. On any analysis node, the cluster status can be checked with the following commands:
curl localhost:9200/_cat/master?v
curl localhost:9200/_cat/nodes?v
curl localhost:9200/_cat/health?v
Developing the program:
1. Prepare a development user as described at https://support.huaweicloud.com/devg-mrs/mrs_06_0154.html, then download its keytab and krb5.conf files for later use.
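For reference, here is how these two files are consumed. When the job is submitted in yarn-cluster mode with --keytab and --principal (step 6 below), Spark performs the Kerberos login itself; an explicit login in code is only needed when running the program locally. A minimal sketch with the stock Hadoop API, assuming the files sit under /opt/jars and the development user is named wwwww (both placeholders; the MRS samples wrap the same calls in a LoginUtil helper):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// KerberosLogin is a hypothetical helper class name for this sketch.
public class KerberosLogin {
    public static void login() throws IOException {
        // Point the JVM at the cluster's Kerberos configuration.
        System.setProperty("java.security.krb5.conf", "/opt/jars/krb5.conf");
        Configuration hadoopConf = new Configuration();
        hadoopConf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(hadoopConf);
        // Authenticate as the development user with its keytab.
        UserGroupInformation.loginUserFromKeytab("wwwww", "/opt/jars/user.keytab");
    }
}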
2. Download the sample code from https://github.com/huaweicloud/huaweicloud-mrs-example/tree/mrs-2.0.
3. Import the huaweicloud-mrs-example-mrs-2.0\src\spark-examples\SparkJavaExample sample project into IDEA.
4. Add the Elasticsearch-Hadoop dependency to the project's pom file (at runtime the same jar is shipped via --jars in step 6):
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>6.2.3</version>
</dependency>
5. Comment out all of the existing code in the main method of the FemaleInfoCollection class and replace it with the code below. Then upload the project's jar, elasticsearch-hadoop-6.2.3.jar, user.keytab, and krb5.conf to the /opt/jars directory:
import java.util.Map;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

SparkConf conf = new SparkConf().setAppName("CollectFemaleInfo");
conf.set("es.index.auto.create", "true"); // create the target index if it does not exist
conf.set("spark.es.nodes", "node-ana-coremwut0001,node-ana-coremwut0002,node-ana-coremwut0003");
conf.set("spark.es.port", "9200");
JavaSparkContext jsc = new JavaSparkContext(conf);
// Two sample documents, each represented as a Map
Map<String, ?> numbers = ImmutableMap.of("what", 1, "how", 2);
Map<String, ?> airports = ImmutableMap.of("dang", "dang", "Sduang", "duang");
JavaRDD<Map<String, ?>> javaRDD = jsc.parallelize(ImmutableList.of(numbers, airports));
JavaEsSpark.saveToEs(javaRDD, "spark/docs"); // write to index "spark", type "docs"
jsc.stop();
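If the records are already serialized as JSON strings, es-hadoop can write them without the Map representation. A minimal variant of the code above using JavaEsSpark.saveJsonToEs (the index/type spark/json-docs is only an illustration):

// Variant: writing pre-serialized JSON strings (reuses the jsc from above)
String doc1 = "{\"what\" : 1, \"how\" : 2}";
String doc2 = "{\"dang\" : \"dang\", \"Sduang\" : \"duang\"}";
JavaRDD<String> jsonRDD = jsc.parallelize(ImmutableList.of(doc1, doc2));
JavaEsSpark.saveJsonToEs(jsonRDD, "spark/json-docs");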
6. Run the following command on the master node (wwwww is the principal of the development user prepared in step 1; note that --keytab must point to the user.keytab file, not krb5.conf):
spark-submit --conf spark.es.resource=index/type --jars /opt/jars/elasticsearch-hadoop-6.2.3.jar --keytab /opt/jars/user.keytab --principal wwwww --class com.huawei.bigdata.spark.examples.FemaleInfoCollection --master yarn --deploy-mode cluster /opt/jars/FemaleInfoCollection-mrs-2.0.jar
7. Check the result:
curl -XGET 'http://node-ana-coremwut0003:9200/spark/_search?pretty'
Note: for other coding needs, refer to https://www.elastic.co/guide/en/elasticsearch/hadoop/6.2/spark.html.
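As one example from that guide, the documents written in step 5 can also be read back as an RDD instead of via curl. A minimal sketch using JavaEsSpark.esRDD, assuming the same cluster settings as the write example (ReadFromEs is a hypothetical class name):

import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

public class ReadFromEs {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ReadFromEs");
        conf.set("spark.es.nodes", "node-ana-coremwut0001,node-ana-coremwut0002,node-ana-coremwut0003");
        conf.set("spark.es.port", "9200");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // Each element is a pair of (document id, document fields as a Map)
        JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc, "spark/docs");
        System.out.println(esRDD.collectAsMap());
        jsc.stop();
    }
}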