Big Data: Getting Started with Kafka
1. Downloading and installing ZooKeeper

- Download URL for the CDH build of ZooKeeper:
http://archive.cloudera.com/cdh5/cdh/5/zookeeper-3.4.5-cdh5.7.0.tar.gz
Extract it: tar -zxvf zookeeper-3.4.5-cdh5.7.0.tar.gz
- Configure the environment variables:

  export ZK_HOME=/home/hadoop/app/zookeeper-3.4.5-cdh5.7.0
  export PATH=${ZK_HOME}/bin:$PATH
- Edit the configuration file:

cd $ZK_HOME/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg

  # Change the data directory. The default, /tmp/zookeeper, is a temporary
  # directory whose contents are lost after a reboot.
  dataDir=/home/hadoop/app/tmp/zookeeper
- Start ZooKeeper:
zkServer.sh start

Running zkServer.sh without arguments prints the available subcommands:

JMX enabled by default
Using config: /home/hadoop/app/zookeeper-3.4.5-cdh5.7.0/bin/../conf/zoo.cfg
Usage: /home/hadoop/app/zookeeper-3.4.5-cdh5.7.0/bin/zkServer.sh {start|start-foreground|stop|restart|status|upgrade|print-cmd}
[root@hadoop000 /home/hadoop/app/zookeeper-3.4.5-cdh5.7.0/conf]#

Run jps; if a QuorumPeerMain process is listed, ZooKeeper started successfully.
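Besides jps, connectivity can also be checked from code. Below is a minimal sketch using the ZooKeeper Java client (the zookeeper jar from the installation above, or the org.apache.zookeeper:zookeeper:3.4.5 Maven artifact); hadoop000:2181 is the address used throughout this walkthrough:

import org.apache.zookeeper.ZooKeeper;

public class ZkCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the standalone ZooKeeper started above (address assumed from this setup).
        ZooKeeper zk = new ZooKeeper("hadoop000:2181", 6000, event -> {});
        Thread.sleep(1000); // give the client a moment to finish the handshake
        System.out.println("ZooKeeper state: " + zk.getState()); // expect CONNECTED
        zk.close();
    }
}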

2. Downloading, installing, and configuring Kafka; starting a single-node, single-broker setup

1. Download page: http://kafka.apache.org/downloads
https://archive.apache.org/dist/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz
Extract Kafka: tar -zxvf kafka_2.11-0.9.0.0.tgz
Configure the Kafka environment variables:
vim /etc/profile
  export KAFKA_HOME=/home/hadoop/app/kafka_2.11-0.9.0.0
  export PATH=${KAFKA_HOME}/bin:$PATH
2. Edit the configuration file:
vim $KAFKA_HOME/config/server.properties
  ############################# Server Basics #############################

  # Unique id for this broker
  # The id of the broker. This must be set to a unique integer for each broker.
  broker.id=0

  ############################# Socket Server Settings #############################

  listeners=PLAINTEXT://:9092

  # The port the socket server listens on
  #port=9092

  # Hostname the broker will bind to. If not set, the server will bind to all interfaces
  host.name=hadoop000

  ############################# Log Basics #############################

  # Change the log storage directory
  # A comma separated list of directories under which to store log files
  log.dirs=/home/hadoop/app/tmp/kafka-logs

  ############################# Zookeeper #############################

  # Zookeeper connection string (see zookeeper docs for details).
  # This is a comma separated host:port pairs, each corresponding to a zk
  # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
  # You can also append an optional chroot string to the urls to specify the
  # root directory for all kafka znodes.
  zookeeper.connect=hadoop000:2181

  # Timeout in ms for connecting to zookeeper
  zookeeper.connection.timeout.ms=6000
3. Start Kafka.

Running kafka-server-start.sh without arguments prints its usage:

USAGE: /home/hadoop/app/kafka_2.11-0.9.0.0/bin/kafka-server-start.sh [-daemon] server.properties [--override property=value]*

Start command:
kafka-server-start.sh $KAFKA_HOME/config/server.properties

Check with jps -m that a Kafka process is running.

3. Basic Kafka operations

1. Create a topic:
kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 1 --partitions 1 --topic hello_topic
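Topics can also be created from code. The sketch below uses the 0.9 admin helpers (kafka.admin.AdminUtils with kafka.utils.ZkUtils, Scala classes that are callable from Java); the exact signatures are assumptions based on the 0.9 client bundled with the broker jar:

import java.util.Properties;
import kafka.admin.AdminUtils;
import kafka.utils.ZkUtils;

public class CreateTopic {
    public static void main(String[] args) {
        // ZooKeeper address from this setup; session/connection timeouts in ms; security disabled.
        ZkUtils zkUtils = ZkUtils.apply("hadoop000:2181", 30000, 30000, false);
        // topic, partitions, replication factor, per-topic config overrides
        AdminUtils.createTopic(zkUtils, "hello_topic", 1, 1, new Properties());
        zkUtils.close();
    }
}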
2. Produce messages to the topic:
kafka-console-producer.sh --broker-list hadoop000:9092 --topic hello_topic

hello
hadoop
spark
kafka
flume
3. Consume messages:
kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic hello_topic --from-beginning

About --from-beginning: with this flag the consumer replays every message already in the topic; without it, the consumer only sees messages sent after it starts. The same behaviour can be reproduced programmatically, as in the sketch below.
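This sketch uses the new consumer API that ships with Kafka 0.9 (org.apache.kafka.clients.consumer); setting auto.offset.reset=earliest makes a consumer group with no committed offsets start from the beginning of the topic, so the group id here is assumed to be a fresh one:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FromBeginningConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop000:9092");
        props.put("group.id", "from_beginning_demo");  // assumed fresh group id
        props.put("auto.offset.reset", "earliest");    // analogue of --from-beginning
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("hello_topic"));
        while (true) { // sketch only: loop forever, printing whatever arrives
            ConsumerRecords<String, String> records = consumer.poll(1000); // 0.9 API: poll(long)
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ": " + record.value());
            }
        }
    }
}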
4. Describe all topics:
kafka-topics.sh --describe --zookeeper hadoop000:2181

Topic:hello_topic       PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: hello_topic      Partition: 0    Leader: 0       Replicas: 0     Isr: 0

5. Describe a specific topic:
kafka-topics.sh --describe --zookeeper hadoop000:2181 --topic hello_topic

Topic:hello_topic       PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: hello_topic      Partition: 0    Leader: 0       Replicas: 0     Isr: 0

In this output, Leader is the id of the broker currently serving the partition, Replicas lists the brokers holding a copy of it, and Isr is the subset of replicas that are in sync with the leader.
4. Single-node, multi-broker deployment and usage

cp server.properties server-1.properties
cp server.properties server-2.properties
cp server.properties server-3.properties

In each copy, change broker.id, the listener port, and the log directory.

server-1.properties:
  # The id of the broker. This must be set to a unique integer for each broker.
  broker.id=1

  ############################# Socket Server Settings #############################

  listeners=PLAINTEXT://:9093

  ############################# Log Basics #############################

  # A comma separated list of directories under which to store log files
  log.dirs=/home/hadoop/app/tmp/kafka-logs-1
server-2.properties:

  # The id of the broker. This must be set to a unique integer for each broker.
  broker.id=2

  ############################# Socket Server Settings #############################

  listeners=PLAINTEXT://:9094

  ############################# Log Basics #############################

  # A comma separated list of directories under which to store log files
  log.dirs=/home/hadoop/app/tmp/kafka-logs-2
server-3.properties:

  # The id of the broker. This must be set to a unique integer for each broker.
  broker.id=3

  ############################# Socket Server Settings #############################

  listeners=PLAINTEXT://:9095

  ############################# Log Basics #############################

  # A comma separated list of directories under which to store log files
  log.dirs=/home/hadoop/app/tmp/kafka-logs-3
Start the three Kafka brokers:
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-1.properties
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-2.properties
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-3.properties

Create a topic:
kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 3 --partitions 1 --topic my_replication_topic

Send messages:
kafka-console-producer.sh --broker-list hadoop000:9093,hadoop000:9094,hadoop000:9095 --topic my_replication_topic

Receive messages:
kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic my_replication_topic

This setup demonstrates Kafka's fault tolerance: even after one of the brokers is killed, messages are still delivered. Re-running the --describe command above will show the newly elected leader and the shrunken Isr for the partition.
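The replication guarantee can also be exercised programmatically. The sketch below uses the new producer API bundled with 0.9 (org.apache.kafka.clients.producer) and requests acknowledgement from all in-sync replicas, so sends keep succeeding across a broker failure as long as the partition retains a leader; the broker addresses are the three listeners configured above:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReplicatedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop000:9093,hadoop000:9094,hadoop000:9095");
        props.put("acks", "all");  // wait for all in-sync replicas to acknowledge
        props.put("retries", "3"); // retry through a leader re-election
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                producer.send(new ProducerRecord<>("my_replication_topic", "msg" + i));
            }
        } // close() flushes any pending sends
    }
}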
5. Producing and consuming messages with the Kafka Java API
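The three classes below use the legacy producer (kafka.javaapi.producer) and high-level consumer (kafka.consumer / kafka.javaapi.consumer) clients that ship inside the broker jar; with Maven this is presumably the org.apache.kafka:kafka_2.11:0.9.0.0 dependency (coordinate inferred from the version installed above, so treat it as an assumption).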
  
// Shared connection settings used by the producer and consumer examples.
public class KafkaProperties {

    public static String ZOOKEEPER = "192.168.42.85:2181";

    public static String BROKER_LIST = "192.168.42.85:9092";

    public static String TOPIC = "hello_topic";

    public static String GROUP_ID = "test_group";
}
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Producer built on the legacy (pre-0.9) client API that still ships with Kafka 0.9.
public class KafkaProductor implements Runnable {

    private String topic;
    private Producer<Integer, String> producer;

    public KafkaProductor(String topic) {
        this.topic = topic;
        Properties properties = new Properties();
        properties.put("metadata.broker.list", KafkaProperties.BROKER_LIST);
        properties.put("serializer.class", "kafka.serializer.StringEncoder");
        properties.put("request.required.acks", "1"); // wait for the leader's ack only
        ProducerConfig producerConfig = new ProducerConfig(properties);
        producer = new Producer<Integer, String>(producerConfig);
    }

    public String getTopic() {
        return topic;
    }

    public void setTopic(String topic) {
        this.topic = topic;
    }

    public static final int THREAD_COUNT = 100;
    public static final int ALLOW_COUNT = 20;
    // Declared in the original but never counted down; kept for fidelity.
    public static final CountDownLatch countDownLatch = new CountDownLatch(THREAD_COUNT);
    // At most ALLOW_COUNT of the THREAD_COUNT producer tasks send concurrently.
    public static final Semaphore semaphore = new Semaphore(ALLOW_COUNT);
    public static final ExecutorService EXECUTOR_SERVICE = Executors.newCachedThreadPool();

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < THREAD_COUNT; i++) {
            EXECUTOR_SERVICE.submit(new KafkaProductor(KafkaProperties.TOPIC));
        }

        new Thread(new KafkaCustomer(KafkaProperties.TOPIC)).start();
    }

    @Override
    public void run() {
        try {
            semaphore.acquire();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        int no = 0;
        while (no <= 10) {
            String message = "msg" + no;
            producer.send(new KeyedMessage<Integer, String>(topic, message));
            no++;
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        semaphore.release();
    }
}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

// Consumer built on the legacy high-level consumer API.
public class KafkaCustomer implements Runnable {

    private String topic;
    private ConsumerConnector consumerConnector;

    public KafkaCustomer(String topic) {
        this.topic = topic;
        Properties properties = new Properties();
        properties.put("group.id", KafkaProperties.GROUP_ID);
        properties.put("zookeeper.connect", KafkaProperties.ZOOKEEPER);
        ConsumerConfig consumerConfig = new ConsumerConfig(properties);
        consumerConnector = Consumer.createJavaConsumerConnector(consumerConfig);
    }

    @Override
    public void run() {
        // Ask for one stream (one consuming thread) for the topic.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(KafkaProperties.TOPIC, 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streamMap =
                consumerConnector.createMessageStreams(topicCountMap);
        KafkaStream<byte[], byte[]> stream = streamMap.get(KafkaProperties.TOPIC).get(0);
        ConsumerIterator<byte[], byte[]> iterator = stream.iterator();
        // hasNext() blocks until the next message arrives.
        while (iterator.hasNext()) {
            String msg = new String(iterator.next().message());
            System.out.println("receive msg ~~~~:" + msg);
        }
    }
}
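To try the example, start ZooKeeper and the single broker from section 2, then run KafkaProductor.main: the thread pool submits THREAD_COUNT producer tasks, the semaphore throttles them to ALLOW_COUNT concurrent senders, each task writes msg0 through msg10 to hello_topic, and the consumer thread prints every message it receives. Note that kafka.javaapi.producer.Producer and the high-level ConsumerConnector are the legacy Scala clients that 0.9 still bundles; later Kafka releases removed them in favour of the org.apache.kafka.clients producer and consumer used in the sketches in sections 3 and 4.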
     
    
 
6. Using Flume to collect logs and write them to Kafka 0.9

(Newer Flume releases use a different Kafka sink configuration.)

Write exec-memory-avro.conf:
  exec-memory-avro.sources = exec-source
  exec-memory-avro.sinks = avro-sink
  exec-memory-avro.channels = memory-channel

  # Describe/configure the source
  exec-memory-avro.sources.exec-source.type = exec
  exec-memory-avro.sources.exec-source.command = tail -F /home/hadoop000/hello.txt
  exec-memory-avro.sources.exec-source.shell = /bin/bash -c

  # Describe the sink
  exec-memory-avro.sinks.avro-sink.type = avro
  exec-memory-avro.sinks.avro-sink.hostname = hadoop000
  exec-memory-avro.sinks.avro-sink.port = 44444

  # Use a channel which buffers events in memory
  exec-memory-avro.channels.memory-channel.type = memory
  exec-memory-avro.channels.memory-channel.capacity = 1000
  exec-memory-avro.channels.memory-channel.transactionCapacity = 100

  # Bind the source and sink to the channel
  exec-memory-avro.sources.exec-source.channels = memory-channel
  exec-memory-avro.sinks.avro-sink.channel = memory-channel
Write avro-memory-kafka.conf:

  avro-memory-kafka.sources = avro-source
  avro-memory-kafka.sinks = kafka-sink
  avro-memory-kafka.channels = memory-channel

  # Describe/configure the source
  avro-memory-kafka.sources.avro-source.type = avro
  avro-memory-kafka.sources.avro-source.bind = hadoop000
  avro-memory-kafka.sources.avro-source.port = 44444

  # Describe the sink
  avro-memory-kafka.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
  avro-memory-kafka.sinks.kafka-sink.brokerList = hadoop000:9093
  avro-memory-kafka.sinks.kafka-sink.topic = hello_topic
  avro-memory-kafka.sinks.kafka-sink.batchSize = 5
  avro-memory-kafka.sinks.kafka-sink.requiredAcks = 1

  # Use a channel which buffers events in memory
  avro-memory-kafka.channels.memory-channel.type = memory
  avro-memory-kafka.channels.memory-channel.capacity = 1000
  avro-memory-kafka.channels.memory-channel.transactionCapacity = 100

  # Bind the source and sink to the channel
  avro-memory-kafka.sources.avro-source.channels = memory-channel
  avro-memory-kafka.sinks.kafka-sink.channel = memory-channel
Start the Kafka broker:
kafka-server-start.sh $KAFKA_HOME/config/server-1.properties

Start the two Flume agents (start avro-memory-kafka first, since it listens on the avro port):
flume-ng agent --name avro-memory-kafka --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/avro-memory-kafka.conf -Dflume.root.logger=INFO,console
flume-ng agent --name exec-memory-avro --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/exec-memory-avro.conf -Dflume.root.logger=INFO,console

Append a test line to the tailed file:
echo hello world >> /home/hadoop000/hello.txt
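If both agents are wired up correctly, each appended line travels exec source → memory channel → avro sink → avro source → memory channel → Kafka sink; a console consumer on hello_topic (the section 3 command, with --zookeeper hadoop000:2181) should then print hello world.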
Source: blog.csdn.net, author 血煞风雨城2018; all rights belong to the original author. Original post: blog.csdn.net/qq_31905135/article/details/85260702