Prometheus Series: Installing and Using blackbox_exporter

Posted by 郁唯xiaolin on 2021/05/22 08:51:53
[Abstract] Installing and using blackbox_exporter

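1. Overview

Prometheus collects monitoring data mainly through exporters, for example:

node_exporter collects host metrics;

jmx_exporter collects runtime metrics from Java programs;

mysqld_exporter collects MySQL metrics;

redis_exporter collects Redis metrics;

blackbox_exporter probes endpoints over HTTP (GET and POST), DNS, TCP, and ICMP, and reports SSL certificate information;

snmp_exporter collects metrics from network devices;

pushgateway accepts pushed metrics, which makes collection across network boundaries possible.

Of these, node_exporter's collectors (the textfile collector in particular) and pushgateway can be used to implement custom metrics.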


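2. Installing blackbox_exporter

This section records where to get blackbox_exporter from GitHub, then how to download and install it.

# Download and install
mkdir -p /usr/local/prometheus
cd /usr/local/src/
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.18.0/blackbox_exporter-0.18.0.linux-amd64.tar.gz
# -C selects the extraction directory
tar xzf blackbox_exporter-0.18.0.linux-amd64.tar.gz -C /usr/local/prometheus/
# Create the symlink at the path that the service files below expect
ln -s /usr/local/prometheus/blackbox_exporter-0.18.0.linux-amd64 /usr/local/prometheus/blackbox_exporter

# Edit the blackbox_exporter configuration file
vim /usr/local/prometheus/blackbox_exporter/blackbox_exporter.yaml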
modules:
  http_2xx:
    prober: http
    timeout: 5s
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
    timeout: 5s
​
# Manage the service with systemd
# (use > rather than >> so that rerunning this does not append a second copy)
cat > /usr/lib/systemd/system/blackbox_exporter.service <<"EOF"
[Unit]
Description=Prometheus blackbox_exporter
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/prometheus/blackbox_exporter/blackbox_exporter --config.file=/usr/local/prometheus/blackbox_exporter/blackbox_exporter.yaml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF

# Or manage the service with supervisor
cat > /etc/supervisor.d/blackbox_exporter.ini <<"EOF"
[program:blackbox_exporter]
command=/usr/local/prometheus/blackbox_exporter/blackbox_exporter --config.file=/usr/local/prometheus/blackbox_exporter/blackbox_exporter.yaml ; the program (relative uses PATH, can take args)
numprocs=1                                                      ; number of processes copies to start (def 1)
directory=/usr/local/prometheus/blackbox_exporter/              ; directory to cwd to before exec (def no cwd)
autostart=true                                                  ; start at supervisord start (default: true)
autorestart=true                                                ; restart at unexpected quit (default: true)
startsecs=30                                                    ; number of secs prog must stay running (def. 1)
startretries=3                                                  ; max # of serial start failures (default 3)
exitcodes=0,2                                                   ; 'expected' exit codes for process (default 0,2)
stopsignal=QUIT                                                 ; signal used to kill process (default TERM)
stopwaitsecs=10                                                 ; max num secs to wait b4 SIGKILL (default 10)
user=root                                                       ; setuid to this UNIX account to run the program
redirect_stderr=true                                            ; redirect proc stderr to stdout (default false)
stdout_logfile=/usr/local/prometheus/blackbox_exporter/blackbox_exporter.stdout.log        ; stdout log path, NONE for none; default AUTO
stdout_logfile_maxbytes=64MB                                    ; max # logfile bytes b4 rotation (default 50MB)
stdout_logfile_backups=4                                        ; # of stdout logfile backups (default 10)
stdout_capture_maxbytes=1MB                                     ; number of bytes in 'capturemode' (default 0)
stdout_events_enabled=false                                     ; emit events on stdout writes (default false)

stopasgroup=true
killasgroup=true
EOF


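# Start the service
# Option 1: systemd
systemctl daemon-reload
systemctl enable blackbox_exporter
systemctl start blackbox_exporter
systemctl status blackbox_exporter
# Option 2: supervisor
supervisorctl update                      # load the new program; autostart=true starts it
supervisorctl status
supervisorctl start blackbox_exporter     # only needed if it is not already running
supervisorctl restart blackbox_exporter   # after later configuration changes

# Check that it started (blackbox_exporter listens on port 9115 by default)
ss -untlp | grep 9115
ps -ef | grep blackbox_exporter

If it fails to start: with systemd, check the error with journalctl -xe (or journalctl -u blackbox_exporter); with supervisor, check the log file.
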

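Before starting or restarting the service, it can help to validate the configuration file first, since a YAML mistake is a common reason for a failed start. A minimal sketch, assuming the install paths used in this article; the function name check_blackbox_config and the BLACKBOX_BIN override variable are illustrative and not part of blackbox_exporter:

```shell
# Validate a blackbox_exporter config file without starting the exporter.
# BLACKBOX_BIN is a hypothetical override; it defaults to this article's install path.
check_blackbox_config() {
  bin="${BLACKBOX_BIN:-/usr/local/prometheus/blackbox_exporter/blackbox_exporter}"
  # --config.check parses the file, reports any error, and exits
  "$bin" --config.file="$1" --config.check
}
```

Usage: check_blackbox_config /usr/local/prometheus/blackbox_exporter/blackbox_exporter.yaml && systemctl restart blackbox_exporter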

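A listening port only shows that the process is up. To confirm that probing itself works, blackbox_exporter can be queried directly on its /probe endpoint with module and target parameters; the response is plain-text metrics that include probe_success. A rough sketch, where the function names, the 127.0.0.1:9115 address, and https://example.com are placeholders:

```shell
# Build the probe URL.
# $1 = exporter host:port, $2 = module name from the config, $3 = target to probe
# Note: targets containing characters such as & or ? would need URL-encoding first.
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# Run one probe; succeeds only if the exporter reports "probe_success 1".
probe_success() {
  curl -fsS "$(probe_url "$1" "$2" "$3")" | grep -q '^probe_success 1$'
}
```

Usage: probe_success 127.0.0.1:9115 http_2xx https://example.com && echo "probe OK"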

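3. Configuring targets and rules on prometheus-server

Writing the targets: as readers of the earlier posts in this series know, my targets are defined through file-based service discovery, in files matching filesd/xxx/*yml. The YAML below is such a target file (note that it uses the file_sd target-list format, not the top-level prometheus.yml format, even though the command shown opens prometheus.yml). It monitors port 9100 on the target host:

vim /usr/local/prometheus/prometheus-server/prometheus.yml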
- labels:
    InstanceId: i-xxxxxxxxxxxf
    Name: xxxxxxx
    PrivateIpAddress: 10.xx.xx.xx
    State: running
    category: ops
    drtype: ''
    env: prod
    group: ''
    lifecycle: long
    module: xxxxxxx
    node_name: ansible
    project: common
    provider: cloud
    resource: xxx
    software: ansible
  targets:
  # monitor port 9100
  - 10.xx.xx.xx:9100


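The target file above points straight at node_exporter on port 9100. Probes that go through blackbox_exporter need a different kind of scrape job, because Prometheus has to scrape the exporter and pass the real target as a URL parameter. This article does not show the author's job, so the following is only a sketch of the usual relabeling pattern: the job name web_status is taken from the SSL rules later in this article, while the exporter address, the http_2xx module choice, and the example.com target are assumptions. It is written as a shell function that prints the YAML, in keeping with the heredoc style used earlier:

```shell
# Print a scrape job that routes probes through blackbox_exporter.
# $1 = blackbox_exporter host:port (assumed default: 127.0.0.1:9115)
blackbox_job() {
  exporter="${1:-127.0.0.1:9115}"
  cat <<EOF
  - job_name: web_status
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: ${exporter}
EOF
}
```

The printed block belongs under scrape_configs in prometheus.yml. The three relabel steps copy the listed target into the target URL parameter, keep it as the instance label, and then point the actual scrape at blackbox_exporter.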

Alerting rules:

groups:
- name: NodeStatsAlert
  # Alert levels: 0 info, 1 warning, 2 moderate, 3 severe, 4 disaster
  rules:

  # node_exporter status
  - alert: NodeExporterDown
    expr: up{job="node_status"} != 1
    for: 5m
    labels:
      severity: warning
      level: 2
    annotations:
      summary: "Probe of port 9100 failed"
      description: "Probe of port 9100 on server {{ $labels.node_name }} failed. Check node_exporter as soon as possible!"





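One thing to watch here: Alertmanager groups and routes alerts by their labels (alertname, severity, level, and so on); the name under groups only organizes rules inside Prometheus and is not sent along with the alert. So check that the names and labels used in these rules match a route in the Alertmanager configuration. If no route matches, the alert goes to the default receiver instead of the intended destination.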


Reload the prometheus-server configuration

# Choose one of the two
systemctl restart prometheus

# Requires Prometheus to be started with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload

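A broken rule file makes a restart fail, and on a reload Prometheus keeps running with the old configuration, so it is worth checking the file first. A minimal sketch using promtool, which ships in the Prometheus release tarball; the function name and the rule file path in the usage line are hypothetical:

```shell
# Reload Prometheus only if the rule file passes promtool's check.
# $1 = path to the rule file
check_and_reload() {
  promtool check rules "$1" || return 1
  # Needs --web.enable-lifecycle on the Prometheus side
  curl -fsS -X POST http://localhost:9090/-/reload
}
```

Usage: check_and_reload /path/to/rules.yml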

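Note:

There are many more alert templates, for example for ICMP, port checks, and SSL checks. Here is one for SSL; the rest are easy to find online.

  # SSL certificate expires within 30 days (warning)
  - alert: SSLCertExpiringSoon
    expr: probe_ssl_earliest_cert_expiry{job="web_status"} - time() < 86400 * 30
    for: 10m
    labels:
      severity: warning
      level: 2
    annotations:
      summary: "SSL certificate expiring soon"
      description: "The SSL certificate of service {{ $labels.instance }} expires within 30 days. Check and renew it as soon as possible!"

  # SSL certificate expires within 15 days (critical)
  - alert: SSLCertExpiringSoon
    expr: probe_ssl_earliest_cert_expiry{job="web_status"} - time() < 86400 * 15
    for: 10m
    labels:
      severity: critical
      level: 3
    annotations:
      summary: "SSL certificate expiring soon"
      description: "The SSL certificate of service {{ $labels.instance }} expires within 15 days. Check and renew it as soon as possible!"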
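
Both thresholds are just a number of days expressed in seconds (86400 seconds per day), compared with the time remaining until the earliest certificate expiry. The same arithmetic can be done by hand on a value copied from the exporter's output. A small sketch; days_left is a made-up helper, and awk is used because the exporter may print the timestamp in scientific notation, which shell integer arithmetic cannot parse:

```shell
# Print the whole days between now and a certificate expiry timestamp.
# $1 = value of probe_ssl_earliest_cert_expiry (Unix seconds)
# $2 = current Unix time, defaults to the system clock
days_left() {
  now="${2:-$(date +%s)}"
  awk -v expiry="$1" -v now="$now" 'BEGIN { printf "%d\n", (expiry - now) / 86400 }'
}
```

Usage: [ "$(days_left 1.7e+09)" -lt 30 ] && echo "renew soon"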
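[Copyright notice] This article is original content from a Huawei Cloud Community user. Reposts must state the source (Huawei Cloud Community), the article link, the author, and other basic information; otherwise the author and the community reserve the right to pursue liability. If you find content in this community that appears to be plagiarized, you are welcome to report it by email with supporting evidence; once verified, the community will promptly remove the infringing content. Report email: cloudbbs@huaweicloud.com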