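Prometheus Series: Installing and Using blackbox_exporter
1. Overview
Prometheus collects monitoring data mainly through exporters, for example:
node_exporter collects host metrics;
jmx_exporter collects runtime metrics from Java applications;
mysqld_exporter collects MySQL metrics;
redis_exporter collects Redis metrics;
blackbox_exporter probes endpoints over HTTP (GET and POST), DNS, TCP and ICMP, and reports TLS/SSL certificate information;
snmp_exporter collects metrics from network devices;
pushgateway accepts pushed metrics, which makes collection across network boundaries possible.
Of these, node_exporter's collectors and pushgateway can be used to implement custom metrics.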
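2. Installing blackbox_exporter
This section records the blackbox_exporter release address on GitHub together with the download and installation steps.
# Download and install
cd /usr/local/src/
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.18.0/blackbox_exporter-0.18.0.linux-amd64.tar.gz
# Extract into /usr/local/prometheus/ (tar needs -C to set the target directory)
mkdir -p /usr/local/prometheus/
tar xzf blackbox_exporter-0.18.0.linux-amd64.tar.gz -C /usr/local/prometheus/
# Create a version-independent symlink next to the extracted directory
ln -s /usr/local/prometheus/blackbox_exporter-0.18.0.linux-amd64 /usr/local/prometheus/blackbox_exporter
# Edit the blackbox_exporter configuration file
vim /usr/local/prometheus/blackbox_exporter/blackbox_exporter.yaml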
modules:
  http_2xx:
    prober: http
    timeout: 5s
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
    timeout: 5s
# Manage the service with systemd
cat > /usr/lib/systemd/system/blackbox_exporter.service <<"EOF"
[Unit]
Description=Prometheus blackbox_exporter
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/prometheus/blackbox_exporter/blackbox_exporter --config.file=/usr/local/prometheus/blackbox_exporter/blackbox_exporter.yaml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
EOF
# Alternatively, manage the service with supervisor
cat > /etc/supervisor.d/blackbox_exporter.ini <<"EOF"
[program:blackbox_exporter]
command=/usr/local/prometheus/blackbox_exporter/blackbox_exporter --config.file=/usr/local/prometheus/blackbox_exporter/blackbox_exporter.yaml ; the program (relative uses PATH, can take args)
numprocs=1 ; number of processes copies to start (def 1)
directory=/usr/local/prometheus/blackbox_exporter/ ; directory to cwd to before exec (def no cwd)
autostart=true ; start at supervisord start (default: true)
autorestart=true ; restart at unexpected quit (default: true)
startsecs=30 ; number of secs prog must stay running (def. 1)
startretries=3 ; max # of serial start failures (default 3)
exitcodes=0,2 ; 'expected' exit codes for process (default 0,2)
stopsignal=QUIT ; signal used to kill process (default TERM)
stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10)
user=root ; setuid to this UNIX account to run the program
redirect_stderr=true ; redirect proc stderr to stdout (default false)
stdout_logfile=/usr/local/prometheus/blackbox_exporter/blackbox_exporter.stdout.log ; stdout log path, NONE for none; default AUTO
stdout_logfile_maxbytes=64MB ; max # logfile bytes b4 rotation (default 50MB)
stdout_logfile_backups=4 ; # of stdout logfile backups (default 10)
stdout_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0)
stdout_events_enabled=false ; emit events on stdout writes (default false)
stopasgroup=true
killasgroup=true
EOF
# Start the service
# Option 1: systemd
systemctl daemon-reload
systemctl enable blackbox_exporter
systemctl start blackbox_exporter
systemctl status blackbox_exporter
# Option 2: supervisor
supervisorctl update
supervisorctl status
supervisorctl start blackbox_exporter
supervisorctl restart blackbox_exporter
# Check that the exporter is running (it listens on port 9115 by default)
ss -untlp | grep 9115
ps -ef | grep blackbox_exporter
If the exporter fails to start: with systemd, inspect the error with journalctl -xe -u blackbox_exporter; with supervisor, check the log file configured above.
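Once the exporter is listening on port 9115, a manual probe is a quick way to confirm that a module works before wiring it into Prometheus. The sketch below assumes the exporter runs on localhost:9115 and uses the http_2xx module from the configuration above; the helper name probe_url, the BLACKBOX_ADDR variable and the target https://example.com are placeholders introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the /probe URL for a given module and target.
# BLACKBOX_ADDR is an assumed variable; it falls back to the default port 9115.
probe_url() {
    module="$1"
    target="$2"
    printf 'http://%s/probe?module=%s&target=%s\n' \
        "${BLACKBOX_ADDR:-localhost:9115}" "$module" "$target"
}

# Usage (requires a running exporter, so it is left commented out):
#   curl -s "$(probe_url http_2xx https://example.com)" | grep '^probe_success'
# probe_success 1 means the probe succeeded, 0 means it failed.
```

The helper does not URL-encode its arguments, so a target containing characters such as & or ? would need encoding first. Appending &debug=true to the URL should make the exporter return the probe log, which helps when a module fails unexpectedly.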
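3. Configuring targets and rules on prometheus-server
Targets: as readers of the earlier posts in this series know, my targets are defined in files of the form filesd/xxx/*yml. The YAML below monitors port 9100 on the target host:
vim /usr/local/prometheus/prometheus-server/prometheus.yml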
- labels:
    InstanceId: i-xxxxxxxxxxxf
    Name: xxxxxxx
    PrivateIpAddress: 10.xx.xx.xx
    State: running
    category: ops
    drtype: ''
    env: prod
    group: ''
    lifecycle: long
    module: xxxxxxx
    node_name: ansible
    project: common
    provider: cloud
    resource: xxx
    software: ansible
  targets:
  # Monitor port 9100
  - 10.xx.xx.xx:9100
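The target file above is scraped directly. Probing a target through blackbox_exporter additionally needs a scrape job that hands each discovered address to the exporter as the target parameter of /probe and then points the scrape at the exporter itself. The sketch below follows the relabeling pattern from the blackbox_exporter README; the job name web_status matches the one used by the SSL rule in this post, while the helper name, the file path filesd/web_status/*.yml, the module choice and the exporter address 127.0.0.1:9115 are assumptions to adapt.

```shell
#!/bin/sh
# Hypothetical helper: print a scrape job that probes targets through
# blackbox_exporter. Review the output, then place it under scrape_configs
# in prometheus.yml.
blackbox_job_yaml() {
    cat <<'EOF'
  - job_name: 'web_status'
    metrics_path: /probe
    params:
      module: [http_2xx]        # module defined in blackbox_exporter.yaml
    file_sd_configs:
      - files:
          - 'filesd/web_status/*.yml'   # assumed path, adjust to your layout
    relabel_configs:
      # Pass the discovered address to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed address as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the exporter itself (assumed to run on the same host)
      - target_label: __address__
        replacement: 127.0.0.1:9115
EOF
}

blackbox_job_yaml
```

With this layout the instance label carries the probed address rather than the exporter's address, which is what the {{ $labels.instance }} placeholder in the SSL alert descriptions relies on.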
Alerting rules:
groups:
- name: NodeStatsAlert
  # Alert levels: 0 info, 1 warning, 2 moderate, 3 severe, 4 disaster
  rules:
    # node_exporter status
    - alert: NodeExporterDown
      expr: up{job="node_status"} != 1
      for: 5m
      labels:
        severity: warning
        level: 2
      annotations:
        summary: "Probe of port 9100 failed"
        description: "Probe of port 9100 on server {{ $labels.node_name }} failed. Check node_exporter for problems as soon as possible!"
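One thing to watch here: Alertmanager decides where to send a notification by matching the alert's labels (for example alertname or severity) against its routes, and in my setup the routing is organized around the names used in these groups. If Alertmanager has no route that matches an alert defined here, the notification will not reach the intended receiver.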
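Reload the prometheus-server configuration:
# Choose one of the following
systemctl restart prometheus
# The HTTP reload endpoint only works if Prometheus was started with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload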
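A broken configuration or rule file will make a restart fail, so it is worth validating before reloading. The sketch below wraps the check and the reload in one step; it assumes promtool (shipped in the Prometheus release tarball) and curl are on PATH, and the function name reload_prometheus and the PROM_URL variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: validate the configuration, then ask Prometheus to reload it.
# Assumes promtool and curl are on PATH and the lifecycle endpoint is enabled.
reload_prometheus() {
    config="${1:-/usr/local/prometheus/prometheus-server/prometheus.yml}"
    # promtool also checks the rule files referenced by the config
    if ! promtool check config "$config"; then
        echo "config check failed, not reloading" >&2
        return 1
    fi
    # -f makes curl return a non-zero status on an HTTP error response
    curl -fsS -X POST "${PROM_URL:-http://localhost:9090}/-/reload"
}

# Usage (needs a running Prometheus, so it is left commented out):
#   reload_prometheus
```

Because the check runs first, a typo in a rule file is reported on the terminal instead of surfacing later as a failed reload.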
Note:
There are many more alert templates, for example for ICMP, port checks and SSL checks. Here is one for SSL; the others are easy to find online.
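    # SSL certificate expiring soon (warning, 30 days)
    - alert: SSLCertExpiringSoon
      expr: probe_ssl_earliest_cert_expiry{job="web_status"} - time() < 86400 * 30
      for: 10m
      labels:
        severity: warning
        level: 2
      annotations:
        summary: "SSL certificate expiring soon"
        description: "The SSL certificate of service {{ $labels.instance }} expires within 30 days. Check and renew it as soon as possible!"
    # SSL certificate expiring soon (critical, 15 days)
    - alert: SSLCertExpiringSoon
      expr: probe_ssl_earliest_cert_expiry{job="web_status"} - time() < 86400 * 15
      for: 10m
      labels:
        severity: critical
        level: 3
      annotations:
        summary: "SSL certificate expiring soon"
        description: "The SSL certificate of service {{ $labels.instance }} expires within 15 days. Check and renew it as soon as possible!"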
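The thresholds in the two rules are expressed in seconds: probe_ssl_earliest_cert_expiry is a Unix timestamp, so subtracting time() gives the seconds remaining, and 86400 * 30 is 30 days. The sketch below reproduces that arithmetic in shell to make the thresholds easier to reason about; the helper names and the sample timestamps are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: whole days between "now" and a certificate expiry,
# both given as Unix timestamps (the same quantities the rule expression uses).
days_left() {
    expiry="$1"
    now="$2"
    echo $(( (expiry - now) / 86400 ))
}

# Hypothetical helper: map remaining days to the severities used in the rules.
cert_severity() {
    days="$1"
    if [ "$days" -lt 15 ]; then
        echo critical
    elif [ "$days" -lt 30 ]; then
        echo warning
    else
        echo ok
    fi
}

now=1700000000                       # sample timestamp, not real data
expiry=$(( now + 20 * 86400 ))       # a certificate expiring in 20 days
cert_severity "$(days_left "$expiry" "$now")"   # prints "warning"
```

Unlike this helper, which reports only the highest severity, Prometheus evaluates the two rules independently: once fewer than 15 days remain, the 30-day condition is still true, so both the warning and the critical alert fire unless one is suppressed, for example with an Alertmanager inhibit rule.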