Summary: Using Monit Instead of Supervisor to Manage and Monitor Services

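I had originally planned to use Salt to push Supervisor out to the servers. After reading the official Monit documentation and trying it out, here is how Monit compares with Supervisor after a few days of use.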

Monit - utility for monitoring services on a Unix system

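  1. Management and monitoring:
  • Manages: processes, programs (any script), files, directories, filesystems.
  • Monitoring actions: automatically restart a service that is not running, automatically restart a service that does not respond, stop a service that uses too many resources, detect changes (timestamp, checksum, size, content), network checks (TCP/IP, sockets, protocols (any protocol), SSL), system resources (CPU, memory, load, I/O, disk space…)
  • Service dependencies: dependency checks, starting services in order
  2. Logging and alerts:
  • Logs to syslog or to its own log file
  • Sends alert messages according to user-defined error rules
  3. Web interface
  4. Syntax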

SYNOPSIS
monit [options]
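To make the monitoring actions listed above more concrete, here is a sketch of how a few of them are written in a Monit control file. Everything specific in it is an assumption for illustration: the `nginx` service, its paths, the `/data` filesystem and all thresholds are hypothetical, so check the rule syntax against the manual for your Monit version. Note that a process check with a `start program` already restarts the process when it is found not running; the explicit rules add the other cases.

```conf
# Hypothetical service, used only to illustrate the rule types
check process nginx with pidfile /var/run/nginx.pid
    start program = "/usr/sbin/nginx"
    stop program  = "/usr/sbin/nginx -s stop"
    # Restart when the process stops answering on its port
    if failed host 127.0.0.1 port 80 protocol http then restart
    # Stop it when it uses too much memory, alert on sustained high CPU
    if totalmem > 1024.0 MB for 5 cycles then stop
    if cpu > 90% for 5 cycles then alert

# Alert when the content of a file changes
check file nginx_conf with path /etc/nginx/nginx.conf
    if changed checksum then alert

# Alert when a filesystem is running out of space
check filesystem datafs with path /data
    if space usage > 85% then alert
```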

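The control commands of the two tools work in a similar way, but Monit is more than a service manager: it also offers rich resource monitoring, which makes it far more capable than Supervisor.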

The differences are listed below. If anything is wrong, corrections are welcome:

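| monit | supervisor |
|---|---|
| Very small and easy to install | Only supports Python 2.4–2.7; installing via pip is somewhat more involved |
| Written in C, no other dependencies | Written in Python, depends on third-party libraries |
| Flexible configuration syntax, powerful features | Modest features; can only manage processes |
| Supports monitoring alerts | No alerting support |
| Non-intrusive | Intrusive: processes must be started by supervisor, and daemonized processes are not supported |
| Supports service dependencies | Poor dependency support |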
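For the alerting row, here is a minimal sketch of the global settings involved, assuming a reachable SMTP server. `smtp.example.com` and `ops@example.com` are placeholders. The last line sends Monit's own log to syslog, which is the other logging option from the feature list.

```conf
# Placeholder SMTP server Monit uses to send alert mails
set mailserver smtp.example.com port 25
# Placeholder recipient for all alerts
set alert ops@example.com

# Write Monit's log to syslog instead of a dedicated file
set logfile syslog facility log_daemon
```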

monit usage help

```
[root@web53 monit.d]# monit --help
Usage: monit [options] {arguments}
Options are as follows:
 -c file        Use this control file
 -d n           Run as a daemon once per n seconds
 -g name        Set group name for start, stop, restart, monitor, unmonitor, status and summary
 -l logfile     Print log information to this file
 -p pidfile     Use this lock file in daemon mode
 -s statefile   Set the file monit should write state information to
 -I             Do not run in background (needed for run from init)
 --id           Print Monit's unique ID
 --resetid      Reset Monit's unique ID. Use with caution
 -t             Run syntax check for the control file
 -v             Verbose mode, work noisy (diagnostic output)
 -vv            Very verbose mode, same as -v plus log stacktrace on error
 -H [filename]  Print SHA1 and MD5 hashes of the file or of stdin if the
                filename is omited; monit will exit afterwards
 -V             Print version number and patchlevel
 -h             Print this text
Optional action arguments for non-daemon mode are as follows:
 start all           - Start all services
 start name          - Only start the named service
 stop all            - Stop all services
 stop name           - Only stop the named service
 restart all         - Stop and start all services
 restart name        - Only restart the named service
 monitor all         - Enable monitoring of all services
 monitor name        - Only enable monitoring of the named service
 unmonitor all       - Disable monitoring of all services
 unmonitor name      - Only disable monitoring of the named service
 reload              - Reinitialize monit
 status [name]       - Print full status information for service(s)
 summary [name]      - Print short status information for service(s)
 quit                - Kill monit daemon process
 validate            - Check all services and start if not running
 procmatch <pattern> - Test process matching pattern

(Action arguments operate on services defined in the control file)
```
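Several of the daemon-mode options above can also be set permanently in the control file instead of being passed on the command line. Here is a sketch; the interval and all paths are hypothetical values.

```conf
# Check services every 60 seconds (same effect as "-d 60")
set daemon 60
# Monit's own log file (same effect as "-l")
set logfile /var/log/monit.log
# Lock file used in daemon mode (same effect as "-p")
set pidfile /var/run/monit.pid
# File where Monit keeps its state (same effect as "-s")
set statefile /var/lib/monit/state
```

After editing, `monit -t` checks the syntax and `monit reload` makes the running daemon re-read the file. Both commands appear in the help output above.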

Processes currently running on the system:

```
[graylog2@web53 bin]$ ps ux fww
USER       PID %CPU %MEM    VSZ   RSS TTY   STAT START   TIME COMMAND
graylog2 32286  0.0  0.0 108336  1908 pts/1 S    16:28   0:00 -bash
graylog2  2130  1.0  0.0 110200  1044 pts/1 R+   21:53   0:00  \_ ps ux fww
```

Configuring the services in Monit:

```
[root@web53 monit.d]# cat *
# Elasticsearch: runs as graylog2, restarted if port 9200 stops answering
check process elasticsearch with pidfile /data/graylog2/elasticsearch-5.6.1/bin/elasticsearch.pid
    group elasticsearch
    start program = "/bin/bash -c 'source /data/graylog2/.bash_profile ; cd /data/graylog2/elasticsearch-5.6.1/bin/; ./elasticsearch -d --pidfile /data/graylog2/elasticsearch-5.6.1/bin/elasticsearch.pid'"
        as UID graylog2 and gid graylog2
        with timeout 60 seconds
    # Plain kill (SIGTERM) lets Elasticsearch shut down cleanly; kill -9 would not
    stop program = "/bin/bash -c 'kill `cat /data/graylog2/elasticsearch-5.6.1/bin/elasticsearch.pid`'"
        as UID graylog2 and gid graylog2
    if failed
        host 127.0.0.1 port 9200
    then restart

# Graylog server: controlled through graylogctl, checked on port 12345
check process graylog2 with pidfile /tmp/graylog.pid
    group graylog2
    start program = "/bin/bash -c 'source /data/graylog2/.bash_profile ; /data/graylog2/graylog-2.3.1/bin/graylogctl start -f /data/graylog2/graylog-2.3.1/graylog.conf -d'"
        as UID graylog2
        with timeout 60 seconds
    stop program = "/bin/bash -c 'source /data/graylog2/.bash_profile ; /data/graylog2/graylog-2.3.1/bin/graylogctl stop'"
        as UID graylog2
    if failed
        host 192.168.1.53 port 12345
    then restart

# MongoDB: stopped via mongod --shutdown, checked on port 27017
check process mongodb with pidfile /data/graylog2/mongodb-linux-x86_64-3.4.9/bin/mongod.pid
    group mongodb
    start program = "/bin/bash -c 'source /data/graylog2/.bash_profile ; /data/graylog2/mongodb-linux-x86_64-3.4.9/bin/mongod --dbpath /data/graylog2/mongodb-linux-x86_64-3.4.9/db/ --pidfilepath /data/graylog2/mongodb-linux-x86_64-3.4.9/bin/mongod.pid --logpath /data/graylog2/mongodb-linux-x86_64-3.4.9/bin/mongodb.log'"
        as UID graylog2 and gid graylog2
        with timeout 60 seconds
    stop program = "/bin/bash -c 'source /data/graylog2/.bash_profile ; /data/graylog2/mongodb-linux-x86_64-3.4.9/bin/mongod --shutdown --dbpath /data/graylog2/mongodb-linux-x86_64-3.4.9/db/'"
        as UID graylog2 and gid graylog2
    if failed
        host 192.168.1.53 port 27017
    then restart
```
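Graylog relies on both Elasticsearch and MongoDB, and this is where the dependency support from the comparison table comes in. Here is a sketch, assuming the three checks above. It adds a `depends on` line to the existing graylog2 check; only the added line and its context are shown. According to the dependency section of the Monit manual, Monit should then start elasticsearch and mongodb before graylog2, and stop and restart graylog2 whenever one of them is restarted.

```conf
check process graylog2 with pidfile /tmp/graylog.pid
    group graylog2
    # ... start program, stop program and port test unchanged ...
    # Start graylog2 only after both back-ends; cycle it when they restart
    depends on elasticsearch, mongodb
```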

Starting Monit to manage the services:

```
[graylog2@web53 bin]$ ps ux
USER       PID %CPU %MEM      VSZ     RSS TTY   STAT START   TIME COMMAND
graylog2  6109  120  1.6 15419244 1114420 ?     Sl   22:00   2:17 /data/graylog2/jdk1.8.0_144/bin/java -Djava.library.path=/data/graylog2/
graylog2  7246  2.0  0.0   110236    1144 pts/1 R+   22:01   0:00 ps ux
graylog2 32286  0.0  0.0   108336    1908 pts/1 S    16:28   0:00 -bash
```

The startup sequence can be followed in the log:

```
[CST Sep 27 22:00:00] info : Reinitializing monit daemon
[CST Sep 27 22:00:00] info : Awakened by the SIGHUP signal
Reinitializing Monit - Control file '/etc/monit.conf'
[CST Sep 27 22:00:00] info : Shutting down Monit HTTP server
[CST Sep 27 22:00:01] info : Monit HTTP server stopped
[CST Sep 27 22:00:01] error : Cannot translate 'web53' to FQDN name -- Name or service not known
[CST Sep 27 22:00:01] info : Starting Monit HTTP server at [0.0.0.0]:2812
[CST Sep 27 22:00:01] info : Monit HTTP server started
[CST Sep 27 22:00:01] info : 'web53' Monit reloaded
[CST Sep 27 22:00:01] error : 'graylog2' process is not running
[CST Sep 27 22:00:01] info : 'graylog2' trying to restart
[CST Sep 27 22:00:01] info : 'graylog2' start: /bin/bash
[CST Sep 27 22:01:05] info : 'graylog2' process is running with pid 6109
[CST Sep 27 22:02:05] error : 'graylog2' process is not running
[CST Sep 27 22:02:05] info : 'graylog2' trying to restart
[CST Sep 27 22:02:05] info : 'graylog2' start: /bin/bash
[CST Sep 28 14:55:21] error : 'elasticsearch' process is not running
[CST Sep 28 14:55:21] info : 'elasticsearch' trying to restart
[CST Sep 28 14:55:21] info : 'elasticsearch' start: /bin/bash
[CST Sep 28 14:56:24] info : 'elasticsearch' process is running with pid 4059
```
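The log also shows graylog2 being restarted again at 22:02:05, about a minute after Monit reported it running. If a service keeps dying, Monit will keep trying to restart it on every cycle. You can add a restart limit to the check to prevent this. Here is a sketch; the threshold of 3 restarts within 5 cycles is an arbitrary assumption. Once the limit is hit, monitoring of graylog2 stops and has to be re-enabled by hand with `monit monitor graylog2`.

```conf
check process graylog2 with pidfile /tmp/graylog.pid
    # ... existing settings unchanged ...
    # Stop monitoring graylog2 after 3 restarts within 5 cycles
    if 3 restarts within 5 cycles then unmonitor
```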
```
[root@web53 ~]# monit status
Cannot translate 'web53' to FQDN name -- Name or service not known
The Monit daemon 5.14 uptime: 17h 42m

Process 'mongodb'
  status                       Running
  monitoring status            Monitored
  pid                          9674
  parent pid                   1
  uid                          507
  effective uid                507
  gid                          0
  uptime                       3h 6m
  children                     0
  memory                       46.1 MB
  memory total                 46.1 MB
  memory percent               0.0%
  memory percent total         0.0%
  cpu percent                  0.0%
  cpu percent total            0.0%
  port response time           0.000s to [192.168.1.53]:27017 type TCP/IP protocol DEFAULT
  data collected               Thu, 28 Sep 2017 15:12:38

Process 'graylog2'
  status                       Running
  monitoring status            Monitored
  pid                          31630
  parent pid                   1
  uid                          507
  effective uid                507
  gid                          0
  uptime                       24m
  children                     0
  memory                       1.1 GB
  memory total                 1.1 GB
  memory percent               1.7%
  memory percent total         1.7%
  cpu percent                  0.0%
  cpu percent total            0.0%
  port response time           0.000s to [192.168.1.53]:12345 type TCP/IP protocol DEFAULT
  data collected               Thu, 28 Sep 2017 15:12:38

Process 'elasticsearch'
  status                       Running
  monitoring status            Monitored
  pid                          4059
  parent pid                   1
  uid                          507
  effective uid                507
  gid                          507
  uptime                       17m
  children                     0
  memory                       2.8 GB
  memory total                 2.8 GB
  memory percent               4.5%
  memory percent total         4.5%
  cpu percent                  0.0%
  cpu percent total            0.0%
  port response time           0.000s to [127.0.0.1]:9200 type TCP/IP protocol DEFAULT
  data collected               Thu, 28 Sep 2017 15:12:38

System 'web53'
  status                       Running
  monitoring status            Monitored
  load average                 [1.35] [1.32] [1.39]
  cpu                          5.2%us 1.0%sy 0.8%wa
  memory usage                 17.0 GB [27.0%]
  swap usage                   0 B [0.0%]
  data collected               Thu, 28 Sep 2017 15:12:38
```
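The `System 'web53'` section shows that Monit already collects load, CPU, memory and swap figures for the host. Alerting on them takes a `check system` entry, if the control file does not already contain one. Here is a sketch; every threshold below is a hypothetical value to adjust for the machine.

```conf
# System-wide resource checks for this host
check system web53
    if loadavg (5min) > 8 then alert
    if memory usage > 80% then alert
    if swap usage > 25% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (wait) > 20% then alert
```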

[Screenshot: Monit web interface, manager page]
[Screenshot: Monit web interface, process page]
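The web pages (manager page, process page) are served by Monit's built-in HTTP server, which the log shows listening on `[0.0.0.0]:2812`. In Monit 5.x the `monit status` and `monit summary` commands also talk to the daemon through this server. Here is a minimal sketch of the matching lines in `/etc/monit.conf`, assuming the per-service files live in `/etc/monit.d`. The network range and the `admin:monit` credentials are placeholders; `admin:monit` is the well-known sample from the default monitrc, so change it.

```conf
# Built-in web interface on port 2812, listening on all addresses
set httpd port 2812 and
    use address 0.0.0.0
    allow localhost
    allow 192.168.1.0/255.255.255.0
    allow admin:monit

# Load the per-service check files
include /etc/monit.d/*
```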
