Installation and Configuration Guide for a Nagios-Based Operations Monitoring System

Posted 2010-01-15 10:21


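1 Goal
Build a network and system monitoring setup on Nagios and Cacti so that, when a fault occurs, administrators are notified immediately by email and SMS and can track down and fix the problem promptly.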
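2 Software versions
Operating system: RedHat AS 4.0
Nagios: nagios 2.10
Nagios Plugins: nagios-plugins-1.4.11
Nagios Core Addons:
NRPE: 2.11
NSCA: 2.7.2
Apache HTTPD: 2.0.52
When installing the operating system, a full installation is recommended. The httpd, gd-devel, libpng-devel, and libjpeg-devel packages are required.
If MySQL is to be monitored, the MySQL used here is also the default RedHat AS installation.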
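Before building, it can help to confirm that the required packages are present. The following is a minimal sketch, not part of the original guide: the function name missing_packages and the QUERY_CMD override are illustrative assumptions; only rpm -q is an existing command.

```shell
#!/bin/sh
# Hypothetical helper: print every package from the argument list that the
# query command reports as not installed.
# QUERY_CMD is an assumed override hook; it defaults to "rpm -q".
missing_packages() {
    for pkg in "$@"; do
        # The query command exits non-zero when the package is not installed.
        ${QUERY_CMD:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

Running `missing_packages httpd gd-devel libpng-devel libjpeg-devel` prints nothing when all four packages are installed, and otherwise lists the ones still to be installed.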
3 Nagios installation and configuration guide
3.1 Nagios system architecture diagram


3.2 Install Nagios
3.3 Install the Nagios plugins
3.3.1 Create the required directories and set their ownership
useradd nagios
mkdir /usr/local/nagios
mkdir /usr/local/nagios/libexec
chown -R nagios:nagios /usr/local/nagios
3.3.2 Unpack the Nagios source package:
tar xzvf nagios-2.10.tar.gz
3.3.3 Compile and install Nagios
cd nagios-2.10;
./configure --prefix=/usr/local/nagios --with-cgiurl=/nagios/cgi-bin \
--with-htmurl=/nagios --with-nagios-user=nagios --with-nagios-group=nagios;
make all;
make install;
make install-init;
make install-commandmode;
make install-config;
3.3.4 Install the Nagios plugins
# Use the file name of the release you actually downloaded (section 2 lists 1.4.11).
tar xzvf nagios-plugins-1.4.7.tar.gz
cd nagios-plugins-1.4.7
./configure --prefix=/usr/local/nagios --with-cgiurl=/nagios/cgi-bin \
--with-mysql=/usr --enable-ssl --enable-command-args
make;
make install;
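Note: MySQL is to be monitored here, and the MySQL in use is the one bundled with AS at installation time. Its install prefix is /usr (which contains lib/mysql).
The plugins are installed to /usr/local/nagios/libexec. After a successful install, confirm that the plugin programs (for example check_http) are present there. If they were installed elsewhere, change $USER1$ in resource.cfg accordingly.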
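The check for installed plugins can be scripted. This is a minimal sketch under assumptions: check_plugins is a hypothetical helper name, and the list of plugins to look for is whatever you pass in.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named plugin exists and is
# executable in the given directory. Returns non-zero if any is missing.
check_plugins() {
    plugin_dir=$1
    shift
    plugin_status=0
    for plugin in "$@"; do
        if [ -x "$plugin_dir/$plugin" ]; then
            echo "ok: $plugin"
        else
            echo "missing: $plugin"
            plugin_status=1
        fi
    done
    return $plugin_status
}
```

For example, `check_plugins /usr/local/nagios/libexec check_http check_mysql` reports one line per plugin. A "missing" line suggests either that the install went to another prefix or that the plugin was not built.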
3.3.5 Apache configuration
- Edit /etc/httpd/conf/httpd.conf and append the following to the end of the file:
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin/"
<Directory "/usr/local/nagios/sbin">
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>

Alias /nagios "/usr/local/nagios/share/"
<Directory "/usr/local/nagios/share">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
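- Create the password file used to authenticate the Nagios administrator, nagiosadmin:
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Note: this user is the login for the Nagios web interface served by Apache, and it is also tied to permissions inside Nagios monitoring.
- Restart Apache so that the configuration takes effect:
apachectl restart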
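3.3.6 Nagios configuration
The etc directory under /usr/local/nagios contains these configuration files:
Config file     Description
cgi.cfg         Settings for the CGI scripts, such as user authorization
commands.cfg    Command definitions
localhost.cfg   Sample configuration for the local host
nagios.cfg      Main Nagios configuration file
resource.cfg    Resource macros such as $USER1$, the path to the plugin scripts used for monitoring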

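To make future maintenance and management easier, the configuration files are organized as follows:
etc/ |-- cgi.cfg
|-- commands.cfg
|-- nagios.cfg
|-- resource.cfg
(the files above are the main Nagios configuration files)
etc/servers |-- contacts.cfg     Default initial settings for contacts and contact groups
|-- hosts.cfg        Default initial settings for hosts
|-- hostgroups.cfg   Default initial settings for host groups
|-- services.cfg     Default initial settings for monitored services
|-- timeperiods.cfg  Default initial settings for time periods
The files under etc/servers are the monitoring-related configuration files. They are all split out of the original localhost.cfg, which makes them easier to understand and manage.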
- Create the servers directory
mkdir /usr/local/nagios/etc/servers
chown -R nagios:nagios /usr/local/nagios/etc/servers
chmod -R 755 /usr/local/nagios/etc/servers
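The directory setup can also be wrapped in a small script that creates empty placeholder files for each object type. This is a sketch under assumptions: init_servers_dir is a hypothetical helper, and the file names follow the cfg_file entries that this guide puts in nagios.cfg.

```shell
#!/bin/sh
# Hypothetical helper: create <etc_dir>/servers and one empty .cfg file per
# object type, leaving any existing file untouched.
# Ownership changes (chown to nagios) are left out because they need root.
init_servers_dir() {
    etc_dir=$1
    mkdir -p "$etc_dir/servers" || return 1
    for name in contacts hosts hostgroups services timeperiods; do
        cfg="$etc_dir/servers/$name.cfg"
        # Only create the file when it does not exist yet.
        [ -e "$cfg" ] || : > "$cfg"
    done
}
```

Calling `init_servers_dir /usr/local/nagios/etc` as root, followed by the chown and chmod commands above, gives a skeleton to paste the object definitions into.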
- Configure cgi.cfg:
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
The settings above give nagiosadmin the highest level of access in Nagios, including the right to view the status of all hosts and services.
- Configure nagios.cfg:
#cfg_file=/usr/local/nagios/etc/localhost.cfg
cfg_file=/usr/local/nagios/etc/servers/contacts.cfg
cfg_file=/usr/local/nagios/etc/servers/hosts.cfg
cfg_file=/usr/local/nagios/etc/servers/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servers/services.cfg
cfg_file=/usr/local/nagios/etc/servers/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
#cfg_dir=/usr/local/nagios/etc/servers
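Since every active cfg_file entry must point at an existing file, a quick pre-check can catch a misspelled file name before Nagios is started. The following is a minimal sketch; missing_cfg_files is a hypothetical helper name, and it only looks at uncommented cfg_file= lines.

```shell
#!/bin/sh
# Hypothetical helper: print each cfg_file path listed in the given main
# config file that does not exist on disk.
# Lines that are commented out (starting with #) are ignored.
missing_cfg_files() {
    sed -n 's/^cfg_file=//p' "$1" | while IFS= read -r path; do
        [ -f "$path" ] || echo "$path"
    done
}
```

`missing_cfg_files /usr/local/nagios/etc/nagios.cfg` prints nothing when all referenced files are in place.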
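For the other parameters, see the Nagios manual.
One parameter worth pointing out is interval_length=60. It sets the unit, in seconds, used by every time-related parameter in the Nagios configuration files; the default is 60 seconds. For example, with interval_length=60, setting notification_interval=1 in hosts.cfg means the host notification interval is 1 minute.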
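The arithmetic behind interval_length is a single multiplication. As a small illustration (interval_seconds is a hypothetical helper, not a Nagios tool):

```shell
#!/bin/sh
# Hypothetical helper: convert a Nagios interval value into seconds.
# $1 = interval_length (seconds per unit), $2 = interval value in units.
interval_seconds() {
    echo $(( $1 * $2 ))
}
```

With the default unit, `interval_seconds 60 1` gives 60 seconds, and a notification_interval of 5 would correspond to `interval_seconds 60 5`, which is 300 seconds.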
- Configure resource.cfg
$USER1$=/usr/local/nagios/libexec
- Overview of the files under /usr/local/nagios/etc/servers (the original post marked in red the places that need changing or attention):
timeperiods.cfg
###############################################################################
###############################################################################
#
# TIME PERIODS
#
###############################################################################
###############################################################################
# This defines a timeperiod where all times are valid for checks,
# notifications, etc. The classic "24x7" support nightmare. :-)
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
# 'workhours' timeperiod definition
define timeperiod{
timeperiod_name workhours
alias "Normal" Working Hours
monday 09:00-17:00
tuesday 09:00-17:00
wednesday 09:00-17:00
thursday 09:00-17:00
friday 09:00-17:00
}
# 'nonworkhours' timeperiod definition
define timeperiod{
timeperiod_name nonworkhours
alias Non-Work Hours
sunday 00:00-24:00
monday 00:00-09:00,17:00-24:00
tuesday 00:00-09:00,17:00-24:00
wednesday 00:00-09:00,17:00-24:00
thursday 00:00-09:00,17:00-24:00
friday 00:00-09:00,17:00-24:00
saturday 00:00-24:00
}
# 'none' timeperiod definition
define timeperiod{
timeperiod_name none
alias No Time Is A Good Time
}
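This file defines the monitoring time periods: 24x7 for round-the-clock monitoring, none for no time at all, and workhours and nonworkhours for working and non-working hours. They can be referenced from host and service definitions.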
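To see how a single HH:MM-HH:MM range is read, the sketch below tests whether a given time falls inside one. It is an illustration only, not Nagios code: to_minutes and in_timerange are hypothetical names, comma-separated lists of ranges are not handled, and treating the end of the range as exclusive is an assumption of this sketch that may differ from how Nagios itself treats the boundary.

```shell
#!/bin/sh
# Hypothetical helper: convert HH:MM into minutes since midnight.
to_minutes() {
    h=${1%%:*}
    m=${1##*:}
    # Strip one leading zero so that values like 09 are not read as octal.
    h=${h#0}
    m=${m#0}
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# Hypothetical helper: succeed when time $1 (HH:MM) lies inside the single
# range $2 (HH:MM-HH:MM). The end boundary is exclusive in this sketch.
in_timerange() {
    now=$(to_minutes "$1")
    start=$(to_minutes "${2%%-*}")
    end=$(to_minutes "${2##*-}")
    [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
}
```

For instance, `in_timerange 10:30 09:00-17:00` succeeds, which matches the workhours period on a weekday, while `in_timerange 08:59 09:00-17:00` fails.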
contacts.cfg
###############################################################################
###############################################################################
#
# CONTACTS
#
###############################################################################
###############################################################################
# In this simple config file, a single contact will receive all alerts.
# This assumes that you have an account (or email alias) called
# "nagios-admin" on the local host.
define contact{
contact_name nagiosadmin
alias nagiosadmin
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,r
service_notification_commands notify-by-sms, notify-by-email
host_notification_commands notify-by-sms, host-notify-by-email
email liangchuan@mobile-soft.cn
pager 13910823366
}
define contact{
contact_name nagiosadmin2
alias nagiosadmin2
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,r
service_notification_commands notify-by-sms, notify-by-email
host_notification_commands notify-by-sms, host-notify-by-email
email jincheng.xiao@mobile-soft.cn
pager 13141202288
}
###############################################################################
###############################################################################
#
# CONTACT GROUPS
#
###############################################################################
###############################################################################
# We only have one contact in this simple configuration file, so there is
# no need to create more than one contact group.
define contactgroup{
contactgroup_name nagios
alias nagios
members nagiosadmin,nagiosadmin2
}
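This file defines the Nagios administrators (contact) and the administrator group (contactgroup), along with each contact's email address and SMS number (the pager field holds the mobile phone number).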
hosts.cfg
###############################################################################
###############################################################################
#
# HOSTS
#
###############################################################################
###############################################################################
# Generic host definition template - This is NOT a real host, just a template!
define host{
name generic-host
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_period 24x7
register 0
}
define host {
use generic-host
host_name 192.168.1.5
address 192.168.1.5
check_command check-host-alive
max_check_attempts 1
notification_interval 1
notification_period 24x7
notification_options d,u,r
contact_groups nagios
}
generic-host defines the default attributes shared by all hosts. The 192.168.1.5 entry defines the server to be monitored.
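When more servers are added, each one needs its own host block that inherits from generic-host. The sketch below prints such a block so it can be appended to hosts.cfg. print_host_definition is a hypothetical helper; the directive values mirror the 192.168.1.5 example above, and the host name and address in the usage note are made-up placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print a host definition that inherits from the
# generic-host template.
# $1 = host_name, $2 = address, $3 = contact group.
print_host_definition() {
    cat <<EOF
define host {
use generic-host
host_name $1
address $2
check_command check-host-alive
max_check_attempts 1
notification_interval 1
notification_period 24x7
notification_options d,u,r
contact_groups $3
}
EOF
}
```

A call such as `print_host_definition web01 192.168.1.6 nagios >> /usr/local/nagios/etc/servers/hosts.cfg` would add a second host. The new host_name would then also need to be listed under members in hostgroups.cfg if it should appear in a group.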
hostgroups.cfg
###############################################################################
###############################################################################
#
# HOST GROUPS
#
###############################################################################
###############################################################################
# We only have one host in our simple config file, so there is no need to
# create more than one hostgroup.
define hostgroup{
hostgroup_name mobilesoft
alias mobilesoft
members 192.168.1.5
}
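This file defines the groups that hosts belong to, which makes the monitoring views easier to read. hostgroup_name is the group name, alias is its alias, and members lists the group's hosts, using the host_name values from the host definitions.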
services.cfg
###############################################################################
###############################################################################
#
# SERVICES
#
###############################################################################
###############################################################################
# Generic service definition template - This is NOT a real service, just a template!
define service{
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
register 0
}
define service{
use generic-service
host_name 192.168.1.5
service_description mobilessoft esales platform
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups nagios
notification_options w,u,c,r
notification_interval 1
notification_period 24x7
check_command check-mobilesoft!www.mobile-soft.cn
}
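generic-service defines the default attributes shared by all services and is referenced from each service definition.
The 192.168.1.5 entry is the configuration for the monitored host 192.168.1.5. Every monitored host needs its host definition plus definitions for the services to be monitored on it.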
- Edit /usr/local/nagios/etc/commands.cfg and add definitions for notify-by-sms and check-mobilesoft
#notify-by-sms
define command {
command_name notify-by-sms
command_line $USER1$/notify_by_sms $HOSTADDRESS$ $CONTACTPAGER$
}
#check-mobilesoft
define command {
command_name check-mobilesoft
command_line $USER1$/check_mobilesoft $ARG1$
}
- Create the check_mobilesoft plugin script
Create check_mobilesoft under /usr/local/nagios/libexec:
cd /usr/local/nagios/libexec
touch check_mobilesoft
chown nagios:nagios check_mobilesoft
chmod 755 check_mobilesoft
The contents of check_mobilesoft are as follows:
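#!/bin/sh
# Use the Nagios check_http plugin to test whether the monitored machine has a problem.
PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
# utils.sh defines the STATE_* return codes used below.
. $PROGPATH/utils.sh
hostip=$1
# Request /test, expect the string "success", and keep the output only if it reports "200 OK".
result=`/usr/local/nagios/libexec/check_http -H ${hostip} -u /test -s "success" -w 0.0009 -c 0.0040 -t 10 |grep "200 OK"`
if [ -z "${result}" ] ; then
# Print one status line so that Nagios has plugin output to display.
echo "CRITICAL - ${hostip} did not return 200 OK"
exit $STATE_CRITICAL
else
echo "OK - ${hostip} returned 200 OK"
exit $STATE_OK
fi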
Note that this script sources libexec/utils.sh, which defines variables such as the return codes along with some helper functions.
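The return codes follow the standard Nagios plugin convention: 0 is OK, 1 is WARNING, 2 is CRITICAL, and 3 is UNKNOWN. When testing a plugin by hand, a small lookup like the following can make the exit status easier to read. state_name is a hypothetical helper, not something utils.sh provides.

```shell
#!/bin/sh
# Hypothetical helper: translate a plugin exit code into the Nagios state
# name it stands for.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "INVALID($1)" ;;
    esac
}
```

Running the plugin as the nagios user and then calling `state_name $?` shows which state Nagios would record for that result.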
- Create the notify_by_sms plugin script
Create notify_by_sms under /usr/local/nagios/libexec:
cd /usr/local/nagios/libexec
touch notify_by_sms
chown nagios:nagios notify_by_sms
chmod 755 notify_by_sms
The contents of notify_by_sms are as follows:
#!/bin/sh
# Use the Nagios check_http plugin to test whether the monitored machine has a problem.
# If it does, notify the administrator through the SMS gateway.
hostip=$1
phonenumber=$2
result=`/usr/local/nagios/libexec/check_http -H ${hostip} -w 0.0009 -c 0.0040 -t 10 |grep "200 OK"`
if [ -z "${result}" ] ; then
/usr/bin/wget "http://my-sms-ip:8800/Send%20Text%20Message.htm?PhoneNumber=${phonenumber}&Text=${hostip}%20machine%20Have%20Problem"
fi
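The gateway request is an ordinary HTTP GET whose query string carries the phone number and the message text, with spaces written as %20. To make the URL construction easier to test on its own, it can be split into a function, as in this sketch. sms_url is a hypothetical helper, and it assumes the phone number and host address contain only characters that need no URL encoding (digits, dots, letters).

```shell
#!/bin/sh
# Hypothetical helper: build the SMS gateway URL used by notify_by_sms.
# $1 = gateway host:port, $2 = phone number, $3 = address of the failed host.
# No URL encoding is applied to the arguments (assumed to be safe characters).
sms_url() {
    printf '%s\n' "http://$1/Send%20Text%20Message.htm?PhoneNumber=$2&Text=$3%20machine%20Have%20Problem"
}
```

The script could then call `/usr/bin/wget "$(sms_url my-sms-ip:8800 "$phonenumber" "$hostip")"`, where my-sms-ip:8800 is the placeholder gateway address from the script above.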
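- Check that the configuration files are valid
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Run this command to check all configuration files. If it reports problems, keep correcting the configuration files until it passes.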
- Configure Nagios and Apache to start automatically at boot
chkconfig --add nagios
chkconfig nagios on
chkconfig --add httpd
chkconfig httpd on
Or edit /etc/rc.local directly and add the following:
/sbin/service nagios start
/usr/sbin/apachectl start


- Start the Nagios and Apache services
service nagios restart
apachectl restart
- Log in to the Nagios web interface and monitor the servers
Open the Nagios web interface at http://ip/nagios and log in with the nagiosadmin user name and password to start monitoring.
3.3.7 Installing and configuring NRPE, the client-side monitoring agent:

4 Cacti installation and configuration guide

5 References
http://nagios.sourceforge.net/docs/2_0/toc.html

http://gentoo-wiki.com/HOWTO_Install_Nagios



This article comes from a ChinaUnix blog. To view the original, see: http://blog.chinaunix.net/u3/107531/showart_2149556.html