Notes on setting up a heartbeat hot-standby service cluster

Posted 2007-07-02 11:31

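This is my first time working with clusters. After almost three days I still haven't got anything truly working, and there isn't much material on the subject on CU either, but for your convenience and mine I'm writing down some notes anyway. Below is some related material I collected from the web.
First, a few words on what a cluster is.
Linux clusters can be divided into three types. The first is the high-availability cluster, which runs on two or more nodes so that service continues to be provided when parts of the system fail. The design goal is to minimize service downtime. Well-known examples include Turbolinux TurboHA, Heartbeat, and Kimberlite. The second is the load-balancing cluster, whose aim is to provide load capacity proportional to the number of nodes; it is well suited to high-traffic web services. Load-balancing clusters usually also have some high-availability properties. Turbolinux Cluster Server and Linux Virtual Server belong to this type. The third is the supercomputing cluster, which falls into two kinds depending on how tightly coupled the computation is. One is task slicing: the job is split into slices, the slices are distributed to the nodes, each node computes its part, and the results are then merged into the final result. The other is parallel computing: the nodes exchange large amounts of data during the computation, so strongly coupled computations are possible. The two kinds suit different types of data processing. With supercomputing cluster software, a company can use a handful of PCs to do work that would normally require a supercomputer. Software of this type includes Turbolinux EnFusion and SCore.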
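I looked around online for suitable software and thought heartbeat looked quite good; it is also simple and convenient to configure. So let me introduce heartbeat.
heartbeat monitors the nodes through heartbeat messages and, when the primary server goes down, automatically moves the IP address and services to the standby machine. It is used to build high-availability clusters.
A rough outline of how heartbeat works: its core consists of two parts, heartbeat monitoring and resource takeover. Heartbeat monitoring can run over network links and serial ports, and redundant links are supported. The nodes send each other messages reporting their current state; if no message is received from the peer within a specified time, the peer is considered failed, and the resource takeover module is started to take over the resources or services that were running on the peer.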
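The timeout rule can be sketched as a tiny shell function. This only illustrates the decision itself and is not heartbeat's actual code; the function name peer_is_dead and its arguments are made up for this example.

```shell
# peer_is_dead LAST_SEEN NOW DEADTIME   (all in seconds; hypothetical helper)
# Succeeds (returns 0) when no heartbeat arrived within DEADTIME seconds.
peer_is_dead() (
    last_seen=$1
    now=$2
    deadtime=$3
    [ $(($now - $last_seen)) -gt "$deadtime" ]
)

# Last heartbeat seen at t=100, deadtime of 10 seconds.
if peer_is_dead 100 105 10; then echo "t=105: take over"; else echo "t=105: peer alive"; fi
if peer_is_dead 100 111 10; then echo "t=111: take over"; else echo "t=111: peer alive"; fi
```

The first check reports the peer as alive (5 seconds of silence), the second triggers a takeover (11 seconds of silence).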
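Software needed:
heartbeat-2.0.6.tar.gz, download: http://download.chinaunix.net/download.php?id=24742&ResourceID=67
When compiling on Red Hat (RH) I was missing the libnet library. I went through the CDs and searched online for a long time without finding a suitable package, and in the end found that the Fedora Core (FC) RPM libnet-devel-1.1.2.1-6.fc4.i386.rpm works fine! Download: http://rpmfind.net/linux/rpm2html/search.php?query=libnet-devel
After compiling, locate the three main configuration files and copy them to /etc/ha.d/ha.cf, /etc/ha.d/haresources, and /etc/ha.d/authkeys.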
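A minimal sketch of that copy step. It assumes the sample files sit in the doc/ directory of the unpacked source tree; that location is an assumption, so adjust the first argument if yours are elsewhere. The function name install_ha_samples is made up for this example.

```shell
# install_ha_samples SRC_DIR DEST_DIR   (hypothetical helper)
# Copies the three sample files and tightens the authkeys permissions.
# SRC_DIR is an assumption (e.g. heartbeat-2.0.6/doc); DEST_DIR is normally /etc/ha.d.
install_ha_samples() (
    src=$1
    dest=$2
    mkdir -p "$dest" || exit 1
    for f in ha.cf haresources authkeys; do
        if [ ! -f "$src/$f" ]; then
            echo "missing sample file: $src/$f" >&2
            exit 1
        fi
        cp "$src/$f" "$dest/$f" || exit 1
    done
    # the authkeys comments say the file must be mode 600
    chmod 600 "$dest/authkeys"
)
```

Usage, as root: install_ha_samples heartbeat-2.0.6/doc /etc/ha.d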
The main settings in the three files are as follows:
/etc/ha.d/ha.cf
# If any of debugfile, logfile and logfacility are defined then they
# will be used. If debugfile and/or logfile are not defined and
# logfacility is defined then the respective logging and debug
# messages will be logged to syslog. If logfacility is not defined
# then debugfile and logfile will be used to log messages. If
# logfacility is not defined and debugfile and/or logfile are not
# defined then defaults will be used for debugfile and logfile as
# required and messages will be sent there.
#
# File to write debug messages to
debugfile /var/log/ha-debug
#
#
# File to write other messages to
#
logfile /var/log/ha-log
#
#
# Facility to use for syslog()/logger
#
logfacility local0
#
#
# keepalive: how many seconds between heartbeats
#
keepalive 2
#
# deadtime: seconds-to-declare-host-dead
#
deadtime 10
#
#
# Very first dead time (initdead)
#
# On some machines/OSes, etc. the network takes a while to come up
# and start working right after you've been rebooted. As a result
# we have a separate dead time for when things first come up.
# It should be at least twice the normal dead time.
#
initdead 120
#
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
# serial serialportname ...
serial /dev/ttyS0
#
#
# Baud rate for serial ports...
#
baud 19200
#
# What UDP port to use for communication?
#
udpport 694
#
# What interfaces to heartbeat over?
#
udp eth1
#
# Set up a multicast heartbeat medium
# mcast [dev] [mcast group] [port] [ttl] [loop]
#
# [dev] device to send/rcv heartbeats on
# [mcast group] multicast group to join (class D multicast address
# 224.0.0.0 - 239.255.255.255)
# [port] udp port to sendto/rcvfrom (no real reason to differ
# from the port used for broadcast heartbeats)
# [ttl] the ttl value for outbound heartbeats. This affects
# how far the multicast packet will propagate. (0-255)
# [loop] toggles loopback for outbound multicast heartbeats.
# if enabled, an outbound packet will be looped back and
# received by the interface it was sent on. (0 or 1)
#
#
mcast eth1 225.0.0.1 694 1 1
#
# Watchdog is the watchdog timer. If our own heart doesn't beat for
# a minute, then our machine will reboot.
#
watchdog /dev/watchdog
#
# "Legacy" STONITH support
# Using this directive assumes that there is one stonith
# device in the cluster. Parameters to this device are
# read from a configuration file. The format of this line is:
#
# stonith <stonith_type> <configfile>
#
# NOTE: it is up to you to maintain this file on each node in the
# cluster!
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
# STONITH support
# You can configure multiple stonith devices using this directive.
# The format of the line is:
# stonith_host <hostfrom> <stonith_type> <params...>
# <hostfrom> is the machine the stonith device is attached
# to or * to mean it is accessible from any host.
# <stonith_type> is the type of stonith device (a list of
# supported drivers is in /usr/lib/stonith.)
# <params...> are driver specific parameters. To see the
# format for a particular device, run:
# stonith -l -t <stonith_type>
#
#
# Note that if you put your stonith device access information in
# here, and you make this file publicly readable, you're asking
# for a denial of service attack ;-)
#
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node xuanfei1-server
node xuanfei2-server
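
The timing directives in ha.cf are related: the comments ask for initdead to be at least twice deadtime, and deadtime only makes sense if it is larger than keepalive. Here is a rough sketch of a checker, assuming every value is a whole number of seconds (heartbeat also accepts other forms, such as a millisecond suffix, which this ignores). The function name check_ha_timing is hypothetical.

```shell
# check_ha_timing FILE   (hypothetical helper)
# Sanity-checks keepalive/deadtime/initdead in an ha.cf.
# Assumes each value is a whole number of seconds.
check_ha_timing() (
    cf=$1
    # print the value of the last uncommented occurrence of a directive
    get() { awk -v k="$1" '$1 == k { v = $2 } END { print v }' "$cf"; }
    keepalive=$(get keepalive)
    deadtime=$(get deadtime)
    initdead=$(get initdead)
    for v in "$keepalive" "$deadtime" "$initdead"; do
        case $v in
            ''|*[!0-9]*) echo "missing or non-integer timing value" >&2; exit 2 ;;
        esac
    done
    rc=0
    if [ "$deadtime" -le "$keepalive" ]; then
        echo "deadtime ($deadtime) should exceed keepalive ($keepalive)"
        rc=1
    fi
    if [ "$initdead" -lt $((2 * $deadtime)) ]; then
        echo "initdead ($initdead) should be at least 2 x deadtime ($deadtime)"
        rc=1
    fi
    exit $rc
)
```

With the values used here (keepalive 2, deadtime 10, initdead 120), check_ha_timing /etc/ha.d/ha.cf prints nothing and returns 0.
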
/etc/ha.d/haresources
#
#just.linux-ha.org 135.9.216.110
#
#-------------------------------------------------------------------
#
# Assuming the administrative addresses are on the same subnet...
# A little more complex case: One service address, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 http
#-------------------------------------------------------------------
#
# A little more complex case: Three service addresses, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
#
# One service address, with funny subnet and bcast addr
# Stop and start httpd service with the subnet address
#
#just.linux-ha.org 135.9.216.3/4/135.9.216.12 httpd
#
#-------------------------------------------------------------------
#
# An example where a shared filesystem is to be used.
# Note that multiple arguments are passed to this script using
# the delimiter '::' to separate each argument.
#
#node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
# The first field is the preferred node and must match a node line in ha.cf.
xuanfei1-server 10.0.0.10 httpd smb
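
Each active haresources line starts with the name of the node that should normally own the resources, followed by the service IP and the services to start. Heartbeat expects that first field to be one of the node names declared in ha.cf. Here is a small sketch that cross-checks the two files. The function name check_haresources_nodes is invented for this example, and backslash-continued lines are not handled.

```shell
# check_haresources_nodes HA_CF HARESOURCES   (hypothetical helper)
# Reports resource lines whose first field is not declared as a node in ha.cf.
# Simplification: backslash-continued lines are not handled.
check_haresources_nodes() (
    awk '
        NR == FNR {                       # first file: ha.cf
            if ($1 == "node") for (i = 2; i <= NF; i++) known[$i] = 1
            next
        }
        NF == 0 || $1 ~ /^#/ { next }     # skip blank lines and comments
        !($1 in known) { print "unknown node: " $1; bad = 1 }
        END { exit bad + 0 }
    ' "$1" "$2"
)
```

Usage: check_haresources_nodes /etc/ha.d/ha.cf /etc/ha.d/haresources
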
/etc/ha.d/authkeys
# Authentication file. Must be mode 600
#
#
# Must have exactly one auth directive at the front.
# auth send authentication using this method-id
#
# Then, list the method and key that go with that method-id
#
# Available methods: crc, sha1, md5. Crc doesn't need/want a key.
#
# You normally only have one authentication method-id listed in this file
#
# Put more than one to make a smooth transition when changing auth
# methods and/or keys.
#
#
# sha1 is believed to be the "best", md5 next best.
#
# crc adds no security, except from packet corruption.
# Use only on physically secure networks.
#
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!
Remember to set the permissions of authkeys to 600: chmod 600 /etc/ha.d/authkeys
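As the comments in authkeys note, crc only protects against packet corruption. On a network that is not physically isolated, sha1 with a random key looks like the safer choice. Here is a sketch of generating such a file. The function name write_authkeys is hypothetical, and it assumes /dev/urandom and od are available.

```shell
# write_authkeys PATH   (hypothetical helper)
# Creates an authkeys file that uses sha1 with a random key.
write_authkeys() (
    path=$1
    umask 077                                   # a new file is created as mode 600
    # 16 random bytes rendered as 32 hex characters
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    [ -n "$key" ] || exit 1
    printf 'auth 1\n1 sha1 %s\n' "$key" > "$path" || exit 1
    chmod 600 "$path"                           # also covers a pre-existing file
)
```

Run write_authkeys /etc/ha.d/authkeys on one node only and copy the result to the other, since both nodes need the same key.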
Copy the configuration files to the other node:
scp /etc/ha.d/ha.cf /etc/ha.d/haresources /etc/ha.d/authkeys root@xuanfei2:/etc/ha.d/
#vi /etc/hosts
Contents of /etc/hosts on node1:
127.0.0.1        localhost.localdomain   localhost
10.0.0.1   xuanfei1              HA01
192.168.0.1      HA01
192.168.0.2     HA02
10.0.0.2   xuanfei2
Contents of /etc/hosts on node2:
127.0.0.1       localhost.localdomain   localhost
10.0.0.2  xuanfei2              HA02
192.168.0.2      HA02
192.168.0.1      HA01
10.0.0.1  xuanfei1
Set up round-robin scheduling with ipvsadm
---------------------------
ipvsadm -A -t 10.0.0.10:80 -s rr
ipvsadm -a -t 10.0.0.10:80 -r 10.0.0.1:80 -m
ipvsadm -a -t 10.0.0.10:80 -r 10.0.0.2:80 -m
After running these, check the result:
#ipvsadm --list
If the output matches the following, the configuration is correct.
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.10:http rr
  -> xuanfei2:http               Local   1      0          0
  -> xuanfei1:http               Masq    1      0          0
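With more real servers, typing the ipvsadm commands by hand gets repetitive. Here is a sketch that prints the commands for one virtual service and a list of real servers, using the same options as in this setup (-s rr for round-robin, -m for masquerading). It only prints; pipe the output to sh as root to apply it. print_ipvs_rules is a name invented for this example.

```shell
# print_ipvs_rules VIP:PORT [REAL:PORT ...]   (hypothetical helper)
# Prints (does not run) ipvsadm commands for a round-robin TCP service.
print_ipvs_rules() (
    vip=$1
    shift
    echo "ipvsadm -A -t $vip -s rr"
    for real in "$@"; do
        echo "ipvsadm -a -t $vip -r $real -m"
    done
)

print_ipvs_rules 10.0.0.10:80 10.0.0.1:80 10.0.0.2:80
```

For the addresses used in these notes, this prints one -A line for 10.0.0.10:80 followed by one -a line per real server.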
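Starting and stopping the HA service
First stop the services that are to be clustered on both machines, because heartbeat starts them automatically when it starts (in my tests there was a lag of a few seconds).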
/etc/rc.d/init.d/httpd stop
/etc/rc.d/init.d/smb stop
Start HA: service heartbeat start
Stop HA: service heartbeat stop
Firewall settings
/bin/iptables -A RH-Firewall-1-INPUT -p udp -m udp --dport 694 -d 10.0.0.201 -j ACCEPT
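Testing
From another machine, open http://10.0.0.10 (the virtual address).
You should see xuanfei1-server. Find a way to make xuanfei1-server go down; after roughly 3-5 seconds the page changes to xuanfei2-server, which means the service has failed over successfully. Once xuanfei1-server is back up, the page switches back to xuanfei1-server with almost no delay. This improves the availability of the system.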
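To put a number on the failover gap instead of eyeballing the browser, one can poll the virtual address and log each change in the answer. Here is a rough sketch, assuming curl is installed and that each node's index page contains its own hostname, as in the test described here. PROBE, watch_failover and the default one-second interval are choices made for this example.

```shell
# PROBE is the command used to fetch the page; override it for testing.
PROBE=${PROBE:-"curl -s -m 2 http://10.0.0.10/"}

# watch_failover COUNT [INTERVAL]   (hypothetical helper)
# Probes COUNT times and prints a line whenever the serving node changes.
watch_failover() (
    count=$1
    interval=${2:-1}
    prev=
    i=0
    while [ "$i" -lt "$count" ]; do
        page=$($PROBE 2>/dev/null) || page=
        case $page in
            *xuanfei1-server*) cur=xuanfei1-server ;;
            *xuanfei2-server*) cur=xuanfei2-server ;;
            *)                 cur=down ;;
        esac
        if [ "$cur" != "$prev" ]; then
            echo "probe $i: $cur"
            prev=$cur
        fi
        i=$(($i + 1))
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
    exit 0
)
```

For example, watch_failover 60 polls for about a minute; the distance between the "down" line and the next node name gives a rough failover time in probes.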
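Linux clusters have become very popular in many fields. With the emergence of cluster technology and the growing adoption of open-source software, a supercomputer can now be built for a small fraction of the cost of a traditional high-performance machine. How could we not use such good technology! Keep at it, everyone! If anything above is wrong or lacking, I welcome your comments and suggestions. Thanks :)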
This article is from a ChinaUnix blog; to view the original, see: http://blog.chinaunix.net/u/24390/showart_332361.html