#1 rwx_hc, posted 2011-08-29 12:23 (last edited 2011-08-29 12:47)

Installing and configuring Heartbeat + DRBD + Pacemaker for two-node hot standby on RHEL5

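Preface: I recently set up a two-node hot-standby lab, and it did not go smoothly; almost every step ran into some problem. drbd.ko failed to build because of missing kernel support,
configure options produced wrong paths, the firewall caused an HA split brain, kernel iptables module options made cloning the IPaddr2 resource fail, and so on.
I expect many others have been stuck on the same points, so I have written up my notes to share.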

Goal: two-node hot standby for the services httpd and postgresql (note: for brevity, the postgresql configuration is not included in this post)

Environment:
      OS: Red Hat Enterprise Linux Server release 5.2 (Tikanga)
      Kernel: 2.6.32.41
      Virtual machine: VMware Workstation 7.0.0 build-203739
      Disks: system disk /dev/hda, data disk /dev/hdb
      Partitions: system partition /dev/hda1, swap partition /dev/hda2, DRBD resource partitions /dev/hdb1 /dev/hdb2

      Node 1: ha1       192.168.1.130
      Node 2: ha2       192.168.1.131
      Cluster IP: 192.168.1.132

Software:
      //Heartbeat depends on the static library libnet.a provided by this package:
      libnet-1.1.2.1.tar.gz

      //Heartbeat suite:
      Heartbeat-3-0-STABLE-3.0.4.tar.bz2
      ClusterLabs-resource-agents-agents-1[1].0.4-0-gc06b6f3.tar.gz
      Pacemaker-1-0-db98485d06ed.tar.bz2
      Reusable-Cluster-Components-glue--glue-1.0.7.tar.bz2

      //DRBD: replicates a disk over the network
      drbd-8.4.0.tar.gz

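Note: this document uses // to mark explanatory comments. In configuration files or scripts, delete them or replace // with the appropriate comment character, such as #.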

Before installing:
      The kernel has to be rebuilt, mainly to satisfy the dependencies of the drbd.ko kernel module.
      CONFIG_SYSFS_DEPRECATED=y       //fixes the error seen after upgrading from 2.6.18-92.el5 to 2.6.32.41: mount: could not find filesystem '/dev/root'
            Location: General setup  ---> enable deprecated sysfs features which may confuse old userspace tools
      CONFIG_CRYPTO_CRC32C=y       //fixes the error when loading drbd.ko: unresolved symbol crc32c
            Location: -*- Cryptographic API  --->  -*- CRC32c CRC algorithm
      CONFIG_CONNECTOR=y       //fixes the link (LD) error: connector.o not found
            Location: Device Drivers  ---> Generic Driver Options  ---> <*> Connector - unified userspace <-> kernelspace link
      Note: if you cannot find one of these options, press / in the kernel configuration UI to search for it.
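Before starting the long kernel build, it can be worth confirming that the three options above really ended up in .config. The following is only a sketch: check_kconfig is a hypothetical helper name, and the .config path in the usage comment is an assumption about where the kernel tree was unpacked.

```shell
# check_kconfig FILE OPTION...
# Print "missing: OPTION" for every option that is not set to y or m in FILE.
# Hypothetical helper for illustration; not part of the kernel tooling.
check_kconfig() {
    cfg="$1"
    shift
    for opt in "$@"; do
        grep -Eq "^${opt}=[ym]\$" "$cfg" || echo "missing: $opt"
    done
}

# Example (assumed source location):
#   check_kconfig /usr/src/linux-2.6.32.41/.config \
#       CONFIG_SYSFS_DEPRECATED CONFIG_CRYPTO_CRC32C CONFIG_CONNECTOR
```

No output means all listed options are enabled.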

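      File systems  --->
            Load balancing needs a cluster file system: GFS2 or OCFS2 (note: I could not get DRBD dual-primary mode working on RHEL5, mainly because modules such as dlm-pcmk and gfs-pcmk are not available)
            GFS2 depends on this option (the iptables ClusterIP feature depends on it too): General setup  ---> Prompt for development and/or incomplete code/drivers
      -*- Networking support  --->
            For load balancing this part must be configured carefully, otherwise the cloned IP resource will not start. I failed here many times
            before reading the log file /var/log/messages and the OCF script /usr/lib/ocf/resource.d/heartbeat/IPaddr2 and finding that iptables was missing module support!
            Recommendation: build <> items as modules and [] items into the kernel. That is, set every option under Networking options to <M> where possible.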

      Build summary:
            make bzImage && make modules && make modules_install && make install
      The DRBD kernel module is built and installed separately; it is installed to /lib/modules/2.6.32.41/updates/drbd.ko
      Once DRBD has been built and installed, package the kernel and modules and copy them to the nodes.
      Packaging command:
            tar czvf kernel_2.6.32.41_package.tar.gz /boot/{System.map,System.map-2.6.32.41,vmlinuz-2.6.32.41,vmlinuz,initrd-2.6.32.41.img} /boot/grub/grub.conf /lib/firmware/ /lib/modules/2.6.32.41/
      Installing the package on a node: upload the archive to the target node, then run
            tar -C / --overwrite -xzf kernel_2.6.32.41_package.tar.gz

Installation:
      This assumes you are logged in as root and the sources are in /root/ha/

      On node ha1:
      cd /root/ha
      Unpack the sources:
      tar xjf Heartbeat-3-0-STABLE-3.0.4.tar.bz2
      tar xjf Pacemaker-1-0-db98485d06ed.tar.bz2
      tar xjf Reusable-Cluster-Components-glue--glue-1.0.7.tar.bz2
      tar xzf 'ClusterLabs-resource-agents-agents-1[1].0.4-0-gc06b6f3.tar.gz'       //quoted so the shell does not treat [1] as a glob pattern
      tar xzf drbd-8.4.0.tar.gz
      tar xzf libnet-1.1.2.1.tar.gz

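      During the build, /usr/bin/xsltproc is invoked and fetches documentation resources over the network, which is unbearably slow (several hours per project). To prevent that, replace it with a stub:
      mv /usr/bin/xsltproc /usr/bin/xsltproc.bak
      echo -e '#!/bin/sh \n exit 0' > /usr/bin/xsltproc
      chmod +x /usr/bin/xsltproc

      For a complete install, do not replace the program; instead configure a default route and a DNS server:
      route add default gw 192.168.1.1
      vim /etc/resolv.conf and add the line: nameserver       202.98.96.68       //any other DNS server address works too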
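If the stub was installed, the real xsltproc should be put back once all packages are built. The post does not show that step, so here is a minimal sketch. It assumes the backup name /usr/bin/xsltproc.bak from the mv command above; restore_tool is a hypothetical helper name.

```shell
# restore_tool PATH
# Move PATH.bak back over PATH if the backup exists (hypothetical helper).
restore_tool() {
    if [ -f "$1.bak" ]; then
        mv -f "$1.bak" "$1"
    else
        echo "no backup found for $1" >&2
        return 1
    fi
}

# After the last "make install":
#   restore_tool /usr/bin/xsltproc
```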

      cd libnet
      ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
      make && make install
      Stage the install into the dst directory (for packaging):
      mkdir dst
      make DESTDIR=$PWD/dst install

      cd Reusable-Cluster-Components-glue--glue-1.0.7
      ./autogen.sh
      ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
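      make && make -i install                      //note the -i option: /usr/bin/xsltproc was replaced with a stub script (to avoid fetching docs over the network), and without -i the install aborts.
      Stage the install into the dst directory: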
      mkdir dst
      make -i DESTDIR=$PWD/dst install

      cd ClusterLabs-resource-agents-c06b6f3
      ./autogen.sh
      ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
      make && make -i install
      ocf-shellfuncs path problem:
      The OCF scripts depend on this helper. Some look for it at /usr/lib/ocf/resource.d/heartbeat/ocf-shellfuncs
      and others at /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs
      but it is actually installed at /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
      Fix: add symlinks at the paths the scripts expect. The other three files are handled the same way: ocf-binaries  ocf-directories  ocf-returncodes
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs /usr/lib/ocf/resource.d/heartbeat/ocf-shellfuncs
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-binaries /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-binaries /usr/lib/ocf/resource.d/heartbeat/ocf-binaries
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-directories /usr/lib/ocf/resource.d/heartbeat/.ocf-directories
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-directories /usr/lib/ocf/resource.d/heartbeat/ocf-directories
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-returncodes /usr/lib/ocf/resource.d/heartbeat/.ocf-returncodes
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-returncodes /usr/lib/ocf/resource.d/heartbeat/ocf-returncodes
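The eight links can also be created in a loop, which makes it harder to pair a link name with the wrong target. This is a sketch only: link_ocf_helpers is a hypothetical helper, and it assumes each helper should be linked to the file of the same name, as "handled the same way" implies. The optional argument is a staging prefix such as $PWD/dst.

```shell
# link_ocf_helpers [ROOT]
# Create the plain and dot-prefixed symlinks for all four OCF helper files.
# ROOT is an optional staging prefix (for example "$PWD/dst"); empty means /.
# Hypothetical helper; link targets always point at the final install path.
link_ocf_helpers() {
    root="${1:-}"
    src=/usr/lib/ocf/lib/heartbeat
    dst="$root/usr/lib/ocf/resource.d/heartbeat"
    for f in ocf-shellfuncs ocf-binaries ocf-directories ocf-returncodes; do
        ln -sf "$src/$f" "$dst/$f"
        ln -sf "$src/$f" "$dst/.$f"
    done
}

# link_ocf_helpers              # live system
# link_ocf_helpers "$PWD/dst"   # staged tree
```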

      Stage the install into the dst directory:
      mkdir dst
      make -i DESTDIR=$PWD/dst install
      Apply the same path fix inside the staged tree:
      cd dst/usr/lib/ocf/resource.d/heartbeat/
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs .ocf-shellfuncs
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs ocf-shellfuncs
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-binaries .ocf-binaries
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-binaries ocf-binaries
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-directories .ocf-directories
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-directories ocf-directories
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-returncodes .ocf-returncodes
      ln -s /usr/lib/ocf/lib/heartbeat/ocf-returncodes ocf-returncodes


      cd Heartbeat-3-0-STABLE-3.0.4
      ./bootstrap
      ./configure --help

      Important: the user and group that run the service must be created first
      groupadd -g 60 haclient
      useradd -u 17 -g haclient hacluster
      ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-group-id=60 --with-ccmuser-id=17
      make && make -i install
      Stage the install into the dst directory:
      mkdir dst
      make -i DESTDIR=$PWD/dst install

      cd Pacemaker-1-0-db98485d06ed
      ./autogen.sh
      ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-heartbeat
      make && make -i install
      Stage the install into the dst directory:
      mkdir dst
      make -i DESTDIR=$PWD/dst install

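      Building DRBD:
      The DRBD build has two parts: the kernel module and the userland tools.
      Build the kernel module first:
      cd drbd-8.4.0
      ./configure       --with-km
      The build fails:
      it reports that mdev is undefined in drbd_do_auth in drbd/drbd_receiver.c. The offending code: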
      #if !defined(CONFIG_CRYPTO_HMAC) && !defined(CONFIG_CRYPTO_HMAC_MODULE)
      STATIC int drbd_do_auth(struct drbd_tconn *tconn)
      {
            dev_err(DEV, "This kernel was build without CONFIG_CRYPTO_HMAC.\n");
            dev_err(DEV, "You need to disable 'cram-hmac-alg' in drbd.conf.\n");
            return -1;
      }
      #else
      #define CHALLENGE_LEN 64
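      The function only calls dev_err, which uses the macro DEV.
      Suspecting the definition of DEV, I looked it up in the related header:
      #define DEV mdev
      I could not pin down the cause at the time (presumably mdev is not in scope because this variant of drbd_do_auth only receives tconn), so I commented out both dev_err calls. They only write log messages, so this should be harmless.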

      make && make install
      On success the module is installed at /lib/modules/2.6.32.41/updates/drbd.ko
      modprobe -v drbd             //no error message means the install is OK

      Building the userland tools:
      make clean
      ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-utils --with-pacemaker --with-heartbeat
      make && make -i install
      Stage the install into the dst directory:
      mkdir dst
      make -i DESTDIR=$PWD/dst install

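      Repeat the installation steps above on node ha2. Alternatively, tar up the files staged in the dst directories and unpack them on that node to save build time.
      This completes the installation; configuration comes next.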
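Packaging one staged dst tree and unpacking it on ha2 could look roughly like the sketch below. pack_dst is a hypothetical helper and the archive names are assumptions. The haclient group and hacluster user still have to be created on ha2 with the same IDs, and drbd.ko travels in the kernel package rather than in these archives.

```shell
# pack_dst BUILD_DIR ARCHIVE
# Archive the tree staged under BUILD_DIR/dst with paths relative to /.
# Hypothetical helper name; the archive names below are assumptions.
pack_dst() {
    tar -C "$1/dst" -czf "$2" .
}

# On ha1:
#   pack_dst /root/ha/Heartbeat-3-0-STABLE-3.0.4 /root/ha/heartbeat-bin.tar.gz
#   scp /root/ha/heartbeat-bin.tar.gz root@ha2:/root/ha/
# On ha2 (after groupadd/useradd with the same IDs):
#   tar -C / -xzf /root/ha/heartbeat-bin.tar.gz
```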


#2 rwx_hc, posted 2011-08-29 12:29 (last edited 2011-08-29 12:35)

-----------------------------------------------------
Configuration part 1: heartbeat
-----------------------------------------------------
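On node ha1:

Host name resolution:
vim /etc/hosts //add the following lines:
192.168.1.130 ha1
192.168.1.131 ha2

vim /etc/sysconfig/network //change the following line
HOSTNAME=ha1

//also set the host name with a command; otherwise the change only takes effect after a reboot
# hostname ha1

//make the corresponding changes on node ha2

Edit the heartbeat configuration file. For details on each parameter, see the example shipped with the documentation: /usr/share/doc/ha.cf
vim /etc/ha.d/ha.cf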

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility    local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
#mcast eth0 225.0.0.0 694 1 0
#ha1
ucast eth0 192.168.1.131
#ha2
#ucast eth0 192.168.1.130
auto_failback on
node ha1
node ha2
ping 192.168.1.2
#ping_group group1 192.168.1.1 192.168.1.2 192.168.1.3 192.168.1.4 192.168.1.5
pacemaker respawn

Create the key file for Heartbeat. The signing method can be crc, md5, or sha1; sha1 is used here. The key file must be copied to node ha2,
and it must be identical on both nodes!
(echo -ne "auth 1\n1 sha1 ";dd if=/dev/urandom bs=512 count=1 | openssl sha1 ) > /etc/ha.d/authkeys
chmod 600 /etc/ha.d/authkeys
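Some OpenSSL versions prefix the digest with a label such as "(stdin)= ", which would then end up inside the key line. A variant that avoids depending on that output format is sketched below. Note that it swaps in sha1sum from coreutils in place of openssl sha1, and make_authkeys is a hypothetical helper name.

```shell
# make_authkeys
# Print a two-line authkeys file with a random 40-hex-digit key for sha1.
# Hypothetical helper; sha1sum (coreutils) stands in for "openssl sha1".
make_authkeys() {
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | cut -d' ' -f1)
    printf 'auth 1\n1 sha1 %s\n' "$key"
}

# As root:
#   (umask 077; make_authkeys > /etc/ha.d/authkeys)
```

The umask in the usage line creates the file with mode 600 from the start, so it is never readable by other users.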
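Copy both files to node ha2:
scp /etc/ha.d/authkeys root@ha2:/etc/ha.d/authkeys
scp /etc/ha.d/ha.cf root@ha2:/etc/ha.d/ha.cf
On ha2, edit ha.cf. This is the only line that differs between the two nodes, because each node must be given its peer's IP address. With more than two nodes, use a multicast address instead!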

#ucast eth0 192.168.1.131
ucast eth0 192.168.1.130

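Important: firewall settings: setup --> Firewall configuration
Security Level: ( ) Enabled (*) Disabled SELinux: Disabled
Or customize it: UDP port 694 must be open
In the Customize dialog set: Other ports ha-cluster:udp
Alternatively, edit the firewall configuration directly.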
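The post gives no rule for the direct approach, so the following is a sketch under assumptions, not the author's configuration. It prints iptables commands that admit heartbeat traffic (UDP 694, the udpport value in ha.cf) and DRBD replication traffic (TCP 7789, the port in the r0 resource addresses of this thread) from the peer node. cluster_fw_rules is a hypothetical helper, and inserting at the top of the INPUT chain is one possible choice.

```shell
# cluster_fw_rules PEER_IP
# Print iptables commands that accept heartbeat (UDP 694) and DRBD (TCP 7789)
# traffic from PEER_IP. Hypothetical helper; it only prints, it changes nothing.
cluster_fw_rules() {
    peer="$1"
    echo "iptables -I INPUT -s $peer -p udp --dport 694 -j ACCEPT"
    echo "iptables -I INPUT -s $peer -p tcp --dport 7789 -j ACCEPT"
}

cluster_fw_rules 192.168.1.131          # on ha1: review the rules for peer ha2
# cluster_fw_rules 192.168.1.131 | sh   # apply as root, then: service iptables save
```

On ha2 the peer address would be 192.168.1.130.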

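On both ha1 and ha2:
Enable the heartbeat service at boot
chkconfig --level 3 heartbeat on

Start the service
service heartbeat start
Watch the messages and the log: /var/log/ha-log
When something does not work, the log is the most important source of information.

Check the status:
crm_mon -1
or
crm status
The output should look like this: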
   crm status
============
Last updated: Fri Aug 26 19:42:31 2011
Stack: Heartbeat
Current DC: ha2 (eec53f08-e761-461c-9bc3-9520159451f2) - partition with quorum
Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
2 Nodes configured, unknown expected votes
============

Online: [ ha1 ha2 ]
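In a script, the same check can be automated by looking for both node names on the Online line. A minimal sketch, assuming the output format shown above; nodes_online is a hypothetical helper.

```shell
# nodes_online STATUS_TEXT NODE...
# Succeed only if every NODE appears on the "Online: [ ... ]" line of STATUS_TEXT.
# Hypothetical helper; assumes the crm_mon -1 output format shown in this post.
nodes_online() {
    line=$(printf '%s\n' "$1" | grep '^Online: \[' || true)
    shift
    for n in "$@"; do
        case " $line " in
            *" $n "*) ;;
            *) return 1 ;;
        esac
    done
}

# Usage:
#   nodes_online "$(crm_mon -1)" ha1 ha2 && echo "both nodes online"
```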

Pacemaker's configuration (CIB) file: /var/lib/heartbeat/crm/cib.xml


#3 rwx_hc, posted 2011-08-29 12:30 (last edited 2011-08-29 12:37)

------------------------------------------------
Configuration part 2: drbd
------------------------------------------------
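Preparation: the disk is /dev/hdb and the partition is /dev/hdb1. Some space on hdb must be reserved for the DRBD metadata.
In other words, when partitioning /dev/hdb, leave the last few cylinders unused. (According to the official documentation, meta-disk internal needs reserved space at the end of the disk!)
Size estimate in MB: disk.MB / 32768 + 1
For 1 GB (1,000 MB): about 1.03 MB
For 1 TB (1,000,000 MB): about 31.5 MB
For 2 TB (2,000,000 MB): about 62 MB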
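The estimate is easy to reproduce for other sizes. The sketch below just evaluates the formula above with awk; drbd_meta_mb is a hypothetical helper, and sizes are given in decimal megabytes to match the figures in this post.

```shell
# drbd_meta_mb SIZE_MB
# Estimate the DRBD internal metadata size in MB using: SIZE_MB / 32768 + 1
# Hypothetical helper; prints the result with two decimals.
drbd_meta_mb() {
    awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb / 32768 + 1 }'
}

drbd_meta_mb 1000      # 1 GB -> 1.03
drbd_meta_mb 1000000   # 1 TB -> 31.52
drbd_meta_mb 2000000   # 2 TB -> 62.04
```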

Main configuration file: the default is fine and needs no changes.
[root@ha1 ~]# cat /etc/drbd.conf
# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

Global configuration file:
[root@ha1 drbd.d]#cat /etc/drbd.d/global_common.conf
global {
      usage-count yes;
}

common {
      handlers {
            pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
            # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
            # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
            # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
            # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
      }

      startup {
            # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
      }

      options {
            # cpu-mask on-no-data-accessible
      }

      disk {
            # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
            # disk-drain md-flushes resync-rate resync-after al-extents
            # c-plan-ahead c-delay-target c-fill-target c-max-rate
            # c-min-rate disk-timeout
      }

      net {
            protocol C;
            # protocol timeout max-epoch-size max-buffers unplug-watermark
            # connect-int ping-int sndbuf-size rcvbuf-size ko-count
            # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
            # after-sb-1pri after-sb-2pri always-asbp rr-conflict
            # ping-timeout data-integrity-alg tcp-cork on-congestion
            # congestion-fill congestion-extents csums-alg verify-alg
            # use-rle
      }
}

Resource files: /etc/drbd.d/*.res //the file name must end in .res
[root@ha1 drbd.d]# cat r0.res
resource r0 {
      handlers {
            split-brain "/usr/lib/drbd/notify-split-brain.sh root";
      }
      net {
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
      }

      disk {
            on-io-error detach;
            disk-flushes no;
            md-flushes no;
      }
      volume 0 {
            device  /dev/drbd1;
            disk /dev/hdb1;
            meta-disk    internal;
      }

      on ha1 {
            address 192.168.1.130:7789;
      }

      on ha2 {
            address 192.168.1.131:7789;
      }
}

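Create the DRBD metadata on the partition:
First wipe any leftover file system signature:
dd if=/dev/zero bs=1M count=1 of=/dev/hdb1
drbdadm create-md r0

Load the kernel module:
modprobe drbd

Bring up the resource:
drbdadm up r0

Do the same on both nodes.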

Check:
cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@AS5-build, 2011-08-10 12:43:25

   1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
      ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:977028

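Initial device synchronization, with ha1 as the source.
On ha1:
drbdadm primary --force r0
or the following command, which I recommend:
drbdadm -- --overwrite-data-of-peer primary r0
The other node, ha2, then synchronizes the data from ha1 automatically!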
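Before testing a failover it helps to wait until the initial sync has finished, which /proc/drbd reports as ds:UpToDate/UpToDate. A minimal sketch, assuming a single DRBD device as in this setup; drbd_in_sync is a hypothetical helper.

```shell
# drbd_in_sync [FILE]
# Succeed if any device in FILE (default /proc/drbd) reports ds:UpToDate/UpToDate.
# Hypothetical helper; with several devices a per-device check would be needed.
drbd_in_sync() {
    grep -q 'ds:UpToDate/UpToDate' "${1:-/proc/drbd}"
}

# Wait until the initial sync has finished:
#   until drbd_in_sync; do sleep 10; done
```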

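ha1 is now the primary and the device can be used:
mkfs.xfs /dev/drbd1
mount /dev/drbd1 /mnt
umount /dev/drbd1

Note: in single-primary mode, the DRBD device can only be mounted on the node that is currently primary!
Managing DRBD through crm is covered in a later part.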


#4 rwx_hc, posted 2011-08-29 12:31

--------------------------------------------
Configuration part 3: using the crm command-line tool
--------------------------------------------
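crm offers both a non-interactive and an interactive interface. In the interactive shell, pressing TAB gives help hints and command completion,
much like the Cisco CLI!

Show the configuration:
# crm
crm(live)# configure
crm(live)configure# show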
node $id="76b083b0-ba70-458b-b47a-5a54d9fedd15" ha1
node $id="eec53f08-e761-461c-9bc3-9520159451f2" ha2
property $id="cib-bootstrap-options" \
         dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
         cluster-infrastructure="Heartbeat"

Verify the configuration:
crm(live)configure# verify
crm_verify[3488]: 2011/08/20_13:16:39 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[3488]: 2011/08/20_13:16:39 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[3488]: 2011/08/20_13:16:39 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

Disable STONITH:
crm configure property stonith-enabled=false
or equivalently:
crm(live)configure# property stonith-enabled=false
crm(live)configure# commit

Ignore loss of quorum: //a two-node cluster must set this to ignore, otherwise the cluster stops working as soon as one node goes down.
crm configure property no-quorum-policy=ignore

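Resource stickiness keeps resources from moving between nodes: a resource that is providing service stays on its current node where possible.
Without it, a resource that starts on node1 moves to node2 when node1 fails, and moves back to node1 once node1 recovers.
That causes frequent switchovers, which is usually not what we want.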
crm configure rsc_defaults resource-stickiness=100

Configure the first resource, the cluster IP (shared IP: 192.168.1.132)
crm configure
crm(live)configure# primitive ip-cluster ocf:heartbeat:IPaddr2 \
params ip="192.168.1.132" clusterip_hash="sourceip-sourceport" \
      op monitor interval="60"

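Editing resource configuration in crm: //in configure mode, run edit or edit <resource ID>
Edit the whole configuration:
crm(live)configure# edit
Edit only the resource ip-cluster:
crm(live)configure# edit ip-cluster

The default editor is vim; save and quit when done.
Verify and commit the changes:
crm(live)configure# verify
crm(live)configure# commit
Check: the resource ip-cluster has started on ha2
crm(live)# status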
Online: [ ha1 ha2 ]

ip-cluster    (ocf::heartbeat:IPaddr2):    Started ha2

Make ip-cluster prefer a particular node; here we assume ha1 should be preferred
crm(live)configure# location location-1 ip-cluster 100: ha1
crm(live)configure# commit
crm(live)node# standby ha1
crm(live)node# standby ha2
crm(live)node# online ha1
crm(live)node# online ha2
Check: ip-cluster now starts on ha1 by preference!
crm(live)# status
Online: [ ha1 ha2 ]

ip-cluster    (ocf::heartbeat:IPaddr2):    Started ha1

Preparing httpd:
Current state: Apache is installed by default; configuration file: /etc/httpd/conf/httpd.conf
Service init script: /etc/init.d/httpd
Add a page: vim /var/www/html/index.html
with the following content:
<html>
<body>My Test Site - ha1</body>
</html>

On the other node: vim /var/www/html/index.html
<html>
<body>My Test Site - ha2</body>
</html>

Edit the httpd configuration file:
vim /etc/httpd/conf/httpd.conf
Uncomment the following lines: //the resource agent uses the server-status URL to check the state of the service.
<Location /server-status>
      SetHandler server-status
      Order deny,allow
      Deny from all
      Allow from 127.0.0.1
</Location>

Test the httpd service:
[root@ha1 ~]# service httpd start
Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.1.130 for ServerName

vim /etc/httpd/conf/httpd.conf
Find the line: #ServerName www.example.com:80
Change it to: ServerName 192.168.1.130:80
Restart the service and the warning is gone.

Resource configuration:
crm(live)configure# primitive srv-apache ocf:heartbeat:apache \
      params configfile="/etc/httpd/conf/httpd.conf" \
      op monitor interval="60" \
      op start interval="0" timeout="40" \
      op stop interval="0" timeout="60"
crm(live)configure# commit

Check the current cluster state: the two resources run on different nodes, because by default the cluster spreads resources evenly across the nodes!
crm(live)# status
============
Last updated: Fri Aug 26 20:14:24 2011
Stack: Heartbeat
Current DC: ha2 (eec53f08-e761-461c-9bc3-9520159451f2) - partition with quorum
Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ ha1 ha2 ]

   ip-cluster    (ocf::heartbeat:IPaddr2):    Started ha1
   srv-apache (ocf::heartbeat:apache):       Started ha2

Make the two resources run on the same node; this is essential when they depend on each other!
crm(live)configure# colocation colocation-1 inf: ip-cluster srv-apache
crm(live)configure# commit
Check: both resources now run on the same node!
crm(live)# status
Online: [ ha1 ha2 ]

   ip-cluster    (ocf::heartbeat:IPaddr2):    Started ha1
   srv-apache (ocf::heartbeat:apache):       Started ha1

Configure the start order: srv-apache depends on ip-cluster, so the following constraint makes sure ip-cluster starts before srv-apache.
crm(live)configure# order order-1 inf: ip-cluster srv-apache
crm(live)configure# commit

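Resetting a resource's fail count:
When a resource has failed, it has to be handled manually; otherwise the cluster will not restart it automatically:
crm(live)resource# failcount srv-apache show ha1
scope=status  name=fail-count-srv-apache value=INFINITY
crm(live)resource# failcount srv-apache show ha2
scope=status  name=fail-count-srv-apache value=INFINITY
The fail count has been set to INFINITY, so the resource will not start again until the count is reset:
crm(live)resource# failcount srv-apache set ha1 0
crm(live)resource# failcount srv-apache set ha2 0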

Failure records shown by crm status:
Failed actions:
      srv-apache_monitor_60000 (node=ha2, call=34, rc=7, status=complete): not running
      srv-apache_start_0 (node=ha2, call=36, rc=5, status=complete): not installed
Clear them with:
crm(live)resource# cleanup srv-apache ha1
crm(live)resource# cleanup srv-apache ha2
or
crm(live)resource# cleanup srv-apache
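The rc values in these records are OCF resource agent exit codes. The lookup below is a small sketch for reading them; ocf_rc_name is a hypothetical helper, while the names are the standard OCF return code names.

```shell
# ocf_rc_name CODE
# Map an OCF resource agent exit code to its symbolic name.
# Hypothetical helper for reading "rc=N" in failure records.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "unknown ($1)" ;;
    esac
}

ocf_rc_name 7   # the failed monitor above: OCF_NOT_RUNNING
ocf_rc_name 5   # the failed start above: OCF_ERR_INSTALLED
```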

Configure the DRBD master/slave resource in crm:
crm(live)configure# primitive drbd-apache ocf:linbit:drbd \
      params drbd_resource="r0" \
      op monitor interval="60" role="Master" timeout="10" \
      op monitor interval="60" role="Slave" timeout="20" \
      op start interval="0" timeout="240" \
      op stop interval="0" timeout="100"
crm(live)configure# ms ms-drbd-apache drbd-apache \
      meta clone-max="2" clone-node-max="1" master-max="1" master-node-max="1" notify="yes" globally-unique="false" target-role="Started"

Configure the file system resource (the device was formatted with mkfs.xfs earlier, so fstype is xfs):
crm(live)configure# primitive fs-apache ocf:heartbeat:Filesystem \
      params fstype="xfs" directory="/var/www/html" device="/dev/drbd1" \
      op monitor interval="10" timeout="40" \
      op start interval="0" timeout="60" \
      op stop interval="0" timeout="60"

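Configure a resource group. Once resources are placed in a group, constraints should refer to the group rather than to its members, otherwise deadlocks can occur.
From my own experience: resources in a group should preferably not depend on each other's start order; the group is mainly a management convenience.
crm(live)configure# group group-apache ip-cluster srv-apache
After this step crm automatically adjusts the configuration that refers to the members of group-apache, but the result is not perfect
and still needs manual editing:
crm(live)configure# edit
The result after editing:
crm(live)configure# show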
node $id="76b083b0-ba70-458b-b47a-5a54d9fedd15" ha1 \
      attributes standby="off"
node $id="eec53f08-e761-461c-9bc3-9520159451f2" ha2 \
         attributes standby="off"
primitive drbd-apache ocf:linbit:drbd \
         params drbd_resource="r0" \
         op monitor interval="60" role="Master" timeout="10" \
         op monitor interval="60" role="Slave" timeout="20" \
         op start interval="0" timeout="240" \
         op stop interval="0" timeout="100"
primitive fs-apache ocf:heartbeat:Filesystem \
         params fstype="xfs" directory="/var/www/html" device="/dev/drbd1" \
         op monitor interval="10" timeout="40" \
         op start interval="0" timeout="60" \
         op stop interval="0" timeout="60"
primitive ip-cluster ocf:heartbeat:IPaddr2 \
         params ip="192.168.1.132" clusterip_hash="sourceip-sourceport" \
         op monitor interval="60"
primitive srv-apache ocf:heartbeat:apache \
         params configfile="/etc/httpd/conf/httpd.conf" \
         op monitor interval="60" \
         op start interval="0" timeout="40" \
         op stop interval="0" timeout="60"
group group-apache ip-cluster srv-apache
ms ms-drbd-apache drbd-apache \
         meta clone-max="2" clone-node-max="1" master-max="1" master-node-max="1" notify="yes" globally-unique="false" target-role="Started"
location location-1 group-apache 100: ha1
colocation colocation-1 inf: fs-apache ms-drbd-apache:Master group-apache
order order-1 inf: ms-drbd-apache:promote fs-apache:start group-apache:start
property $id="cib-bootstrap-options" \
         stonith-enabled="false" \
         no-quorum-policy="ignore" \
         dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
         cluster-infrastructure="Heartbeat" \
         last-lrm-refresh="1314397927"

Finally, check the cluster state:
crm(live)# status

Online: [ ha1 ha2 ]

   fs-apache    (ocf::heartbeat:Filesystem): Started ha1
   Master/Slave Set: ms-drbd-apache
      Masters: [ ha1 ]
      Slaves: [ ha2 ]
   Resource Group: group-apache
      ip-cluster (ocf::heartbeat:IPaddr2):    Started ha1
      srv-apache (ocf::heartbeat:apache):       Started ha1


#5 rwx_hc, posted 2011-08-29 12:32

-----------------------------------------------------------------------------
Miscellaneous:
-----------------------------------------------------------------------------

Viewing resource agent parameters:
List all resource agent classes:
[root@ha2 ~]# crm ra
crm(live)ra# classes
heartbeat
lsb
ocf / heartbeat linbit pacemaker
stonith

Show the apache agent:
crm(live)ra# meta apache
or
crm(live)ra# info apache
or
crm(live)ra# info ocf:heartbeat:apache

List the agents in class heartbeat:
crm(live)ra# list heartbeat
AudibleAlarm Delay          Filesystem    ICP          IPaddr
IPaddr2       IPsrcaddr    IPv6addr       LVM          LinuxSCSI
MailTo       OCF          Raid1          SendArp       ServeRAID
WAS          WinPopup       Xinetd       apache       db2
drbddisk       drbdupper    hto-mapfuncs ids          ldirectord
portblock

The built-in help is very useful.
For example, to see how to use the list command:
crm(live)ra# help list

List available resource agents for the given class. If the class
is `ocf`, supply a provider to get agents which are available
only from that provider.

Usage:
...............
         list <class> [<provider>]
...............
Example:
...............
         list ocf pacemaker
...............

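Checking from the command line that an agent's metadata can be parsed:
If a resource fails to configure with a parse error, you can verify it with the command below. Example for the apache agent:
[root@ha2 ~]# lrmadmin -M ocf apache heartbeat
On success it prints the agent's details, including all of its parameters. If it fails, look for the cause;
in most cases a dependency path is wrong!

Show resource scores per node from the command line. The output reveals which resources have been set to -INFINITY after failures and therefore cannot start.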
[root@ha2 ~]# ptest -sL

Check the cluster state:
[root@ha2 ~]# crm_mon -1
or
[root@ha2 ~]# crm status

Check the DRBD state:
[root@ha2 ~]# drbd-overview
or
[root@ha2 ~]# cat /proc/drbd

References:
http://www.linux-ha.org/doc/users-guide/users-guide.html
http://www.drbd.org/users-guide-emb/p-intro.html
http://www.clusterlabs.org/doc/z ... m_Scratch-zh-CN.pdf


#6 芬达7402, posted 2011-08-30 11:17

Could the original poster describe ways to prevent split brain? I use a script that pings the gateway, and it does not work very well.


#7 rwx_hc, posted 2011-09-06 16:50

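Quote from 芬达7402 (2011-08-30 11:17):
Could the original poster describe ways to prevent split brain? I use a script that pings the gateway, and it does not work very well.

In /etc/ha.d/ha.cf you can configure ping_group instead of ping, for example:
ping_group group1 1.1.1.1 1.1.1.2 1.1.1.3
That way the result no longer depends on a single IP address.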


#8 芬达7402, posted 2011-09-07 09:29

I have not used heartbeat much and am still learning. Thanks to the poster above!


#9 zhumengkun, posted 2011-12-12 15:36

How should a resource be configured so that it is not restarted when it migrates? This is urgent!


#10 aixcradent, posted 2012-03-12 17:18

In reply to #1 rwx_hc

crm_verify -L -V
crm_verify[18115]: 2012/01/09_05:37:01 WARN: unpack_rsc_op: Processing failed op fs-apache_last_failure_0 on drbd55: unknown error (1)
crm_verify[18115]: 2012/01/09_05:37:01 WARN: unpack_rsc_op: Processing failed op srv-apache_last_failure_0 on drbd55: unknown error (1)
crm_verify[18115]: 2012/01/09_05:37:01 ERROR: unpack_rsc_op: Hard error - drbd-apache_last_failure_0 failed with rc=6: Preventing drbd-apache from re-starting anywhere in the cluster
crm_verify[18115]: 2012/01/09_05:37:01 WARN: unpack_rsc_op: Processing failed op srv-apache_last_failure_0 on drbd57: unknown error (1)
crm_verify[18115]: 2012/01/09_05:37:01 ERROR: unpack_rsc_op: Hard error - drbd-apache_last_failure_0 failed with rc=6: Preventing drbd-apache from re-starting anywhere in the cluster
crm_verify[18115]: 2012/01/09_05:37:01 WARN: common_apply_stickiness: Forcing fs-apache away from drbd55 after 1000000 failures (max=1000000)
What causes the messages above? I have searched for a long time without finding an answer.
