
Post 1, posted 2008-01-08 22:24

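I. Overview

DRBD consists of a kernel module and a set of supporting scripts, and is used to build high-availability clusters. It works by mirroring an entire block device over the network; you can think of it as network RAID 1. DRBD receives the data, writes it to the local disk and sends it to the other host, which then writes it to its own disk.

Source download: http://oss.linbit.com/drbd/0.7/drbd-0.7.19.tar.gz

Core reference document: http://www.drbd.org/drbd-howto.html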

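II. Setup used in this guide

Assume two machines:

nannan: 192.168.0.136, disk to mirror: /dev/hdc3

root: 192.168.0.139, disk to mirror: /dev/hdc2

The primary server is 192.168.0.136, referred to below as 136.

The backup server is 192.168.0.139, referred to below as 139.

Normally all reads and writes happen on 136. When 136 goes down, 139 can be brought up, which gives a hot backup of the data.

A truly automatic failover requires HA software on top of DRBD.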

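III. Download and installation

Installation environment: Red Hat Enterprise Linux AS release 4, kernel version 2.6.9-22.EL

Make sure the kernel source is installed. The DRBD source can be downloaded from http://oss.linbit.com/drbd/.

A note on which source to download: the newest release at the time of writing, drbd-8.0pre3, could not be configured properly with this configuration file and produced a long list of errors, so use the earlier stable release. After unpacking the source tarball,

run:

A. make KDIR=/usr/src/linux /* location of the kernel source */

/* If you have not changed the kernel you can simply run make; the build looks under /lib/modules for the environment of the running kernel. If you are running a new kernel, it has to be compiled and installed first, otherwise make aborts with an error. */

B. make install

The installation mainly produces the commands drbdsetup and drbdadm,

the configuration file /etc/drbd.conf, the init script /etc/init.d/drbd,

and the module file drbd.ko (found in the drbd subdirectory of the compiled source tree).

All commands and configuration files can also be found in the directory where the source was built.

./scripts/drbd.conf is the pristine sample configuration; if /etc/drbd.conf gets damaged, simply copy this file over it.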
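The choice in step A between a plain make and make KDIR=... can be sketched as a small helper. This is only an illustration and not part of the DRBD build system: the function name pick_kdir is made up, and it assumes the usual layout in which /lib/modules/<release>/build points at the build tree of an installed kernel.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel build directory to pass as KDIR.
# Assumption: /lib/modules/<release>/build is the build tree of the
# installed kernel; /usr/src/linux is the fallback used in step A.
pick_kdir() {
    release=$1
    modules_root=${2:-/lib/modules}
    fallback=${3:-/usr/src/linux}
    if [ -d "$modules_root/$release/build" ]; then
        printf '%s\n' "$modules_root/$release/build"
    else
        printf '%s\n' "$fallback"
    fi
}

# Example (not run here):
#   make KDIR="$(pick_kdir "$(uname -r)")"
```

The second and third arguments exist only so the helper can be tried against a scratch directory instead of the real /lib/modules.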

C. Create the drbd device nodes

mknod /dev/drbd0 b 147 0

mknod /dev/drbd1 b 147 1

mknod /dev/drbd2 b 147 2

Or create several at once with a shell loop:

#for i in $(seq 0 15) ; do mknod /dev/drbd$i b 147 $i ; done
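If some of the nodes already exist, mknod fails for those. A cautious variant prints the commands for the missing nodes only, so they can be reviewed before anything is created. The function name and the optional directory argument are illustrative and not part of DRBD; the major number 147 is the one used in the commands above.

```shell
#!/bin/sh
# Print (do not execute) the mknod commands for DRBD minors 0..count-1,
# skipping nodes that already exist under dev_dir.
drbd_mknod_cmds() {
    count=$1
    dev_dir=${2:-/dev}
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ ! -e "$dev_dir/drbd$i" ]; then
            printf 'mknod %s/drbd%s b 147 %s\n' "$dev_dir" "$i" "$i"
        fi
        i=$((i + 1))
    done
}

# Review the output first, then pipe it to sh as root to create the nodes:
#   drbd_mknod_cmds 16 | sh
```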

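D. DRBD protocols

A: a write is considered complete as soon as the data has been written to the local disk and sent to the network.

B: a write is considered complete once the peer has acknowledged receiving the data.

C: a write is considered complete once the peer has acknowledged writing the data to its disk.

Further parameters are available to tune the disk and network options. See the drbdsetup man page for details.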

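IV. Configuring DRBD

Edit /etc/drbd.conf.

The main changes are the host names, the device names and the IP addresses. The name after "on" has to match what uname -n prints on that node.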

==================================================

  on web1 {
    device    /dev/drbd0;
    disk      /dev/hdc3;
    address   192.168.0.136:7788;
    meta-disk internal;

    # meta-disk is either 'internal' or '/dev/ice/name [idx]'
    #
    # You can use a single block device to store meta-data
    # of multiple DRBD's.
    # E.g. use meta-disk /dev/hde6[0]; and meta-disk /dev/hde6[1];
    # for two different resources. In this case the meta-disk
    # would need to be at least 256 MB in size.
    #
    # 'internal' means, that the last 128 MB of the lower device
    # are used to store the meta-data.
    # You must not give an index with 'internal'.
  }

  on web2 {
    device    /dev/drbd0;
    disk      /dev/hdc2;
    address   192.168.0.139:7788;
    meta-disk internal;
  }

==================================================

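The complete drbd.conf follows.

Note: this configuration is for drbd-0.7.19.tar.gz; configuration files are not compatible between versions. The only changes to the sample file are the places shown above and some comments, and every configuration block other than resource r0 (resource r1 and so on) has been removed.

In other words, everything after

/****
  on root {
    device    /dev/drbd0;
    disk      /dev/hdc2;
    address   192.168.0.139:7788;
    meta-disk internal;
  }
}
****/

is deleted.

Below is a complete, working drbd.conf: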

#
# drbd.conf example
#

skip {
  As you can see, you can also comment chunks of text
  with a 'skip[optional nonsense]{ skipped text }' section.
  This comes in handy, if you just want to comment out
  some 'resource <some name> {...}' section:
  just precede it with 'skip'.
  The basic format of option assignment is
  <option name><linear whitespace><value>;

  It should be obvious from the examples below,
  but if you really care to know the details:

  <option name> :=
      valid options in the respective scope
  <value>  := <num>|<string>|<choice>|...
            depending on the set of allowed values
            for the respective option.
  <num> := [0-9]+, sometimes with an optional suffix of K,M,G
  <string> := (<name>|\"([^\"\\\n]*|\\.)*\")+
  <name> := [/_.A-Za-z0-9-]+
}

# global {
# use this if you want to define more resources later
# without reloading the module.
# by default we load the module with exactly as many devices
# as configured mentioned in this file.
#
# minor-count 5;
# The user dialog counts and displays the seconds it waited so
# far. You might want to disable this if you have the console
# of your server connected to a serial terminal server with
# limited logging capacity.
# The Dialog will print the count each 'dialog-refresh' seconds,
# set it to 0 to disable redrawing completely. [ default = 1 ]
#
# dialog-refresh 5; # 5 seconds
# You might disable one of drbdadm's sanity check.
# disable-ip-verification;
# }

#
# this need not be r#, you may use phony resource names,
# like "resource web" or "resource mail", too
#

resource r0 {
  protocol C;

  # what should be done in case the cluster starts up in
  # degraded mode, but knows it has inconsistent data.
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    degr-wfc-timeout 120; # 2 minutes.
  }

  disk {
  }

  net {
  }

  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }

  on web1 {
    device    /dev/drbd0;
    disk      /dev/hdc3;
    address   192.168.0.136:7788;
    meta-disk internal;
  }

  on web2 {
    device    /dev/drbd0;
    disk      /dev/hdc2;
    address   192.168.0.139:7788;
    meta-disk internal;
  }
}

=================================================
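
drbdadm picks the "on" section whose name matches the local host name, so a mismatch between that name and the output of uname -n is an easy mistake to make. A rough check can be sketched as follows. The function names are made up for illustration, and the sketch assumes that each "on" section starts on its own line, as in the file above.

```shell
#!/bin/sh
# List the host names that appear in "on <name> {" lines of a drbd.conf.
drbd_conf_hosts() {
    sed -n 's/^[[:space:]]*on[[:space:]]\{1,\}\([^[:space:]{]\{1,\}\)[[:space:]]*{.*$/\1/p' "$1"
}

# Succeed only if the given node name has an "on" section in the file.
drbd_conf_has_host() {
    drbd_conf_hosts "$1" | grep -qxF -- "$2"
}

# Example (not run here):
#   drbd_conf_has_host /etc/drbd.conf "$(uname -n)" \
#       || echo "no 'on' section for $(uname -n)" >&2
```

Options such as on-io-error are not picked up, because the pattern requires whitespace directly after the word "on".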

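V. Starting DRBD

First confirm that both machines to be mirrored are healthy, that the network between them works, and that the partitions to be mirrored are unmounted.

A. DRBD is driven by a kernel module,

so the drbd.ko module has to be loaded first.

On server 136 run:

#insmod drbd.ko    (or: modprobe drbd)

drbd.ko can be found in the compiled source tree.

Use lsmod to check whether the module was loaded: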

#lsmod

Module    size Used by

drbd    143088 -

If drbd is listed, the module was loaded successfully.

#drbdadm up all

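This brings up the DRBD resources; they keep running in the background.

You can check with netstat -an:

the node listens on the port configured in drbd.conf and also connects to the same port on the peer to exchange data. Note that the configuration above uses port 7788, while the sample output below shows 7789.

The output of netstat -ant contains a line like this: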

#netstat -ant

tcp       0    0 192.168.0.136:7789       0.0.0.0:*                LISTEN

B. On server 139 run:

#modprobe drbd

#/etc/rc.d/init.d/drbd start

The netstat -ant output now shows that the DRBD services on the two virtual machines are connected:

#netstat -ant

tcp       0    0 192.168.0.136:7789       192.168.0.139:32845       ESTABLISHED

tcp       0    0 192.168.0.136:32770       192.168.0.139:7789       ESTABLISHED

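VI. Setting the primary

The basic DRBD services are now up. Next the primary server, 192.168.0.136, has to be configured so that it can read from and write to the drbd0 device.

On machine 136 run:

#drbdadm -- --do-what-I-say primary all    # make 136 the sync source, i.e. synchronisation treats this partition on 136 as the authoritative copy

The command has to be typed exactly as shown. If it prints nothing, it has most likely succeeded.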

#sfdisk -s

The output should list a device /dev/drbd0.

At this point the mirror between the two machines is established. You can verify this by looking at /proc/drbd.

# cat /proc/drbd
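
To check the mirror from a script rather than by eye, individual fields of /proc/drbd can be pulled out with awk. This is a sketch only: the function name drbd_field is made up, and it assumes status lines of the form " 0: cs:Connected st:Primary/Secondary ...", one per minor number. Compare that with the output of cat /proc/drbd on your own version before relying on it.

```shell
#!/bin/sh
# drbd_field <minor> <key> [file]
# Print the value of "<key>:" (for example cs or st) on the status line
# of the given minor. Assumed line layout: " 0: cs:... st:... ...".
drbd_field() {
    awk -v m="$1:" -v k="$2:" '
        $1 == m {
            for (i = 2; i <= NF; i++)
                if (index($i, k) == 1) {
                    print substr($i, length(k) + 1)
                    exit
                }
        }' "${3:-/proc/drbd}"
}

# Example (not run here):
#   [ "$(drbd_field 0 cs)" = "Connected" ] || echo "drbd0 is not connected" >&2
```

The optional third argument lets the function read a saved copy of /proc/drbd, which is handy for trying it out on a machine without DRBD.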

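If the disk did not have a file system before, you can now create one on /dev/drbd0 and mount it on 136.

#mkfs.ext3 -j /dev/drbd0

#mkdir /mnt/gaojf

# mount /dev/drbd0 /mnt/gaojf

From now on /dev/drbd0 behaves like any other block device on the server, and you can read from and write to it freely.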

VII. Testing DRBD

1: On the primary server 136:

#drbdadm primary all

#touch /mnt/gaojf/gaojf

When that is done, run:

#umount /mnt/gaojf

#drbdadm secondary all

2: Then on the backup server 139 run:

#drbdadm primary all

#mkdir -p /mnt/gaojf

#mount /dev/drbd0 /mnt/gaojf

#ls -l /mnt/gaojf/gaojf

gaojf

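OK, no problem: the data written on 136 is visible on 139 right after the switch.

A few points to note:

1. The device has to be switched to primary state before a drbd device is mounted.

2. Of the two nodes, only one can be in primary state at any time; the other is in secondary state.

3. The drbd device cannot be mounted on a server that is in secondary state.

4. The two partitions being synchronised should preferably be the same size so that no disk space is wasted, because a DRBD mirror is effectively network RAID 1.

5. If the following error appears during configuration: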

Missing argument 'meta_data_dev'

USAGE:

disk lower_dev meta_data_dev meta_data_index [{--size|-d} 0 ... 8587575296]

   [{--on-io-error|-e} {pass_on|call-local-io-error|detach}]

   [{--fencing|-f} {dont-care|resource-only|resource-and-stonith}] [{--use-bmbv|-b}]

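then the meta-data area still has to be initialised (create-md is a drbdadm subcommand of DRBD 8.x, so this applies when running an 8.x release):

drbdadm create-md r0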
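
Notes 1 to 3 above can be enforced in a wrapper script. The sketch below prints the mount command only when the local role is Primary. The function names are invented for illustration, and the state string is assumed to have the local/peer form (such as Primary/Secondary) that the scripts further down in this thread expect from drbdadm state.

```shell
#!/bin/sh
# Succeed only if the local role (the part before "/") is Primary.
local_is_primary() {
    case $1 in
        Primary/*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Hypothetical wrapper: print the mount command only when it is safe to run.
safe_mount_cmd() {
    state=$1; dev=$2; dir=$3
    if local_is_primary "$state"; then
        printf 'mount %s %s\n' "$dev" "$dir"
    else
        printf 'refusing to mount %s: local state is %s\n' "$dev" "$state" >&2
        return 1
    fi
}

# Example (not run here):
#   cmd=$(safe_mount_cmd "$(drbdadm state r0)" /dev/drbd0 /mnt/gaojf) && $cmd
```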

For more detail see: http://www.ixdba.net/article/5c/125.html


molecar

Post 2, posted 2008-01-08 22:48

蚂蚁, I have tested this with 821 (presumably DRBD 8.2.1). For now it does not look very reliable!


exitgogo

Post 3, posted 2008-01-08 22:50

There are indeed some problems. It is not really suitable for critical, large-scale applications, but for ordinary applications it is fine.


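xinyv

Post 4, posted 2008-01-09 09:02

Maybe this approach solves the reliability problem you mention. :)
One problem I ran into with this setup: when the primary node of the cluster goes down or its network cable is pulled, the secondary takes over cleanly and the service stays available. But if you then write to the DRBD disk on the new primary (the former secondary), a "split brain" occurs as soon as the other machine reboots or is plugged back in. From then on the DRBD data is no longer in sync and has to be recovered by hand. Oddly, this rarely happens when the DRBD device is not mounted. The result feels like semi-automatic high availability: we have to keep checking whether the link has broken, and even rebooting one machine often triggers a split brain. The DRBD documentation describes many policies that should avoid this situation, but all of my experiments failed. I hope someone who got it working can give me some pointers.
My English is not great, so I am pasting this part of the help text as is; those with better English can read it themselves.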
   -A, --after-sb-0pri asb-0p-policy
            possible policies are:

            disconnect
                  No automatic resynchronisation, simply disconnect.

            discard-younger-primary
                  Auto sync from the node that was primary before the split-brain situation occurred.

            discard-older-primary
                  Auto sync from the node that became primary as second during the split-brain situation.

            discard-zero-changes
                  In case one node did not write anything since the split brain became evident, sync from the node that wrote something to the node that did
                  not write anything. In case none wrote anything this policy uses a random decision to perform a "resync" of 0 blocks. In case both have
                  written something this policy disconnects the nodes.

            discard-least-changes
                  Auto sync from the node that touched more blocks during the split brain situation.

            discard-node-NODENAME
                  Auto sync to the named node.
   -B, --after-sb-1pri asb-1p-policy
            possible policies are:

            disconnect
                  No automatic resynchronisation, simply disconnect.

            consensus
                  Discard the version of the secondary if the outcome of the after-sb-0pri algorithm would also destroy the current secondary’s data.
                  Otherwise disconnect.

            discard-secondary
                  Discard the secondary’s version.

            call-pri-lost-after-sb
                  Always honour the outcome of the after-sb-0pri
                  algorithm. In case it decides the current secondary has the right data, call the pri-lost-after-sb on the current primary.

            violently-as0p
                  Always honour the outcome of the after-sb-0pri
                  algorithm. In case it decides the current secondary has the right data, accept a possible instantaneous change of the primary’s data.

   -C, --after-sb-2pri asb-2p-policy
            possible policies are:

            disconnect
                  No automatic resynchronisation, simply disconnect.

            call-pri-lost-after-sb
                  Always honour the outcome of the after-sb-0pri
                  algorithm. In case it decides the current secondary has the right data, call the pri-lost-after-sb on the current primary.

            violently-as0p
                  Always honour the outcome of the after-sb-0pri
                  algorithm. In case it decides the current secondary has the right data, accept a possible instantaneous change of the primary’s data.

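As I understand it, after-sb stands for "after split brain" (I am not sure whether you read it the same way...).
My workaround is a script that detects split brain and decides automatically how to resynchronise.
shell.1, added to the startup scripts:
#!/bin/sh
PATH=$PATH:/sbin:/usr/sbin:/usr/local/bin
# Nothing to do if the drbd module is not loaded.
[ -f /proc/drbd ] || exit 1
# This node is Secondary and cannot see its peer: treat it as a possible split brain.
if grep -q 'Secondary/Unknown' /proc/drbd; then
      drbdadm disconnect all
      # Caution: this throws away the local changes and resyncs from the peer.
      drbdadm -- --discard-my-data connect all
      # Log in to the peer as user drbd (password drbd); its login shell, drbdsh, reconnects the peer.
      (sleep 1;echo 'drbd';sleep 2;echo 'drbd';sleep 3)|telnet 192.16.1.22
fi
shell.2, run by heartbeat: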
#! /bin/bash
#
# chkconfig: 345 15 88
# description: Linux High availability services .

# Source function library.
. /etc/init.d/functions
[ ! -f /etc/sysconfig/network ] && exit 1
. /etc/sysconfig/network

# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0

# if the ip configuration utility isn't around we can't function.
[ -x /sbin/ip ] || exit 1
[ -f /proc/drbd ] || exit 1
RETVAL=0
DRBDSTATE=$(drbdadm state all)
# Wait while this node is still the source of a resync to an inconsistent peer.
while grep -E "SyncSource.*Inconsistent" /proc/drbd >/dev/null 2>&1
do
      sleep 10
done

start () {
      sleep 5
      ip addr add 192.16.1.20/24 brd 192.16.1.255 dev bond0
      /etc/init.d/portmap start
      drbdadm primary all
      mount -t ext3 -o rw /dev/drbd0 /mnt/disk0
      mount -t ext3 -o rw /dev/drbd1 /mnt/disk1
      /etc/init.d/nfs start
      /etc/init.d/nfslock start
      exportfs -avr
      return $RETVAL
}
stop () {
      ip addr del 192.16.1.20/24 brd 192.16.1.255 dev bond0
      /etc/init.d/nfs stop
      /etc/init.d/nfslock stop
      umount /mnt/disk0
      umount /mnt/disk1
      drbdadm secondary all
      /etc/init.d/portmap stop
      if ( grep 'Secondary/Unknown' /proc/drbd );then
      exec /etc/rc.d/my-shell.sh;fi
      return $RETVAL
}

# See how we were called.
case "$1" in
  start|stop)
      $1
      ;;
  restart|reload)
      "$0" stop
      "$0" start
      ;;
  *)
      echo $"Usage: $0 {start|stop|restart|reload}"
      exit 1
esac

exit 0
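The wait loop near the top of shell.2 has no upper bound, so a resync that never finishes blocks the script forever. A bounded variant could look like the sketch below. The names wait_while and still_syncing are made up; the grep pattern is the one used in shell.2.

```shell
#!/bin/sh
# wait_while <timeout_s> <interval_s> <command...>
# Re-run <command> every <interval_s> seconds for as long as it succeeds.
# Returns 0 once the command fails (condition cleared), 1 on timeout.
wait_while() {
    timeout=$1; interval=$2; shift 2
    waited=0
    while "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Same condition as the loop in shell.2; the file argument is optional.
still_syncing() {
    grep -Eq 'SyncSource.*Inconsistent' "${1:-/proc/drbd}"
}

# Example (not run here): give the resync at most 10 minutes.
#   wait_while 600 10 still_syncing || echo "resync still running" >&2
```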
Add a user drbd. Its /etc/passwd entry:
drbd:x:105:105:DRBD:/home/drbd:/sbin/drbdsh
The drbdsh file:
#!/bin/bash
# Login shell for user drbd: reconnect DRBD when this node is Primary
# and has lost its peer. bash is required for the array and ${var//} syntax.
PATH=$PATH:/sbin:/usr/sbin:/usr/local/bin

# Program main
[ -f /proc/drbd ] || exit 1
TEMP=$(drbdadm state all)
# Split a state such as "Primary/Unknown" into separate words.
D_STATE=(${TEMP//\// })
if echo "${D_STATE[@]}" | grep -q Primary && \
   echo "${D_STATE[@]}" | grep -q Unknown; then
      drbdadm connect all
else
      exit 1
fi
exit 0
Enable telnet.
In hosts.deny add: in.telnetd: ALL EXCEPT 192.16.1.22
This makes sure the data is resynchronised after every boot.
Part of my configuration follows:
drbd.conf
global { usage-count yes; }
common { syncer { rate 10M; } }
resource r0 {
      protocol C;
      handlers { pri-on-incon-degr "halt -f"; }
      disk { on-io-error detach; }
      net {
            cram-hmac-alg sha1;
            shared-secret "800hr_disk_0";
      }
      on test01 {
            device       /dev/drbd0;
            disk          /dev/sda6;
            address       192.16.1.21:7789;
            meta-disk    internal;
      }
      on test02 {
            device       /dev/drbd0;
            disk          /dev/sda6;
            address       192.16.1.22:7789;
            meta-disk    internal;
      }
}

[ Last edited by xinyv on 2008-1-9 09:03 ]


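molecar

Post 5, posted 2008-01-09 11:57

I have not worked out yet how data integrity is guaranteed either. Ha! So let's wait until there are commercial deployments and see how it works out for others.

[ Last edited by molecar on 2008-1-9 12:12 ]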


jason0127

Post 6, posted 2008-01-19 16:28

net {
      after-sb-0pri discard-older-primary;
      after-sb-1pri call-pri-lost-after-sb;
      after-sb-2pri call-pri-lost-after-sb;
}
Setting the parameters above in drbd.conf may solve the split-brain problem people here are running into.
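
Before restarting anything it can help to confirm which policies a configuration file really contains. A small sketch follows; the function name is made up, and it assumes one option per line, as in the snippet above.

```shell
#!/bin/sh
# Print "option policy" for every after-sb-* line in a drbd.conf.
after_sb_policies() {
    sed -n 's/^[[:space:]]*\(after-sb-[012]pri\)[[:space:]]\{1,\}\([^;[:space:]]\{1,\}\)[[:space:]]*;.*$/\1 \2/p' "$1"
}

# Example (not run here):
#   after_sb_policies /etc/drbd.conf
```

An empty result means no after-sb option is set in that file.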


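troyme

Post 7, posted 2008-01-21 09:43

Split brain is perfectly normal in this situation.
After the primary node goes down, the secondary node becomes primary and data gets changed there.
When the original primary reboots, it comes up as primary by default, but the data on the two sides no longer matches, so there are now two primaries and the result is split brain.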


molecar

Post 8, posted 2008-02-04 13:21

Recommending this thread: it is 蚂蚁's original work. Could it be marked as a featured post?


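szwzzak

Post 9, posted 2008-03-04 11:15

DRBD

I happen to have a gaming project in the Philippines on my hands. The database is MySQL 5.0.41, and the customer wants high availability without any additional investment. I ran initial tests of the combination Heartbeat + DRBD + NFS + MySQL + mon (Heartbeat 2.0.8; DRBD 8.2.4; Red Hat Enterprise Linux AS release 4, kernel version 2.6.9-34.EL).
The plan is to handle the following cases:
1) service heartbeat stop
2) killall heartbeat
3) reboot
4) unplugging the production network cable
5) pulling the power cable of the primary server
In each of these cases the cluster should fail over the cluster IP and MySQL automatically while preserving data integrity.
I configured it following the MySQL + DRBD cluster documentation provided by MySQL, and also referred to your settings.
The result:
service heartbeat stop; killall heartbeat; unplugging the production network cable
These cases meet my requirements.
But after the power cable of the primary server is pulled, heartbeat on the backup server cannot start the resources.
Error message: return code 20 from /etc/ha.d/resource.d/drbddisk
My analysis is that DRBD needs the cs of both nodes to be Connected before a node can be set to primary and /dev/drbd0 can be mounted. Once the primary's power cable is pulled, cs is certainly not Connected, so DRBD on the backup server cannot be set to primary, and as a result heartbeat cannot start the resources.
In my tests, heartbeat + MySQL on its own does satisfy all five requirements (data integrity was not considered in that test).
My drbd.conf: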
on web1 {
  device    /dev/drbd0;
  disk      /dev/drbd0;
  address   10.10.10.1:7788;   # over the heartbeat link
  meta-disk internal;
}
net {
      after-sb-0pri discard-older-primary;
      after-sb-1pri call-pri-lost-after-sb;
      after-sb-2pri call-pri-lost-after-sb;
}
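ha.cf
uses ipfail and the dopd process
haresources
Local139 192.168.30.249 drbddisk::r0 Filesystem::/dev/drbd0::/opt/mysql/data portmap nfslock nfs mysqld
With this setup, data integrity is already assured when the production network cable is unplugged; the data is fine.
Can your DRBD + heartbeat configurations meet all of my customer's requirements?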


molecar

Post 10, posted 2008-03-12 16:18

Ideally the MySQL server should be loaded with data and crash while data is being written or read. Then verify the data afterwards.


Thread title: DRBD installation and usage guide