pv missing simulation and solution
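I have found a fix for the pv missing problem I posted about last time. Below I reproduce the whole process and post the solution.
For the original problem and the analysis and testing I did, see this thread: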
http://www.loveunix.net/discuz/viewthread.php?tid=55989&extra=page%3D1
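Note: my case is very unusual. When you run into pv missing, analyze it carefully; your situation may not be the same as mine.
The PVID and VGID are each recorded in three places on a PV: the LVM record on disk and the two copies of the VGDA. In the case shown here, the VGID in the on-disk LVM record is wrong while the values everywhere else are correct. At varyonvg time the status of this disk is LVM_LVMRECNMTCH, which means: VGDA indicates this PV is a member of the VG, but VG id in the PV's LVM record does not match this VG.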
**********************************************************
First I added hdisk4 to datavg so that the VG has three disks; that way it can still be varied on when one of them has a problem.
root@dms3:/>lspv
hdisk1 00387cfa9c67ad93 rootvg active
hdisk2 00387cfa4390b175 datavg active
hdisk3 00387cfa4390bcb6 datavg active
hdisk4 000dab4d14a95e82 datavg active
First, note the PVID and VGID of hdisk4:
root@dms3:/home/oper>lspv hdisk4
PHYSICAL VOLUME: hdisk4 VOLUME GROUP: datavg
PV IDENTIFIER: 000dab4d14a95e82 VG IDENTIFIER 00387cfa00004c00000000ffb812fc1a
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 64 megabyte(s) LOGICAL VOLUMES: 1
TOTAL PPs: 596 (38144 megabytes) VG DESCRIPTORS: 1
FREE PPs: 592 (37888 megabytes) HOT SPARE: no
USED PPs: 4 (256 megabytes)
FREE DISTRIBUTION: 120..115..119..119..119
USED DISTRIBUTION: 00..04..00..00..00
Then we confirm where the VGID sits at the start of the disk:
root@dms3:/tmp>od -x /dev/hdisk4|more
0000000 c9c2 d4c1 0000 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200 000d ab4d 14a9 5e82 0000 0000 0000 0000
0000220 0000 0000 0000 0000 0000 0000 0000 0000
*
0007000 5f4c 564d 0038 7cfa 0000 4c00 0000 00ff
0007020 b812 fc1a 0000 1074 0000 0832 0000 0088
0007040 0000 08c2 04a8 16ff 0000 0100 0003 001a
0007060 0000 0008 0000 0080 0000 08ba 0008 0000
0007100 0000 0000 0000 0000 0000 0000 0000 0000
*
0010000 4445 4645 4354 0000 0000 0000 0000 0000
0010020 0000 0000 0000 0000 0000 0000 0000 0000
*
0107000 4445 4645 4354 0000 0000 0000 0000 0000
0107020 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000 43e9 8e20 32d4 2a79 0000 0000 0000 0000
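To locate the VGID in a dump like this, it can help to regroup the 32-digit VG IDENTIFIER from lspv into the 4-digit words that od -x prints. The following is only a minimal sketch, assuming a big-endian host as in the dump above (so the 16-bit words appear in byte order); the function name group_words is made up for illustration and is not an AIX command.

```shell
#!/bin/sh
# Split a hex string into 4-digit groups, matching the layout of `od -x` output.
# group_words is a hypothetical helper name used only for this illustration.
group_words() {
    printf '%s\n' "$1" | sed -e 's/..../& /g' -e 's/ $//'
}

# VG IDENTIFIER as reported by lspv hdisk4 in this post
group_words 00387cfa00004c00000000ffb812fc1a
```

This prints 0038 7cfa 0000 4c00 0000 00ff b812 fc1a. The first six words are the ones that follow the 5f4c 564d marker on the 0007000 line, and the last two are the first two words on the 0007020 line.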
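I wrote a script that modifies the VGID. It changes the last 8 hex digits of the VGID, i.e. the 8 digits starting at offset 7020 (octal, as printed by od): b812fc1a. The script is adapted from the PVID-changing script in the IBM redbook.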
root@dms3:/tmp>cat chvgid.sh
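#!/usr/bin/ksh
# Usage: chvgid.sh <8 hex digits> <hdiskN>
# Overwrites the last 4 bytes of the VGID in the LVM record of the disk.
vgid=$1
disk=$2
# Convert each pair of hex digits to octal, one array element per byte.
set -A a `echo $vgid|\
awk '{
for (f=1; f<=length($0); f=f+2) {
print "ibase=16\nobase=8\n"toupper(substr($0,f,2))
}
}'|bc 2>/dev/null`
# Write the 4 bytes at decimal offset 3600 (octal 7020 in the od output).
/usr/bin/echo "\0"${a[0]}"\0"${a[1]}"\0"${a[2]}"\0"${a[3]}"\c"|dd bs=1 seek=3600 of=/dev/$disk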
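The core of the script is turning each pair of hex digits into an octal escape that echo can emit as a raw byte. The sketch below shows that conversion on its own, using printf instead of awk and bc, purely to make the intermediate values visible. It writes nothing to disk, and the function name hex_to_octal_escapes is an illustrative assumption, not part of the original script.

```shell
#!/bin/sh
# Turn a hex string into the \0nnn escape sequence that the script feeds to echo.
# hex_to_octal_escapes is a hypothetical name; nothing here touches a disk.
hex_to_octal_escapes() {
    hex=$1
    out=
    while [ -n "$hex" ]; do
        # Take the next two hex digits, then drop them from the input.
        byte=$(printf '%s\n' "$hex" | cut -c1-2)
        hex=$(printf '%s\n' "$hex" | cut -c3-)
        # printf accepts a 0x-prefixed constant and prints it in octal.
        out="$out$(printf '\\0%o' "0x$byte")"
    done
    printf '%s\n' "$out"
}

hex_to_octal_escapes 12342234
```

For 12342234 this prints \022\064\042\064, which corresponds to the four array elements 22, 64, 42 and 34 that the original script prefixes with "\0".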
I replaced the last 8 digits of the original VGID with 12342234:
root@dms3:/tmp>./chvgid.sh 12342234 hdisk4
4+0 records in.
4+0 records out.
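Note that od prints offsets in octal while the seek= operand of dd is decimal, which is why the script seeks to 3600 to reach the line labelled 0007020. A quick way to check the arithmetic, as a sketch (octal_to_decimal is an illustrative name):

```shell
#!/bin/sh
# Convert an octal offset, as printed by od, to the decimal value dd expects.
# octal_to_decimal is a hypothetical helper name.
octal_to_decimal() {
    # A leading 0 makes printf read the operand as an octal constant.
    printf '%d\n' "0$1"
}

octal_to_decimal 7020   # offset of the last 8 hex digits of the VGID
octal_to_decimal 7004   # offset where the VGID starts, right after the _LVM marker
```

The two calls print 3600 and 3588. The "4+0 records" lines above are consistent with this: with bs=1, dd wrote exactly four one-byte records.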
Check the value at offset 7020:
root@dms3:/tmp>od -x /dev/hdisk4|more
0000000 c9c2 d4c1 0000 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200 000d ab4d 14a9 5e82 0000 0000 0000 0000
0000220 0000 0000 0000 0000 0000 0000 0000 0000
*
0007000 5f4c 564d 0038 7cfa 0000 4c00 0000 00ff
0007020 1234 2234 0000 1074 0000 0832 0000 0088
0007040 0000 08c2 04a8 16ff 0000 0100 0003 001a
0007060 0000 0008 0000 0080 0000 08ba 0008 0000
0007100 0000 0000 0000 0000 0000 0000 0000 0000
*
0010000 4445 4645 4354 0000 0000 0000 0000 0000
0010020 0000 0000 0000 0000 0000 0000 0000 0000
*
0107000 4445 4645 4354 0000 0000 0000 0000 0000
0107020 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000 43e9 8e20 32d4 2a79 0000 0000 0000 0000
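Instead of paging through the whole dump, the four bytes can also be read back directly. This is only a sketch: it assumes that the od on your system accepts the POSIX options -An and -tx1, and read_hex_at is a made-up helper name. It only reads from the device.

```shell
#!/bin/sh
# Print <count> bytes starting at decimal <offset> of a file or device as plain hex.
# read_hex_at is a hypothetical helper; it performs reads only.
read_hex_at() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -An -tx1 | tr -d ' \n'
    echo
}

# Usage on the system from this post (not run here):
#   read_hex_at /dev/hdisk4 3600 4
```

After the change shown above, such a call would be expected to print 12342234, matching the first two words on the 0007020 line.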
After rebooting the machine, datavg is varied on automatically, and the error report shows the following two entries:
5BEAD71B 0208151906 I S LIBLVM Activation of a no quorum volume group
26120107 0208151906 U S LIBLVM PHYSICAL VOLUME DEFINED AS MISSING
The details are as follows:
root@dms3:/>errpt -a|more
---------------------------------------------------------------------------
LABEL: LVM_QUORUMNOQUORUM
IDENTIFIER: 5BEAD71B
Date/Time: Wed Feb 8 15:19:12 HKG
Sequence Number: 36365
Machine Id: 00387CFA4C00
Node Id: dms3
Class: S
Type: INFO
Resource Name: LIBLVM
Description
Activation of a no quorum volume group with out 100% of the disks
Probable Causes
One or more physical volumes are not available and MISSINGPV_VARYON variable is on
Detail Data
MAJOR/MINOR DEVICE NUMBER
0038 7CFA
SENSE DATA
0000 4C00 0000 00FF B812 FC1A 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: LVM_MISSPVADDED
IDENTIFIER: 26120107
Date/Time: Wed Feb 8 15:19:12 HKG
Sequence Number: 36364
Machine Id: 00387CFA4C00
Node Id: dms3
Class: S
Type: UNKN
Resource Name: LIBLVM
Description
PHYSICAL VOLUME DEFINED AS MISSING
Probable Causes
POWER, DRIVE, ADAPTER, OR CABLE FAILURE
Detail Data
MAJOR/MINOR DEVICE NUMBER
0012 0004
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
Note that although this VG can be varied on, only two of its disks are active.
root@dms3:/>lsvg datavg
VOLUME GROUP: datavg VG IDENTIFIER: 00387cfa00004c00000000ffb812fc1a
VG STATE: active PP SIZE: 64 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 1788 (114432 megabytes)
MAX LVs: 256 FREE PPs: 594 (38016 megabytes)
LVs: 18 USED PPs: 1194 (76416 megabytes)
OPEN LVs: 12 QUORUM: 2
TOTAL PVs: 3 VG DESCRIPTORS: 3
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 2 AUTO ON: yes
MAX PPs per PV: 1016 MAX PVs: 32
LTG size: 128 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
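The two fields to compare in this output are TOTAL PVs and ACTIVE PVs. As a sketch, assuming the lsvg layout shown above, the difference can be computed with awk. The function name count_inactive_pvs is illustrative, and parsing the text this way is an assumption about the output format, not an official interface.

```shell
#!/bin/sh
# Read `lsvg <vg>` output on stdin and print how many PVs are not active.
# count_inactive_pvs is a hypothetical helper name.
count_inactive_pvs() {
    awk '
        $1 == "TOTAL"  && $2 == "PVs:" { total  = $3 }
        $1 == "ACTIVE" && $2 == "PVs:" { active = $3 }
        END { print total - active }
    '
}

# Usage on the system from this post (not run here):
#   lsvg datavg | count_inactive_pvs
```

For the output above (TOTAL PVs: 3, ACTIVE PVs: 2) this would print 1.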
Naturally, hdisk4, whose VGID we changed, is now in the missing state:
root@dms3:/>lspv hdisk4
PHYSICAL VOLUME: hdisk4 VOLUME GROUP: datavg
PV IDENTIFIER: 000dab4d14a95e82 VG IDENTIFIER 00387cfa00004c00000000ffb812fc1a
PV STATE: missing
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 64 megabyte(s) LOGICAL VOLUMES: 1
TOTAL PPs: 596 (38144 megabytes) VG DESCRIPTORS: 1
FREE PPs: 592 (37888 megabytes) HOT SPARE: no
USED PPs: 4 (256 megabytes)
FREE DISTRIBUTION: 120..115..119..119..119
USED DISTRIBUTION: 00..04..00..00..00
root@dms3:/home/oper>varyoffvg datavg
root@dms3:/home/oper>exportvg datavg
When we run importvg again, we see the long-lost LVMRECNMTCH status:
root@dms3:/home/oper>importvg -y datavg hdisk2
PV Status: hdisk2 00387cfa4390b175 PVACTIVE
hdisk3 00387cfa4390bcb6 PVACTIVE
000dab4d14a95e82 NONAME
varyonvg: Volume group datavg is varied on.
0516-1281 synclvodm: Warning, lv control block of lvdemon
has been over written.
0516-622 synclvodm: Warning, cannot write lv control block data.
datavg
PV Status: hdisk2 00387cfa4390b175 PVACTIVE
hdisk3 00387cfa4390bcb6 PVACTIVE
hdisk4 000dab4d14a95e82 LVMRECNMTCH
varyonvg: Volume group datavg is varied on.
We run exportvg again, then change the VGID back to the original 8 digits:
root@dms3:/tmp>./chvgid.sh b812fc1a hdisk4
4+0 records in.
4+0 records out.
Check the value at offset 7020:
root@dms3:/tmp>od -x /dev/hdisk4|more
0000000 c9c2 d4c1 0000 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200 000d ab4d 14a9 5e82 0000 0000 0000 0000
0000220 0000 0000 0000 0000 0000 0000 0000 0000
*
0007000 5f4c 564d 0038 7cfa 0000 4c00 0000 00ff
0007020 b812 fc1a 0000 1074 0000 0832 0000 0088
0007040 0000 08c2 04a8 16ff 0000 0100 0003 001a
0007060 0000 0008 0000 0080 0000 08ba 0008 0000
0007100 0000 0000 0000 0000 0000 0000 0000 0000
*
0010000 4445 4645 4354 0000 0000 0000 0000 0000
0010020 0000 0000 0000 0000 0000 0000 0000 0000
*
0107000 4445 4645 4354 0000 0000 0000 0000 0000
0107020 0000 0000 0000 0000 0000 0000 0000 0000
*
0200000 43e9 8e20 32d4 2a79 0000 0000 0000 0000
Finally, run importvg once more; this time there are no error messages at all:
root@dms3:/tmp>importvg -y datavg hdisk2
datavg
We can see that all three PVs are now active:
root@dms3:/tmp>lsvg datavg
VOLUME GROUP: datavg VG IDENTIFIER: 00387cfa00004c00000000ffb812fc1a
VG STATE: active PP SIZE: 64 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 1788 (114432 megabytes)
MAX LVs: 256 FREE PPs: 594 (38016 megabytes)
LVs: 18 USED PPs: 1194 (76416 megabytes)
OPEN LVs: 0 QUORUM: 2
TOTAL PVs: 3 VG DESCRIPTORS: 3
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 3 AUTO ON: yes
MAX PPs per PV: 1016 MAX PVs: 32
LTG size: 128 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
The status of hdisk4 is also completely back to normal:
root@dms3:/tmp>lspv hdisk4
PHYSICAL VOLUME: hdisk4 VOLUME GROUP: datavg
PV IDENTIFIER: 000dab4d14a95e82 VG IDENTIFIER 00387cfa00004c00000000ffb812fc1a
PV STATE: active
STALE PARTITIONS: 0 ALLOCATABLE: yes
PP SIZE: 64 megabyte(s) LOGICAL VOLUMES: 1
TOTAL PPs: 596 (38144 megabytes) VG DESCRIPTORS: 1
FREE PPs: 592 (37888 megabytes) HOT SPARE: no
USED PPs: 4 (256 megabytes)
FREE DISTRIBUTION: 120..115..119..119..119
USED DISTRIBUTION: 00..04..00..00..00
This article comes from a ChinaUnix blog. The original is at: http://blog.chinaunix.net/u/6049/showart_73374.html