This article demonstrates how to add a node to a running VCS cluster without any impact on production.
Environment: Solaris 8/SPARC, VRTSvcs 2.0
In the following case, sjcitdb24 and sjcitdb63 are the existing nodes of the cluster, and sjcitdb75 is the node to be added. "bopsmart" is the service group that hosts a production Oracle instance, and "ClusterService" is a service group used for management only.
Procedure
=======
. Install the O.S., the software, and the patches, and make sure the software is identical to what is installed on the existing nodes
- make sure the O.S. release & version are the same.
- make sure the same versions of the Veritas software are installed on the host by comparing the output of "pkginfo | grep VERITAS" & "pkginfo -l SWname" to the output on an existing node.
- make sure the VCS patch is the same by comparing the output of "showrev -p | grep vcs".
. Make sure the cables for the data LAN (bge1, bge2), the heartbeat LAN (bge3, ce0), and the SAN have been patched and verified. A LAN connection can be verified with, for example, "snoop -d ce0".
. Freeze the production service groups in the cluster.
. Deconfigure llt and gab on sjcitdb75 by running "gabconfig -U; lltconfig -U"
. Create /etc/llttab, /etc/llthosts, /etc/gabtab -
/etc/llttab:
set-node /etc/nodename
set-cluster 154
link bge3 /dev/bge:3 - ether - -
link ce0 /dev/ce:0 - ether - -
start
/etc/llthosts:
0 sjcitdb63
1 sjcitdb24
2 sjcitdb75
/etc/gabtab:
/sbin/gabconfig -c -n1
. Configure llt, gab by running "/etc/rc2.d/S70llt start" & "/etc/rc2.d/S92gab start".
. Verify llt and gab with "lltstat -l" & "gabconfig -a".
. Append "2 sjcitdb75" onto /etc/llthosts on sjcitdb24, sjcitdb63.
. Add sjcitdb75 to the cluster by the following command lines on either sjcitdb24 or sjcitdb63:
haconf -makerw
hasys -add sjcitdb75
hasys -modify sjcitdb75 Limits Databases 1
hasys -modify sjcitdb75 SourceFile "./main.cf"
hagrp -modify ClusterService SystemList -add sjcitdb75 2
hagrp -modify ClusterService AutoStartList -add sjcitdb75
hares -modify mnic Device bge1 "10.112.155.144" bge2 "10.112.155.144" -sys sjcitdb75
hares -modify mnic RouteOptions "default 10.112.155.1 0" -sys sjcitdb75
hagrp -modify bopsmart SystemList -add sjcitdb75 2 # I did not run this one yet; I will run it several days later
haconf -dump -makero
. Verify the cluster by running "hastatus -sum".
. Copy the VCS configuration directory /etc/VRTSvcs/conf/config onto sjcitdb75.
. Start VCS on sjcitdb75 by running "hastart" on sjcitdb75.
. Verify the cluster by running "hastatus -sum" and by switching "ClusterService" to the new node, sjcitdb75 ("hagrp -switch ClusterService -to sjcitdb75").
. If there are no issues, unfreeze the service groups which were frozen previously.
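The procedure above ends with the same three-line /etc/llthosts on every node. A small check such as the following could catch a typo before LLT is started. It is only a sketch: check_llthosts is a hypothetical helper, not a VCS command, and it assumes a POSIX shell and a POSIX awk (on Solaris that would mean nawk rather than the legacy awk).

```shell
#!/bin/sh
# Hypothetical helper (not part of VCS): sanity-check an llthosts-style file.
# Every non-blank line must be "<numeric node ID> <hostname>",
# and no node ID or hostname may appear twice.
check_llthosts() {
    awk '
        NF == 0 { next }    # ignore blank lines
        NF != 2 || $1 !~ /^[0-9]+$/ {
            printf "line %d: malformed: %s\n", NR, $0
            bad = 1
            next
        }
        ($1 in ids)   { printf "line %d: duplicate node ID %s\n", NR, $1; bad = 1 }
        ($2 in names) { printf "line %d: duplicate hostname %s\n", NR, $2; bad = 1 }
        { ids[$1] = 1; names[$2] = 1 }
        END { exit bad + 0 }    # non-zero exit status if any problem was reported
    ' "$1"
}
```

It could be run as "check_llthosts /etc/llthosts" on each node; comparing the files between nodes (for example with cmp) would still be a separate step.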
Operations
========
sjcitdb75# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 10.112.155.144 netmask ffffff00 broadcast 10.112.155.255
ether 0:14:4f:56:f5:d6
sjcitdb75# snoop -d ce0
Using device /dev/ce (promiscuous mode)
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
^Csjcitdb75# snoop -d bge3
Using device /dev/bge (promiscuous mode)
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
? -> (broadcast) ETHER Type=CAFE (Unknown), size = 70 bytes
^Csjcitdb75# pwd
/
sjcitdb75# ls -al /etc/llttab
-rw-r--r-- 1 root other 109 Jan 19 00:07 /etc/llttab
sjcitdb75# cat /etc/llttab
set-node /etc/nodename
set-cluster 154
link bge3 /dev/bge:3 - ether - -
link ce0 /dev/ce:0 - ether - -
start
sjcitdb75# cat /etc/llthosts
0 sjcitdb63
1 sjcitdb24
2 sjcitdb75
sjcitdb75# gabconfig -U; lltconfig -U
GAB:gabconfig:25000: open failed : LLT not configured
lltconfig: this will attempt to stop and reset LLT. Confirm (y/n)? y
sjcitdb75# cat /etc/gabtab
/sbin/gabconfig -c -n1
sjcitdb75# /etc/rc2.d/S70llt start
Starting LLT...
Starting LLT done.
sjcitdb75# lltstat -l
LLT link information:
Link Tag State Type Pri SAP MTU Addrlen
Xmit Recv Err LateHB
Broadcast
0 bge3 on ether hipri 0xCAFE 1500 6
0 11 0 0
FF:FF:FF:FF:FF:FF
1 ce0 on ether hipri 0xCAFE 1500 6
0 33 0 0
FF:FF:FF:FF:FF:FF
sjcitdb75# /etc/rc2.d/S92gab start
Start GAB
sjcitdb75# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 36f7a413 membership 012
sjcitdb75# gabconfig -al
GAB Port Memberships
===============================================================
Port a gen 36f7a413 membership 012
sjcitdb75# ps -ef | grep ha
root 11425 11383 0 23:11:18 pts/1 0:00 grep ha
sjcitdb75# pwd
/
sjcitdb75# cd /etc/VRTSvcs/conf
sjcitdb75# ls -al
total 24
drwxrwxr-x 5 root sys 512 Nov 29 11:43 .
drwxrwxr-x 5 root sys 512 Nov 29 11:42 ..
-rw-rw-r-- 1 root sys 1970 Aug 13 2001 OracleTypes.cf
drwxr-xr-x 2 root other 512 Jan 8 03:27 config
drwxrwxr-x 2 root sys 512 Nov 29 11:43 sample_nfs
drwxrwxr-x 2 root sys 512 Nov 29 11:43 sample_oracle
-r--r--r-- 1 root sys 4666 May 18 2006 types.cf
sjcitdb75# mv config config.20070130
sjcitdb75# ssh sjcitdb24 "cd /etc/VRTSvcs/conf; tar cf - config" | tar xf -
root@sjcitdb24's password:
sjcitdb75# pwd
/etc/VRTSvcs/conf
sjcitdb75# ls -al
total 26
drwxrwxr-x 6 root sys 512 Jan 29 23:24 .
drwxrwxr-x 5 root sys 512 Nov 29 11:42 ..
-rw-rw-r-- 1 root sys 1970 Aug 13 2001 OracleTypes.cf
drwxr-xr-x 2 root other 1024 Jan 29 23:18 config
drwxr-xr-x 2 root other 512 Jan 8 03:27 config.20070130
drwxrwxr-x 2 root sys 512 Nov 29 11:43 sample_nfs
drwxrwxr-x 2 root sys 512 Nov 29 11:43 sample_oracle
-r--r--r-- 1 root sys 4666 May 18 2006 types.cf
sjcitdb75# cd config
sjcitdb75# ls -al
total 350
drwxr-xr-x 2 root other 1024 Jan 29 23:18 .
drwxrwxr-x 6 root sys 512 Jan 29 23:24 ..
-rw------- 1 root other 206 May 5 2003 CronjobsTypes.cf
-rwxr-xr-x 1 root other 177 Nov 13 2001 CronjobsTypes.cf.orig
-rw------- 1 root other 769 May 5 2003 OracleTypes.cf
-rw------- 1 root other 262 May 5 2003 SharePlexTypes.cf
-rw------- 1 root other 162 May 5 2003 SharePlexTypes.cf.previous
-rwxr-xr-x 1 root other 144 Nov 13 2001 SymlinkTypes.cf
-rw------- 2 root root 7829 Jan 29 23:18 main.cf
-rw------- 1 root other 425 Jan 10 2006 main.cf.01102006
-rw------- 1 root other 1144 Jan 10 2006 main.cf.10Jan2006.18:13:29
-rw------- 1 root other 1223 Jan 10 2006 main.cf.10Jan2006.18:15:13
-rw------- 1 root other 370 Jan 10 2006 main.cf.10Jan2006.18:25:30
-rw------- 1 root other 1223 Jan 10 2006 main.cf.10Jan2006.18:26:06
-rw------- 1 root other 370 Jan 10 2006 main.cf.10Jan2006.18:30:01
-rw------- 1 root other 1264 Jan 10 2006 main.cf.10Jan2006.18:31:39
-rw------- 1 root other 8136 Jan 10 2006 main.cf.10Jan2006.18:50:56
-rw------- 1 root other 8137 Jan 10 2006 main.cf.10Jan2006.18:57:36
-rw------- 1 root other 8125 Jan 11 2006 main.cf.11Jan2006.15:40:06
-rw------- 1 root other 7228 Jan 11 2006 main.cf.11Jan2006.16:06:44
-rw------- 1 root other 8125 Jan 11 2006 main.cf.11Jan2006.16:18:46
-rw------- 1 root other 7228 Jan 11 2006 main.cf.11Jan2006.16:33:25
-rw------- 1 root other 7469 Jan 11 2006 main.cf.11Jan2006.16:40:42
-rw------- 1 root other 7620 Jan 11 2006 main.cf.11Jan2006.17:11:05
-rw------- 1 root other 7228 Jan 11 2006 main.cf.11Jan2006.17:13:31
-rw------- 2 root other 7620 Jan 11 2006 main.cf.11Jan2006.17:17:02
-rw------- 2 root root 7829 Jan 29 23:18 main.cf.29Jan2007.23:18:49
-rw------- 2 root other 7620 Jan 11 2006 main.cf.previous
-rw------- 1 root other 47242 Jan 18 19:32 main.cmd
-rw------- 1 root other 4593 Jan 10 2006 types.cf
sjcitdb75# hastart
sjcitdb75# Jan 29 23:26:08 sjcitdb75 had[11440]: [ID 631877 user.alert] VCS:10080:System (sjcitdb75) - Membership: 0x7, Jeopardy: 0x0
Jan 29 23:26:08 sjcitdb75 had[11440]: [ID 592265 user.alert] VCS:10455:Operation 'hasys -modify(0x905)' rejected. Sysstate=CURRENT_DISCOVER_WAIT ,Channel=BCAST,Flags=0x40000
Jan 29 23:26:08 sjcitdb75 had[11440]: [ID 672758 user.alert] VCS:10455:Operation 'hasys -modify(0x905)' rejected. Sysstate=REMOTE_BUILD (PRE),Channel=BCAST,Flags=0x40000
Jan 29 23:26:08 sjcitdb75 last message repeated 1 time
sjcitdb75# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A sjcitdb24 RUNNING 0
A sjcitdb63 RUNNING 0
A sjcitdb75 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService sjcitdb24 Y N OFFLINE
B ClusterService sjcitdb63 Y N OFFLINE
B ClusterService sjcitdb75 Y N ONLINE
B bopsmart sjcitdb24 Y N ONLINE
B bopsmart sjcitdb63 Y N OFFLINE
-- GROUPS FROZEN
-- Group
G bopsmart
G bopsmart
-- RESOURCES DISABLED
-- Group Type Resource
H bopsmart Cronjobs bopsmart_oracron
H bopsmart DiskGroup bopsmart_dg
H bopsmart IPMultiNIC bopsmart_ip
H bopsmart Mount bopsmart_archive
H bopsmart Mount bopsmart_data1
H bopsmart Mount bopsmart_ebay
H bopsmart Mount bopsmart_home
H bopsmart Mount bopsmart_redo
H bopsmart Oracle bopsmart_oracle
H bopsmart Proxy bopsmart_proxy
H bopsmart Sqlnet bopsmart_lstn1
H bopsmart Sqlnet bopsmart_lstn2
H bopsmart Sqlnet bopsmart_lstn3
H bopsmart Sqlnet bopsmart_lstn4
sjcitdb75# hagrp -unfreeze bopsmart
sjcitdb75# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A sjcitdb24 RUNNING 0
A sjcitdb63 RUNNING 0
A sjcitdb75 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B ClusterService sjcitdb24 Y N OFFLINE
B ClusterService sjcitdb63 Y N OFFLINE
B ClusterService sjcitdb75 Y N ONLINE
B bopsmart sjcitdb24 Y N ONLINE
B bopsmart sjcitdb63 Y N OFFLINE
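The "hastatus -sum" checks above were done by eye. As a rough sketch, the same summary text could be filtered with awk. Both function names below are made up for illustration, and the field positions are assumed from the output shown in this post (system lines start with "A", group lines with "B"); other VCS versions may format the summary differently.

```shell
#!/bin/sh
# Hypothetical helpers; both read "hastatus -sum" style text on standard input.

# Print "<system> <state>" for every system that is not RUNNING.
# Exit status is non-zero if at least one such system is found.
systems_not_running() {
    awk '
        $1 == "A" && $3 != "RUNNING" { print $2, $3; bad = 1 }
        END { exit bad + 0 }
    '
}

# Print the system(s) on which the given service group is ONLINE.
group_online_on() {
    awk -v grp="$1" '$1 == "B" && $2 == grp && $6 == "ONLINE" { print $3 }'
}
```

Typical use would be "hastatus -sum | systems_not_running" and "hastatus -sum | group_online_on ClusterService".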
Logging
======
Excerpt from /var/VRTSvcs/log/engine_A.log
TAG_E 2007/01/29 23:03:11 VCS:50106:User root fired command: hagrp -freeze bopsmart
TAG_E 2007/01/29 23:06:08 VCS:10077:received new cluster membership
TAG_B 2007/01/29 23:06:08 VCS:10080:System (sjcitdb24) - Membership: 0x3, Jeopardy: 0x4
TAG_E 2007/01/29 23:09:06 VCS:50106:User root fired command: haconf -makerw
TAG_E 2007/01/29 23:09:34 VCS:50106:User root fired command: hasys -add sjcitdb75
TAG_E 2007/01/29 23:11:48 VCS:50106:User root fired command: hasys -add sjcitdb75
TAG_C 2007/01/29 23:11:48 VCS:10526:IpmHandle::recv peer exited errno 131
TAG_E 2007/01/29 23:16:51 VCS:50106:User root fired command: hasys -modify sjcitdb75 Limits Databases 1
TAG_E 2007/01/29 23:17:01 VCS:50106:User root fired command: hasys -modify sjcitdb75 SourceFile ./main.cf
TAG_E 2007/01/29 23:17:13 VCS:50106:User root fired command: hagrp -modify ... -add ClusterService SystemList sjcitdb75 2
TAG_E 2007/01/29 23:17:32 VCS:50106:User root fired command: hagrp -modify ... -add ClusterService AutoStartList sjcitdb75
TAG_E 2007/01/29 23:17:51 VCS:50106:User root fired command: hares -modify mnic Device bge1 10.112.155.144 bge2 10.112.155.144 sjcitdb75
TAG_E 2007/01/29 23:18:30 VCS:50106:User root fired command: hares -modify mnic RouteOptions default 10.112.155.1 0 sjcitdb75
TAG_E 2007/01/29 23:18:48 VCS:50106:User root fired command: haconf -dump -makero
TAG_E 2007/01/29 23:26:28 VCS:10077:received new cluster membership
TAG_B 2007/01/29 23:26:28 VCS:10080:System (sjcitdb24) - Membership: 0x7, Jeopardy: 0x0
TAG_D 2007/01/29 23:26:28 VCS:10322:System (Node '2') changed state from UNKNOWN to INITING
TAG_D 2007/01/29 23:26:28 VCS:10449:Group ClusterService autodisabled on node sjcitdb75 until it is probed
TAG_D 2007/01/29 23:26:28 VCS:10453:Node: 2 changed name from: 'sjcitdb75' to: 'sjcitdb75'
TAG_D 2007/01/29 23:26:28 VCS:10322:System sjcitdb75 (Node '2') changed state from UNKNOWN to INITING
TAG_D 2007/01/29 23:26:28 VCS:10322:System sjcitdb75 (Node '2') changed state from INITING to CURRENT_DISCOVER_WAIT
TAG_D 2007/01/29 23:26:28 VCS:10322:System sjcitdb75 (Node '2') changed state from CURRENT_DISCOVER_WAIT to REMOTE_BUILD
TAG_C 2007/01/29 23:26:28 VCS:10457:Checksums differ. System: sjcitdb75 has modification date: Mon Jan 29 23:18:49 2007
System: sjcitdb75 is building configuration from system: sjcitdb63
which has modification date: Wed Jan 11 18:17:02 2006
TAG_D 2007/01/29 23:26:29 VCS:10322:System sjcitdb75 (Node '2') changed state from REMOTE_BUILD to RUNNING
TAG_E 2007/01/29 23:26:30 VCS:10304:Resource VRTSweb (Owner: unknown, Group: ClusterService) is offline on sjcitdb75 (First probe)
TAG_C 2007/01/29 23:26:32 (sjcitdb75) VCS:136003:IPMultiNIC:webip:monitor:MultiNICA mnic not probed. Will go online after probe succeeds
TAG_E 2007/01/29 23:27:33 VCS:10304:Resource webip (Owner: unknown, Group: ClusterService) is offline on sjcitdb75 (First probe)
TAG_D 2007/01/29 23:27:33 VCS:10438:Group ClusterService has been probed on system sjcitdb75
TAG_D 2007/01/29 23:27:33 VCS:10442:Initiating auto-start online of group ClusterService on system sjcitdb24
TAG_D 2007/01/29 23:31:03 VCS:10208:Initiating switch of group ClusterService from system sjcitdb24 to system sjcitdb75
TAG_D 2007/01/29 23:31:03 VCS:10300:Initiating Offline of Resource VRTSweb (Owner: unknown, Group: ClusterService) on System sjcitdb24
TAG_C 2007/01/29 23:31:03 VCS:10526:IpmHandle::recv peer exited errno 131
TAG_C 2007/01/29 23:31:06 (sjcitdb24) VCS:144004:Process:VRTSweb:monitor:Open for /opt/VRTSvcs/bin/haweb failed, setting cookie to null
TAG_E 2007/01/29 23:31:06 VCS:10305:Resource VRTSweb (Owner: unknown, Group: ClusterService) is offline on sjcitdb24 (VCS initiated)
TAG_D 2007/01/29 23:31:06 VCS:10300:Initiating Offline of Resource webip (Owner: unknown, Group: ClusterService) on System sjcitdb24
TAG_E 2007/01/29 23:31:10 VCS:10305:Resource webip (Owner: unknown, Group: ClusterService) is offline on sjcitdb24 (VCS initiated)
TAG_D 2007/01/29 23:31:10 VCS:10446:Group ClusterService is offline on system sjcitdb24
TAG_D 2007/01/29 23:31:10 VCS:10301:Initiating Online of Resource webip (Owner: unknown, Group: ClusterService) on System sjcitdb75
TAG_E 2007/01/29 23:31:15 VCS:10298:Resource webip (Owner: unknown, Group: ClusterService) is online on sjcitdb75 (VCS initiated)
TAG_D 2007/01/29 23:31:15 VCS:10301:Initiating Online of Resource VRTSweb (Owner: unknown, Group: ClusterService) on System sjcitdb75
TAG_E 2007/01/29 23:31:16 (sjcitdb75) VCS:13001:Resource(VRTSweb): Output of the completed operation (online)
2007-01-29 11:30:56 - ContextManager: Adding context Ctx( /gcm )
2007-01-29 11:30:56 - ContextManager: Adding context Ctx( /vcs )
2007-01-29 11:30:56 - ContextManager: Adding context Ctx( )
TAG_E 2007/01/29 23:31:16 VCS:10298:Resource VRTSweb (Owner: unknown, Group: ClusterService) is online on sjcitdb75 (VCS initiated)
TAG_D 2007/01/29 23:31:16 VCS:10447:Group ClusterService is online on system sjcitdb75
TAG_D 2007/01/29 23:31:16 VCS:10448:Group ClusterService failed over to system sjcitdb75
TAG_E 2007/01/29 23:31:16 (sjcitdb75) VCS:15002:hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/nfs_restart ClusterService successfully
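The "Membership: 0x7, Jeopardy: 0x0" values in the log look like bitmasks indexed by LLT node ID: 0x7 shows up once "gabconfig -a" reports membership 012, and 0x3 while only nodes 0 and 1 were running VCS. Assuming that reading is right, a mask can be expanded into node IDs as sketched below. mask_to_nodes is a hypothetical helper, and it needs a POSIX shell with arithmetic expansion (ksh or bash, not the legacy Solaris /bin/sh).

```shell
#!/bin/sh
# Hypothetical helper: expand a membership bitmask (e.g. 0x7) into a
# space-separated list of node IDs, assuming bit N stands for LLT node N.
# Note: POSIX sh has no local variables, so mask, id and out are global.
mask_to_nodes() {
    mask=$(( $1 ))          # shell arithmetic accepts the 0x prefix
    id=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$id"   # append the ID, space-separated
        fi
        mask=$(( mask >> 1 ))
        id=$(( id + 1 ))
    done
    echo "$out"
}

mask_to_nodes 0x7    # prints: 0 1 2
mask_to_nodes 0x3    # prints: 0 1
```

With /etc/llthosts from this post, "0 1 2" would map to sjcitdb63, sjcitdb24 and sjcitdb75.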
[ Last edited by chinaux on 2007-1-30 19:29 ]