Failover simulation is an important part of configuration testing. Failover can be tested in several ways:
1. Removing the power from the cluster node
2. Generating a STOP error or blue screen
3. Removing ALL connections to the server to be tested. This means removing all cables from network interface cards (NICs) and host bus adapters (fibre or SCSI). Public and private NICs must be removed simultaneously. All fibre or SCSI cables must be removed simultaneously. This leaves no connectivity to the server to be tested. This will simulate a server leaving a cluster.
Note that the official procedure requires pulling every cable at the same time, which is very hard to do. In practice, this scenario is difficult to simulate faithfully.
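So how does the official documentation describe the cluster's behavior when a node goes down?
1. Node A is running the application and suffers a system down. If the autofailover parameter is true (the default), node B takes over the application.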
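2. With the application running on node A, unplug all of node A's NICs and HBAs, pulling the heartbeat cable last. Node B takes over the application.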
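The takeover rule from scenario 1 can be captured in a few lines. This is a toy Python sketch, not any cluster product's logic: the function name `owner_after_failure` and its parameters are hypothetical, and the behavior with automatic failover disabled (no node takes over on its own) is an assumption about what turning the parameter off means.

```python
from typing import Optional


def owner_after_failure(owner: str, standby: str, owner_up: bool,
                        auto_failover: bool = True) -> Optional[str]:
    """Return which node runs the application after checking the owner.

    Toy model (hypothetical names): a two-node cluster where `owner`
    currently runs the application and `standby` can take it over.
    """
    if owner_up:
        # Nothing failed: the application stays where it is.
        return owner
    if auto_failover:
        # Owner went down and automatic failover is enabled (the default
        # described above): the standby node takes over.
        return standby
    # Assumption: with automatic failover disabled, no node takes over
    # automatically, so no node is running the application.
    return None


print(owner_after_failure("A", "B", owner_up=False))  # B
```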
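Note that the heartbeat is pulled last here. The resources fail first, and because the heartbeat link is still up, node B learns of the resource fault on node A and takes over the application.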
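To make the ordering argument concrete, here is a toy Python model of what node B can observe while cables are pulled from node A. The cable names (`public_nic`, `hba`, `heartbeat`) and the function `observations_of_b` are hypothetical, and the model deliberately ignores real cluster details such as detection timeouts and fencing. It only captures the point above: B hears about A's resource faults only while the heartbeat is still connected.

```python
from typing import List


def observations_of_b(pull_order: List[str]) -> List[str]:
    """Return what node B observes as cables are pulled from node A in order.

    Hypothetical model: every cable except "heartbeat" backs one of A's
    resources, and pulling it faults that resource. A can report a
    resource fault to B only while the heartbeat link is still connected.
    """
    heartbeat_up = True
    seen: List[str] = []
    for cable in pull_order:
        if cable == "heartbeat":
            heartbeat_up = False
            seen.append("heartbeat lost: A unreachable")
        elif heartbeat_up:
            # A is still talking to B, so B learns about this resource fault.
            seen.append(f"resource fault on A ({cable})")
        # Otherwise B has no link to A and never hears about the fault.
    return seen


# Heartbeat last (scenario 2): B sees each resource fault before losing A.
print(observations_of_b(["public_nic", "hba", "heartbeat"]))
# Heartbeat first: B only sees A disappear.
print(observations_of_b(["heartbeat", "public_nic", "hba"]))
```

In this model, pulling the heartbeat first leaves B with nothing but "A unreachable", so it cannot distinguish a resource fault from a node that simply vanished.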