Post #1, posted 2005-05-17 17:02

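This is the People's Bank of China RMB transaction payment system, set up as a two-node cluster with ReliantHA. It ran without problems for a while, but now it shuts down at irregular intervals. The symptom: after running for some time, SYSB shuts itself down, and a few days later SYSA shuts down as well!
What could be causing this? Thanks!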

lw371

Rank: Comfortably Off

Post #2, posted 2005-05-17 17:26

ReliantHA frequently restarts for no apparent reason!

Check whether the heartbeat link between the two machines has a problem.
That should be recorded in the cluster log.

冷月无声

Rank: Well Provided For

Post #3, posted 2005-05-23 22:23

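The heartbeat link itself seems fine. When the system shuts down, it displays:
GAB: port h halting system!
Looking at the system configuration, I found that swap, /tmp and /var/tmp are all quite small.
On the SCO site I found something that may be related: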
If the system is swapping excessively then this could cause enough latency at the heartbeat communication layer for a heartbeat to be missed and so a node be killed with a gab halt. Use the standard system tools "sar" and "rtpm" to monitor for swapping behaviour.

            In addition:

            Check /etc/conf/cf.d/stune for tuning that may conflict with the
            shared message queues that ReliantHA needs to operate such as:

               MSGSEG
               STRTHRESH

            Both of these values should be set to the default operating
            system values even if database vendors such as Oracle say that
            these values need to be set.
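The SCO note suggests using sar to watch for swapping. A minimal sketch for picking out the samples that show swap-out activity follows. It assumes SVR4/Solaris-style `sar -w` output with a swpot/s column; that column name is an assumption for UnixWare, so check sar(1) on your system first. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: report sar samples that show swapping out.
# ASSUMPTION: SVR4/Solaris-style `sar -w` output with a "swpot/s" column;
# UnixWare's sar may label it differently, so check sar(1) first.
swap_out_samples() {
    awk '
        # Remember which column is swpot/s whenever a header line appears.
        { for (i = 1; i <= NF; i++) if ($i == "swpot/s") col = i }
        # Data lines: that column holds a number greater than zero.
        col && $col ~ /^[0-9]+(\.[0-9]+)?$/ && $col + 0 > 0 { print "swapping out: " $0 }
    '
}

# Usage (sample every 60 s, 10 times): sar -w 60 10 | swap_out_samples
```

Any line it prints is a sample where pages were being swapped out, which is the condition the note links to missed heartbeats.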

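I tried this, but it still doesn't seem to help. Things are fine up to the mount step, but the system always shuts down when it reaches Process_Online.

When the system was first installed, everything worked normally.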

冷月无声

Rank: Well Provided For

Post #4, posted 2005-05-24 13:24

Bump!
Could someone please take a look?

answer

Rank: Honorary Moderator

Post #5, posted 2005-05-25 23:07

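Originally posted by "冷月无声":
The heartbeat link itself seems fine. When the system shuts down, it displays:
GAB: port h halting system!
Looking at the system configuration, I found that swap, /tmp and /var/tmp are all quite small.
On the SCO site I found something that may be related: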
If the system is swapping e..........

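Databases generally need a fairly large swap area.
If your current swap space is small, you can add another swap area with swap -a.
For /tmp, about 500 MB is generally enough.
If it is smaller than that, you can change environment variables to make more space available.
"GAB: port h halting system!" can have many different causes, so it's hard to say which one applies here.
The following article explains GAB port h:
http://wdb1.sco.com/kb/showta?taid=116483&qid=765835106&sid=1891949006&pgnum=1
Have a look at it, and also check the system log for any errors.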
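To check /tmp and /var/tmp against the ~500 MB rule of thumb above, a minimal sketch. It assumes a POSIX shell and a df that accepts -P and -k, which may not hold on every UnixWare release; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: flag /tmp and /var/tmp filesystems smaller than ~500 MB.
# Assumes a POSIX shell and a df supporting -P (one line per filesystem) and -k.
MIN_KB=512000   # 500 MB expressed in KB, per the rule of thumb above

check_tmp_sizes() {
    for dir in /tmp /var/tmp; do
        [ -d "$dir" ] || continue
        # Line 2 of `df -Pk`: filesystem, 1024-blocks, used, available, ...
        size_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $2 }')
        [ -n "$size_kb" ] || continue
        if [ "$size_kb" -lt "$MIN_KB" ]; then
            echo "$dir: ${size_kb} KB total, smaller than ~500 MB"
        else
            echo "$dir: ${size_kb} KB total, OK"
        fi
    done
}

check_tmp_sizes
```

Note that this looks at the total size of the filesystem holding each directory, not the free space, matching the post's advice about how large /tmp should be.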

lijizheng

Rank: Self-Made

Post #6, posted 2005-06-06 11:21

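I ran into this problem once too. What fixed it for me:
On both nodes, remove all the network cards, reboot, re-add the network cards, configure the IP addresses and so on, then reboot again. The fault never came back.
I'm not sure whether that will solve your problem, though. Alternatively, you could try removing the PMOUNT, Nw, Process, etc. commands from your .hl file, then adding them back one at a time to find which step triggers the problem, and work from there.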

lijizheng

Rank: Self-Made

Post #7, posted 2005-06-21 10:28

I get the error "GAB: Port h halting system" when using ReliantHA on UnixWare 7.

Problem
I have installed ReliantHA, and when I run "hvstart", after a few seconds one or more servers shut down, displaying the message:
"GAB: Port h halting system".

and/or:

"System has halted and may be powered off (Press any key to reboot)."

This is a generic ReliantHA error message indicating that a ReliantHA node has been shutdown for some reason, often due to a communications failure of some kind.

Solution
Use the following tools to help diagnose the problem after first re-booting the servers in the cluster.

1. Disconnect the public network and ping SYSA and ping SYSB. NOTE: These are the private network names that ReliantHA uses and are case sensitive.

2. Make sure that when ReliantHA was configured with "mkcluster", the external uname (or public name) was used for the node names and NOT SYSA or SYSB.

3. Check the Release Notes of ReliantHA to look at the S99gab script's timeout values.

            These release notes are located at:

            http://www.sco.com/products/clustering/notes/harelnot.html

4. Check the output from /usr/opt/reliant/log for any errors.

            This is a directory; the most useful file in it is switchlog.

            NOTE: It is normal to see errors such as:

            dynamic linker: commds: warning: copy relocation size mismatch
            for symbol svc_fdset

5. If using Compaq Network Interface Cards (NICs) from the Netflex3 series, consider using the OU8 eeE8 (DDI8) driver rather than Compaq's own "n100c" driver, because these cards are rebadged Intel Pro100B cards.

            The latest "nd" package is available from:

            ftp://ftp.sco.com/pub/unixware7/drivers/storage
            ftp://ftp.sco.com/pub/openunix8/drivers/storage
            ftp://ftp.sco.com/pub/unixware7/713/

            If the Compaq Insight Manager agents are installed for NIC monitoring, they will need to be removed.

            Basically, ensure that the NIC supports a programmable MAC address and that cross-over cables are used to connect the nodes directly on the private LAN.

6. Check that the latest operating system patches are installed. They are available from:
            ftp://ftp.sco.com/pub/<os>

7. Check the output of "mswconfig -l", "llstat -a" and "/etc/mswtab" for any errors.

8. If no specific config files are defined then hvstart will use a simple default set of scripts for basic testing between the nodes.

9. Once "hvstart" has run, "ipcs -a" should show that a message queue has been allocated. You can also see the status of ReliantHA with "hvdisp -a".

10. Use the "truss" command to examine the output of the "hvstart" command to get an indication of when the failure occurs:

            truss -f -o /hvstart.truss hvstart

11. If the system is swapping excessively then this could cause enough latency at the heartbeat communication layer for a heartbeat to be missed and so a node be killed with a gab halt. Use the standard system tools "sar" and "rtpm" to monitor for swapping behaviour.

            In addition:

            Check /etc/conf/cf.d/stune for tuning that may conflict with the
            shared message queues that ReliantHA needs to operate, such as:

               MSGSEG
               STRTHRESH

            Both of these values should be set to the default operating
            system values even if database vendors such as Oracle say that
            these values need to be set.
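The checks in steps 4, 7, 9 and 10 can be gathered into one script so that the output from both nodes can be compared side by side. This is a sketch: the commands, flags and paths are the ones quoted in those steps, while the function names and the OUT, SWITCHLOG and TRACE override variables are hypothetical. The truss parsing assumes SVR4-style output in which a failed call ends in Err#<n>.

```shell
#!/bin/sh
# Sketch: gather the checks from steps 4, 7, 9 and 10 in one place.
# Commands, flags and paths come from those steps; function names and the
# OUT/SWITCHLOG/TRACE override variables are hypothetical.
LOGDIR=/usr/opt/reliant/log
SWITCHLOG=${SWITCHLOG:-$LOGDIR/switchlog}
TRACE=${TRACE:-/hvstart.truss}   # written by: truss -f -o /hvstart.truss hvstart
OUT=${OUT:-/tmp/reliant-diag.$(uname -n).txt}

# Step 4: drop the dynamic-linker warning the note says is normal
# (it may be wrapped onto two lines, so both fragments are filtered).
filter_benign() {
    grep -v -e 'copy relocation size mismatch' -e 'for symbol svc_fdset' "$1"
}

# Steps 7 and 9: print a header, then the command output (or a note if missing).
run_diag() {
    echo "===== $* ====="
    if type "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(command not found: $1)"
    fi
    echo
}

# Step 10: assumes SVR4-style truss output, where a failed call ends in
# "Err#<n> <ENAME>" (e.g. "Err#2 ENOENT"); shows the last $2 failures.
failed_calls() {
    grep 'Err#' "$1" | tail -n "$2"
}

collect_diags() {
    run_diag mswconfig -l
    run_diag llstat -a
    run_diag cat /etc/mswtab
    run_diag ipcs -a      # a message queue should appear once hvstart has run
    run_diag hvdisp -a
    if [ -r "$SWITCHLOG" ]; then
        echo "===== $SWITCHLOG (last 50 lines, benign warnings removed) ====="
        filter_benign "$SWITCHLOG" | tail -n 50
        echo
    fi
    if [ -r "$TRACE" ]; then
        echo "===== $TRACE (last 20 failed calls) ====="
        failed_calls "$TRACE" 20
        echo
    fi
}

collect_diags > "$OUT"
echo "diagnostics written to $OUT"
```

Running it on each node right after a halt and diffing the two files is usually quicker than reading each command's output by hand.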

NOTE: MSGSSZ, MSGMNB and MSGTQL should be tuned from their default values to at least 524288, 65536 and 1000 respectively (add any further application related tuning to these values).
NOTE: The minimum requirement for ReliantHA is 2 private LAN connections.
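A minimal sketch of checking stune against the advice in step 11 and the MSGSSZ/MSGMNB/MSGTQL note above. It assumes each tuned parameter appears as a whitespace-separated "NAME value" line with a decimal value; the function name and the STUNE variable are hypothetical. It only reports; changing the values is left to your usual kernel tuning procedure.

```shell
#!/bin/sh
# Sketch: check an stune file against the tuning advice above.
# Assumes one "NAME value" pair per line with decimal values; uses POSIX awk
# (nawk on older SVR4 systems). STUNE can be overridden (hypothetical).
STUNE=${STUNE:-/etc/conf/cf.d/stune}

check_stune() {
    awk '
        BEGIN {
            # Minimums from the NOTE: MSGSSZ, MSGMNB, MSGTQL
            min["MSGSSZ"] = 524288; min["MSGMNB"] = 65536; min["MSGTQL"] = 1000
        }
        # The note says these should stay at the OS defaults.
        $1 == "MSGSEG" || $1 == "STRTHRESH" {
            print "WARN: " $1 " is set to " $2 " in stune; the note says to leave it at the OS default"
        }
        ($1 in min) { seen[$1] = $2 }
        END {
            for (p in min) {
                if (!(p in seen))
                    print "WARN: " p " is not tuned (needs at least " min[p] ")"
                else if (seen[p] + 0 < min[p])
                    print "WARN: " p " = " seen[p] " is below " min[p]
            }
        }
    ' "$1"
}

if [ -r "$STUNE" ]; then
    check_stune "$STUNE"
fi
```

The order of the END-block warnings is not fixed, since awk does not define the iteration order of `for (p in min)`.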

NOTE: Instead of a "real" NIC you could also use a (null modem) serial cable as the second interface.

               For Unisys: CBL6099-10M Null Modem Cable
               For Compaq/HP: BC29Q-02M Null Modem Cable

NOTE: In general, if a node fails while shared memory or disk buffering is in use, the buffered data will be lost when the second node takes over. This is important for databases that use these techniques. Ensure that RAID controllers are configured as WRITE-THRU and not cached.

NOTE: When you run "hvstart" manually, you will need to hit RETURN to return to the prompt.

NOTE: ReliantHA 1.1.3a added a new "gabconfig" option, -P.

               The -P option was added as a standalone "debug" option, for use
               after the gab driver has already been configured, that makes the
               system generate a PANIC should "gab" halt. By default it is
               turned off. To turn it on, set the value to -P 1.

               It is not recommended to use this feature within /etc/rc2.d.

               Create an S92gab file in /etc/init.d to execute these
               commands at the end of the reboot, after entering multiuser
               mode, in the following format:

               /sbin/gabconfig -S 4000 -c
               /sbin/gabconfig -P 1

               Also add -D 63 to the first command for more debug output:

               /sbin/gabconfig -S 4000 -c -D 63
               /sbin/gabconfig -P 1
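A sketch of what the S92gab file described above might contain, using exactly the gabconfig flags quoted in the note. It assumes the usual SVR4 convention that startup scripts are called with start as their first argument; the GABCONFIG override and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch of an S92gab startup script: configure gab, then enable the -P 1
# panic-on-halt debug option. Flags are those given in the note;
# append -D 63 to the first command for extra debug output.
GABCONFIG=${GABCONFIG:-/sbin/gabconfig}

start_gab_debug() {
    if [ ! -x "$GABCONFIG" ]; then
        echo "S92gab: $GABCONFIG not found, skipping" >&2
        return 1
    fi
    "$GABCONFIG" -S 4000 -c || return 1   # only enable -P if configuring succeeded
    "$GABCONFIG" -P 1
}

case "$1" in
    start) start_gab_debug ;;
    *)     echo "usage: $0 start" >&2 ;;
esac
```

Stopping before -P 1 when the first command fails keeps the debug option from being set on a gab driver that was never configured.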

NOTE: When replacing a private NIC, first remove the mswtab and clustertab, then recreate them after the new card is installed.
NOTE: For RHA 1.1.4, please also run "rdu" (the Reliant Diags Utility).

亢猫有悔

Rank: Self-Made

Post #8, posted 2005-08-24 16:39

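This kind of problem is simple; there are only two possibilities.
If it's the standby node that goes down, it's a heartbeat-link problem. You may be using an unreliable network cable, or one of your heartbeat links may be a serial cable; when the serial line blocks, the system goes down. Replacing the serial link with a network card usually solves it.
If it's the primary node that goes down, it's usually because the CPU load is too high, so the system responds too slowly. ReliantHA was designed by foreigners and is rather dogmatic and idealistic: they assume that if CPU idle time drops below 10%, something must be wrong with the system, so it forces a switchover, heh.
To fix it, add CPUs or run one fewer database engine, and that should take care of it.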
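The 10% idle figure above is the poster's claim rather than a documented ReliantHA parameter, but to see whether the primary node really runs that busy, a minimal sketch is to scan sar CPU samples. It assumes `sar -u` style output where %idle is the last column; the function name and IDLE_MIN variable are hypothetical.

```shell
#!/bin/sh
# Sketch: flag `sar -u` samples whose %idle is below the 10% figure quoted in
# the post (the poster's claim, not a documented ReliantHA setting).
# Assumes %idle is the last column; uses POSIX awk (nawk on older SVR4).
IDLE_MIN=10

low_idle_samples() {
    awk -v min="$IDLE_MIN" '
        # Only data lines end in a plain number; headers end in "%idle".
        $NF ~ /^[0-9]+(\.[0-9]+)?$/ && $NF + 0 < min { print "low idle: " $0 }
    '
}

# Usage (sample every 60 s, 10 times): sar -u 60 10 | low_idle_samples
```

If the flagged samples line up with the times in switchlog when the primary halted, that supports the CPU-load explanation; if not, the heartbeat link is the more likely cause.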

[SCO UNIX] ReliantHA frequently restarts for no apparent reason!
