Post #1, posted 2002-07-01 17:17

/var/adm/messages.0 contains the following entries:
Jun 29 22:32:44 main-db1 unix: LLT:10035: timer not called for 784 ticks
Jun 29 22:33:23 main-db1 unix: LLT:10035: timer not called for 117 ticks
Jun 29 22:33:24 main-db1 unix: LLT:10035: timer not called for 103 ticks
Jun 29 22:33:33 main-db1 unix: LLT:10035: timer not called for 110 ticks
Jun 29 22:33:38 main-db1 unix: LLT:10035: timer not called for 102 ticks
Jun 29 22:33:48 main-db1 unix: LLT:10035: timer not called for 103 ticks
Jun 29 22:34:15 main-db1 unix: LLT:10035: timer not called for 125 ticks
Jun 29 22:34:41 main-db1 unix: LLT:10035: timer not called for 142 ticks
Jun 29 23:43:30 main-db1 unix: LLT:10035: timer not called for 254 ticks
What does this mean?

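One more question: my vmstat log shows that for about one second SYS_CPU reached 95, which has never happened before.
The Oracle log contains nothing relevant, though. What could have caused this?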
SQL> select * from stats$osstat
     where record_time between to_date('2002/06/29 22:40:00','yyyy/mm/dd hh24:mi:ss')
                           and to_date('2002/06/29 22:45:00','yyyy/mm/dd hh24:mi:ss');

HOST_NAME  RECORD_TIME          DISK_READ  DISK_WRITE  DISK_UTIL  USER_CPU  SYS_CPU  WAIT_CPU  IDLE_CPU
---------  -------------------  ---------  ----------  ---------  --------  -------  --------  --------
main-db1   2002/06/29 22:43:55          0           0          0         0        0         0       100
main-db1   2002/06/29 22:43:55          7           2         13         3        6         0        91
main-db1   2002/06/29 22:43:56          1           0         .2         1       67         0        32
main-db1   2002/06/29 22:43:56          0           0          0         1       95         0         4
main-db1   2002/06/29 22:43:56        148           7       52.8         3       43        11        43
main-db1   2002/06/29 22:43:56        249          10       88.4         2       10        20        68
main-db1   2002/06/29 22:43:57         69           6       32.8         1        8         5        86
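
To make the spike easier to spot, the samples above can be filtered by SYS_CPU. The following is only a rough sketch: the rows are copied from the query output in this post (CPU columns only), while the `Sample` class, the function names, and the 50% threshold are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    record_time: str
    user_cpu: float
    sys_cpu: float
    wait_cpu: float
    idle_cpu: float


# Copied from the output above: RECORD_TIME, USER_CPU, SYS_CPU, WAIT_CPU, IDLE_CPU.
# The disk columns are left out because only CPU time is examined here.
ROWS = """\
2002/06/29 22:43:55 0 0 0 100
2002/06/29 22:43:55 3 6 0 91
2002/06/29 22:43:56 1 67 0 32
2002/06/29 22:43:56 1 95 0 4
2002/06/29 22:43:56 3 43 11 43
2002/06/29 22:43:56 2 10 20 68
2002/06/29 22:43:57 1 8 5 86
"""


def parse(text):
    """Turn whitespace-separated rows into Sample objects, skipping odd lines."""
    samples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue
        date, clock, user, sys_, wait, idle = parts
        samples.append(
            Sample(f"{date} {clock}", float(user), float(sys_), float(wait), float(idle))
        )
    return samples


def sys_cpu_spikes(samples, threshold=50.0):
    """Return the samples whose SYS_CPU is at or above the (assumed) threshold."""
    return [s for s in samples if s.sys_cpu >= threshold]


if __name__ == "__main__":
    for s in sys_cpu_spikes(parse(ROWS)):
        print(s.record_time, s.sys_cpu)
```

With the rows above, only the two samples at 22:43:56 with SYS_CPU 67 and 95 pass the filter, so the burst of kernel time lasted about one second.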

Fenng

Post #2, posted 2002-07-01 18:41

Strange!
1. A hardware problem?
2. A random number generator error?
3. A security issue?

clu65

Post #3, posted 2002-07-01 20:47

[Solaris] What does this error mean?

I think this has something to do with the VCS (Veritas Cluster Server) software. Please check your Veritas log files under /var/VRTSvcs/log for more information.

BTW, LLT (Low Latency Transport) and GAB (Group Membership Services/Atomic Broadcast) are two VCS components that sit in the kernel and handle cluster heartbeat communication.
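
To see how long the LLT timer was stalled each time, the messages from the first post can be parsed and the tick counts converted to seconds. This is just a sketch: the regular expression is written against the lines quoted in this thread, and the conversion assumes one tick is 1/100 of a second, which should be checked against the LLT documentation for your version.

```python
import re

# Matches the syslog lines quoted in the first post.
LLT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+unix:\s+"
    r"LLT:10035: timer not called for (?P<ticks>\d+) ticks"
)

# Assumption: one LLT tick is 10 ms. Verify this for your installation.
TICKS_PER_SECOND = 100


def parse_llt_stalls(lines):
    """Return (timestamp, host, ticks, seconds) for every matching line."""
    stalls = []
    for line in lines:
        m = LLT_RE.match(line.strip())
        if m:
            ticks = int(m.group("ticks"))
            stalls.append(
                (m.group("stamp"), m.group("host"), ticks, ticks / TICKS_PER_SECOND)
            )
    return stalls


# Two of the lines from the first post.
SAMPLE = [
    "Jun 29 22:33:23 main-db1 unix: LLT:10035: timer not called for 117 ticks",
    "Jun 29 23:43:30 main-db1 unix: LLT:10035: timer not called for 254 ticks",
]

if __name__ == "__main__":
    for stamp, host, ticks, seconds in parse_llt_stalls(SAMPLE):
        print(f"{stamp} {host}: {ticks} ticks (~{seconds:.2f} s)")
```

Under that assumption, each message means the kernel did not run the LLT timer for roughly one to several seconds, which would fit a system that was briefly too busy to schedule it.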

chao_ping

Post #4, posted 2002-07-02 09:55

Re: [Solaris] What does this error mean?

Originally posted by clu65:
[B]I think this has something to do with the VCS (Veritas Cluster Server) software. Please check your Veritas log files under /var/VRTSvcs/log for more information.

BTW, LLT and GAB are two VCS components that sit in the kernel and handle cluster heartbeat communication.[/B]

Now that you mention it, I remember that LLT is a Veritas component.
I checked the corresponding log:
TAG_E 2002/06/29 22:35:03 VCS:10298:Resource ora_biddb (Owner: unknown Group: oracle) is online on main-db1 (VCS initiated)
TAG_C 2002/06/29 22:35:03 VCS:10447:Group oracle is online on system main-db1
TAG_C 2002/06/29 22:35:03 VCS:10448:Group oracle failed over to system main-db1
TAG_E 2002/06/30 21:50:18 (main-db1) VCS:13001:Output of the completed operation (monitor) on resource (oradg)
libthread panic: cannot create new lwp (PID: 17637 LWP 2)
I'm not sure what the "lwp" in the last line means.

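I'd also like to ask a Veritas VCS question:
Say I have a two-node HA setup with only one resource group. At present, when Oracle on the first node runs into a problem and goes down, VCS switches it over to the second node.
What I would like is for VCS to first try restarting it on the first node: if that succeeds, it stays there, and only if it fails does it switch to the second node. Can this be done?
Thanks~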

clu65

Post #5, posted 2002-07-03 01:30

Re: [Solaris] What does this error mean?

LWP stands for lightweight process. Solaris uses LWPs to run multithreaded programs.

The default behavior of VCS is to try to restart Oracle on the same node when it fails. Only if it cannot restart the Oracle instance there does it move to the second node.
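
The restart-then-fail-over order discussed here can be modelled in a few lines. This is only a sketch of the decision logic and does not call VCS: `recover`, `try_online`, and the node name `main-db2` are hypothetical, and `restart_limit` merely borrows its name from the type-level RestartLimit attribute described in the VCS documentation, whose exact semantics and default should be checked for your version.

```python
from typing import Callable, List, Optional


def recover(
    nodes: List[str],
    failed_node: str,
    try_online: Callable[[str], bool],
    restart_limit: int = 1,
) -> Optional[str]:
    """Return the node where the service ended up online, or None."""
    # Step 1: retry on the node where the failure happened.
    for _ in range(restart_limit):
        if try_online(failed_node):
            return failed_node
    # Step 2: local restarts are exhausted, so try the other nodes in order.
    for node in nodes:
        if node != failed_node and try_online(node):
            return node
    return None


if __name__ == "__main__":
    nodes = ["main-db1", "main-db2"]  # main-db2 is a made-up name for the second node

    # Local restart works: the service stays on main-db1.
    print(recover(nodes, "main-db1", lambda node: True))

    # Local restart fails: the service moves to main-db2.
    print(recover(nodes, "main-db1", lambda node: node == "main-db2"))
```

Setting `restart_limit=0` in this model skips the local retry entirely and goes straight to failover, which matches the behavior described in the question.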

chao_ping

Post #6, posted 2002-07-03 09:49

Re: Re: [Solaris] What does this error mean?

Originally posted by clu65:
[B]LWP stands for lightweight process. Solaris uses LWPs to run multithreaded programs.

The default behavior of VCS is to try to restart Oracle on the same node when it fails. Only if it cannot restart the Oracle instance there does it move to the second node.[/B]

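For VCS to restart the database locally, the resource on this node generally has to be in a normal state. After an abnormal shutdown, however, the resource is usually left in an ERROR state.
It has to be cleared manually before it can be started.
VCS probably cannot perform this step on its own, so how can it be done automatically from within the VCS handling scripts? Does anyone have experience with this?
Thanks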
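
One way to script the manual clear step is a small wrapper around the `hares -clear` command. The sketch below only builds the command line by default and runs it when `dry_run=False`. The function names are hypothetical, the resource and system names are taken from the log earlier in this thread, and where such a script would be hooked into VCS (for example a trigger script) is left open, since that depends on the VCS version and configuration.

```python
import subprocess
from typing import List, Optional


def build_clear_command(resource: str, system: Optional[str] = None) -> List[str]:
    """Build the argument list for clearing a faulted resource."""
    cmd = ["hares", "-clear", resource]
    if system:
        # Restrict the clear to one system instead of all systems.
        cmd += ["-sys", system]
    return cmd


def clear_resource(
    resource: str, system: Optional[str] = None, dry_run: bool = True
) -> List[str]:
    """Return the command and, unless dry_run is set, also execute it."""
    cmd = build_clear_command(resource, system)
    if not dry_run:
        # Raises CalledProcessError if hares exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run only: print the command that would be executed.
    print(" ".join(clear_resource("ora_biddb", "main-db1")))
```

Clearing a fault automatically hides the original failure from the operator, so a script like this would normally log what it cleared and why before it does so.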