
Using OMSA to Fix the "Package power limit notification" Problem on Linux


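One day I noticed that a Linux host (Dell R720, Red Hat 6.2) was responding extremely slowly, even though it was running almost no applications. What was the cause?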

First, check the system log:
shell> vi /var/log/messages

Sep 24 06:17:20 ch13 kernel: CPU6: Package power limit notification (total events = 368993)
Sep 24 06:17:20 ch13 kernel: CPU3: Package power limit notification (total events = 370933)
Sep 24 06:17:20 ch13 kernel: CPU9: Package power limit notification (total events = 370893)
Sep 24 06:17:20 ch13 kernel: CPU10: Package power limit notification (total events = 370938)
Sep 24 06:17:20 ch13 kernel: CPU6: Package power limit normal
Sep 24 06:17:20 ch13 kernel: CPU0: Package power limit normal
Sep 24 06:17:20 ch13 kernel: CPU1: Package power limit normal
Sep 24 06:17:20 ch13 kernel: CPU7: Package power limit normal
Sep 24 06:17:20 ch13 kernel: CPU8: Package power limit normal
Sep 24 06:17:20 ch13 kernel: CPU2: Package power limit normal
Sep 24 06:17:20 ch13 kernel: CPU3: Package power limit normal

Sep 24 06:17:20 ch13 kernel: CPU3: Core power limit normal
Sep 24 06:17:20 ch13 kernel: CPU9: Core power limit normal
Sep 24 06:17:20 ch13 kernel: CPU4: Core power limit normal
Sep 24 06:17:20 ch13 kernel: CPU10: Core power limit normal
Sep 24 06:17:20 ch13 kernel: CPU5: Package power limit notification (total events = 371505)
Sep 24 06:17:20 ch13 kernel: CPU11: Package power limit notification (total events = 371497)
Sep 24 06:17:51 ch13 kernel: [Hardware Error]: Machine check events logged
Sep 24 06:17:51 ch13 mcelog: Processor 7 below trip temperature. Throttling disabled
Sep 24 06:17:51 ch13 mcelog: Processor 1 below trip temperature. Throttling disabled
Sep 24 06:17:51 ch13 mcelog: Processor 2 below trip temperature. Throttling disabled
Sep 24 06:17:51 ch13 mcelog: Processor 8 below trip temperature. Throttling disabled
Sep 24 06:17:51 ch13 mcelog: Processor 0 below trip temperature. Throttling disabled
Sep 24 06:17:51 ch13 mcelog: Processor 6 below trip temperature. Throttling disabled
Sep 24 06:17:51 ch13 mcelog: Processor 3 below trip temperature. Throttling disabled
Sep 24 06:17:51 ch13 mcelog: Processor 9 below trip temperature. Throttling disabled
Sep 24 06:17:51 ch13 mcelog: Processor 10 below trip temperature. Throttling disabled
Sep 24 06:17:51 ch13 mcelog: Processor 4 below trip temperature. Throttling disabled
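
Scrolling through the log in vi makes it hard to see how widespread the messages are. As a rough sketch, the notification lines can be counted per CPU with grep and awk. The helper name count_power_limit_events is made up for this example, and the field positions are assumed from the syslog excerpt above (the CPU name is the sixth whitespace-separated field).

```shell
#!/bin/sh
# count_power_limit_events: hypothetical helper, not part of any tool.
# Prints one "CPUn <count>" line per CPU that logged at least one
# "Package power limit notification" message in the given log file.
count_power_limit_events() {
    logfile="${1:-/var/log/messages}"
    # Assumed line layout (taken from the excerpt above):
    #   Mon DD HH:MM:SS host kernel: CPUn: Package power limit notification (...)
    grep 'Package power limit notification' "$logfile" |
        awk '{ sub(/:$/, "", $6); count[$6]++ }
             END { for (cpu in count) print cpu, count[cpu] }' |
        sort -k1.4n    # order numerically by the digits after "CPU"
}
```

Calling count_power_limit_events with no argument reads /var/log/messages, which usually requires root. If every CPU shows up with a similar count, the problem is package-wide rather than tied to one core.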

Next, take a look at the overall state of the system:

shell> vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 12171416 251176 2000188    0    0     0     1    1    1  1  0 99  0  0
 0  0      0 12171400 251176 2000188    0    0     0     0 15936  236  1  0 99  0  0
 0  0      0 12171408 251176 2000188    0    0     0     4 16208  259  0  0 100  0  0
 0  0      0 12171408 251176 2000188    0    0     0    12 15119  193  0  0 100  0  0
 0  0      0 12171408 251176 2000188    0    0     0     0 16047  237  0  0 100  0  0
 0  0      0 12171408 251176 2000188    0    0     0     0 14348  187  0  0 100  0  0
 0  0      0 12171408 251176 2000188    0    0     0     0 14977  239  0  0 99  0  0
 1  0      0 12171400 251176 2000188    0    0     0     0 16226  216  0  0 100  0  0
 0  0      0 12171268 251176 2000188    0    0     0     0 8233  277  1  3 96  0  0
 0  0      0 12171392 251176 2000188    0    0     0     0 16465  193  0  0 100  0  0
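
To put numbers on the "in" (interrupts per second) and "cs" (context switches per second) columns, the samples can be averaged with awk. This is only a sketch: vmstat_avg is a hypothetical helper name, and it assumes the two-line header layout shown above. The first data row is skipped because vmstat reports averages since boot there.

```shell
#!/bin/sh
# vmstat_avg: hypothetical helper. Reads `vmstat` output on stdin and prints
# the mean of the "in" and "cs" columns, ignoring the first sample.
vmstat_avg() {
    awk '
        $1 == "r" {                      # column-name header line
            for (i = 1; i <= NF; i++) {
                if ($i == "in") in_col = i
                if ($i == "cs") cs_col = i
            }
            first = 1
            next
        }
        in_col && $1 ~ /^[0-9]+$/ {      # data rows start with a number
            if (first) { first = 0; next }   # first row: averages since boot
            in_sum += $in_col; cs_sum += $cs_col; n++
        }
        END { if (n) printf "in=%.0f cs=%.0f\n", in_sum / n, cs_sum / n }
    '
}
```

It can be used as: vmstat 1 10 | vmstat_avg. Fed the ten samples above, it works out to roughly 14,840 interrupts per second against only about 226 context switches per second.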

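The interrupt rate (the "in" column, around 15,000 per second) is abnormally high for an idle machine, while the context switch rate ("cs") stays low. Looking at the interrupt counters, some of them are abnormally large, which shows that the system is handling an enormous number of interrupts: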

shell> watch -n1 "cat /proc/interrupts"
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11
 TRM:     613877     596324     616331     613775     614013     614117     618081     617479     618912     616294     616631     616726   Thermal event interrupts  
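
Watching the raw counters works, but a single number is easier to compare over time. A minimal sketch, assuming the TRM row has the layout shown above (one numeric counter per CPU followed by a text description); trm_total is a hypothetical helper name:

```shell
#!/bin/sh
# trm_total: hypothetical helper. Sums the per-CPU counters on the "TRM:"
# (thermal event interrupts) row of a /proc/interrupts-style file.
trm_total() {
    awk '$1 == "TRM:" {
             total = 0
             # Add up numeric fields; stop at the trailing text description.
             for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
             printf "%.0f\n", total
         }' "${1:-/proc/interrupts}"
}

# Example (not run here): thermal interrupts per second across all CPUs.
#   before=$(trm_total); sleep 1; after=$(trm_total)
#   echo "$((after - before)) thermal event interrupts/s"
```

Sampling the total twice, one second apart, gives a rate that can be compared before and after any BIOS change.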

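Based on the information above, a reasonable guess is that a limit or a fault at the hardware level is flooding the operating system with thermal event interrupts faster than it can comfortably service them.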

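Some Googling did not turn up a concrete fix. Other users had reported similar problems and attributed them to the processor performance optimization and power limit settings in the BIOS; their advice was to change the relevant BIOS settings and reboot. I decided to give it a try, but ran into a new problem: the host is colocated in a remote IDC data center, and there is no remote technical support service for it. What now?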

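Dell's full systems management suite, Dell OpenManage Server Administrator (OMSA), provides a remote web interface for changing BIOS settings. We can use it to set the relevant BIOS options and then reboot the machine remotely.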

For installing OMSA, see "Installing Dell OpenManage Server Administrator (OMSA) on Linux".

For the recommended BIOS settings, see: Configuring Low-Latency Environments on Dell PowerEdge 12th Generation Servers

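The recommendations in that article are as follows:

[Image: bios_setting]
Here we update the System Profile Settings section, following the recommended values in the last column. The main changes are:

  • C1E: Disabled
  • C States: Disabled
  • Monitor/Mwait: Disabled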

The resulting configuration is shown below. Sure enough, warnings and errors of this kind no longer appear. OK!

[Image: bios_mysetting]

Reference:

http://www.sulabs.net/?p=405

The post "Using OMSA to Fix the 'Package power limit notification' Problem on Linux" appeared first on SQLParty.

