HP-UX Routine Inspection Guide
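1. Visual inspection of the host system
Check the front panels of the host and the disk array for an amber warning indicator or a lit red fault indicator.

2. GSP/MP check
Log in through the serial port or a terminal. On HP 9000 machines the management processor is usually called the GSP; on Itanium-based servers it is called the MP card.
The SL command is commonly used to clear the front-panel alarm indicator and to view the host's machine error codes (MAC).
The PS command shows whether the system temperature, fans, and power supplies are in the normal state.
To leave a command, press Ctrl+B.
Use help command to view the built-in help.

3. System hardware configuration
1) Serial number and model
- Serial number: run getconf MACHINE_SERIAL or machinfo. The string of digits and letters that is displayed is the serial number.

    # getconf MACHINE_SERIAL
    DEH45277K2
    # machinfo
    CPU info:
       Number of CPUs = 2
       Clock speed = 1300 MHz
       Bus speed = 400 MT/s
       CPUID registers
    Platform info:
       model string = "ia64 hp server rx4640"
       machine id number = dee93512-0d53-11da-8c97-3db64c3af6c3
       machine serial number = DEH45277K2

- Model: run the model command.

    # model
    9000/800/rp4440

If the Ignite-UX software package is installed, the print_manifest command returns more detailed information, for example:

    # print_manifest | more
    System Information
    Your Hewlett-Packard computer has software installed and configured as follows.
    The system was created March 17, 2009, 04:06:27 EDT.
    It was created with Ignite-UX revision C.7.2.94.
    -------------------------------------------------------------
    NOTE: You should retain this information for future reference.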
    -------------------------------------------------------------
    System Hardware
        Model:              ia64 hp server rx4640
        Main Memory:        16354 MB
        Processors:         2
        Proccesor(0) Speed: 1299 MHz
        Proccesor(1) Speed: 1299 MHz
        OS mode:            64 bit
        LAN hardware ID:    0x00306E5D948C

This output shows a host of model rx4640 with about 16 GB of memory (16354 MB) and 2 CPUs.

2) Hardware configuration and status
Status of all hardware:

    # ioscan -fnk | more
    Class     I  H/W Path       Driver  S/W State  H/W Type   Description
    ==============================================================================
    root      0                 root    CLAIMED    BUS_NEXUS
    cell      0  0              cell    CLAIMED    BUS_NEXUS
    ioa       0  0/0            sba     CLAIMED    BUS_NEXUS  System Bus Adapter (805)
    ba        0  0/0/0          lba     CLAIMED    BUS_NEXUS  Local PCI Bus Adapter (782)
    tty       0  0/0/0/0/0      asio0   CLAIMED    INTERFACE  PCI SimpleComm (103c1290)
                                /dev/diag/mux0 /dev/mux0 /dev/tty0p0
    tty       1  0/0/0/0/1      asio0   CLAIMED    INTERFACE  PCI Serial (103c1048)
                                /dev/GSPdiag1 /dev/mux1 /dev/tty1p2
                                /dev/diag/mux1 /dev/tty1p0 /dev/tty1p4
    lan       0  0/0/0/1/0      igelan  CLAIMED    INTERFACE  HP A7109-60001 PCI 1000Base-T Core
    ext_bus   0  0/0/0/2/0      c8xx    CLAIMED    INTERFACE  SCSI C1010 Ultra Wide Single-Ended
    target    0  0/0/0/2/0.6    tgt     CLAIMED    DEVICE
    disk      0  0/0/0/2/0.6.0  sdisk   CLAIMED    DEVICE     HP 146 GST3146855LC
                                /dev/dsk/c0t6d0 /dev/rdsk/c0t6d0
    target    1  0/0/0/2/0.7    tgt     CLAIMED    DEVICE
    ctl       0  0/0/0/2/0.7.0  sctl    CLAIMED    DEVICE     Initiator
                                /dev/rscsi/c0t7d0
    ext_bus   1  0/0/0/2/1      c8xx    CLAIMED    INTERFACE  SCSI C1010 Ultra Wide Single-Ended
    target    2  0/0/0/2/1.2    tgt     CLAIMED    DEVICE
    disk      1  0/0/0/2/1.2.0  sdisk   NO_HW      DEVICE     Optiarc DVD RW AD-5170A
                                /dev/dsk/c1t2d0 /dev/rdsk/c1t2d0
    target    3  0/0/0/2/1.7    tgt     CLAIMED    DEVICE
    ctl       1  0/0/0/2/1.7.0  sctl    CLAIMED    DEVICE     Initiator
                                /dev/rscsi/c1t7d0
    ext_bus   2  0/0/0/3/0      c8xx    CLAIMED    INTERFACE  SCSI C1010 Ultra Wide Single-Ended
    target    4  0/0/0/3/0.6    tgt     CLAIMED    DEVICE
    disk     16  0/0/0/3/0.6.0  sdisk   CLAIMED    DEVICE     HP 146 GST3146854LC
                                /dev/dsk/c2t6d0 /dev/rdsk/c2t6d0
    target    5  0/0/0/3/0.7    tgt     CLAIMED    DEVICE
    ctl       2  0/0/0/3/0.7.0  sctl    CLAIMED    DEVICE     Initiator
                                /dev/rscsi/c2t7d0
    ext_bus   3  0/0/0/3/1      c8xx    CLAIMED    INTERFACE  SCSI C1010 Ultra160 Wide LVD
    target    6  0/0/0/3/1.7    tgt     CLA

A value of CLAIMED in the S/W State column means the hardware is working normally; any other value must be investigated. In the sample above, disk 1 (the DVD drive at 0/0/0/2/1.2.0) shows NO_HW.

- Disk status:

    # ioscan -fnCdisk | more

The output is similar to the listing above. Check that S/W State is CLAIMED; otherwise there is a disk fault.

- Processor status:

    # ioscan -fnCprocessor | more

The output is similar to the listing above. Check that S/W State is CLAIMED; otherwise there is a processor fault.

3) Memory configuration
- Memory capacity
For example, log in to the host as root:

    # print_manifest | more

The output looks like this:

    System Hardware
        Model:        9000/800/rp3440
        Main Memory:  12286 MB

4) Processors (mstm)
- Log in to the host as root:

    # mstm

In the mstm hardware list, use the space bar to select every "CPU" entry, then choose [ Tools ] -> [ Information ] -> [ Run ] from the main menu.
Normally the result is "Successful".
If the result is not "Successful", try running Information a few more times.
If repeated runs of Information still do not return "Successful", the hardware has a problem.
If LPMC appears repeatedly in the Information output, further checking is required.

4. Host hardware operating status
1) Processor operating status
- Log in to the host as root:

    # vmstat 2 5        (report statistics 5 times at 2-second intervals)
     procs     memory             page             disk              faults      cpu
     r b w   swap   free  re  mf pi po fr de sr dd f0 s0 --         in  sy  cs us sy  id
     0 0 0 741272 201352  63  14  0  2  2  0  0  1  0  0  0 4294967241 100 247  5 17  78
     0 0 0 733232 242800   0   3  0  0  0  0  0  0  0  0  0        306  26  59  0  0 100
     0 0 0 733232 242800   0   0  0  0  0  0  0  0  0  0  0        302  76  54  0  0 100
     0 0 0 733232 242800   0   0  0  0  0  0  0  0  0  0  0        304  20  52  0  0 100
     0 0 0 733232 242800   0   0  0  0  0  0  0  0  0  0  0        304  16  54  0  0 100

The fields are interpreted as follows:
    r   number of threads in the run queue
    b   number of threads blocked while waiting for resources or I/O
    us  percentage of time the CPU spends in user mode, that is, CPU time used by user processes
    sy  percentage of time the CPU spends in system mode, that is, CPU time used by the kernel
    id  percentage of time the CPU is idle, that is, the run queue is empty
If id stays at 0, the CPU is busy all the time.
* The size of the run queue is a key factor in assessing CPU performance. As the run queue grows, user response time increases; a non-zero r means the CPU still has more work waiting to be executed.
* If us+sy is below 90%, a single-user system is not considered CPU-constrained. On a multiuser system, if us+sy exceeds 80%, processes may spend time waiting in the run queue for the CPU, so response time and throughput may suffer.
* If us+sy approaches 100% on a multiuser system, the CPU is probably the limiting factor.

3) Network status
- Network interface check:
For example, log in to the host as root:

    # ifconfig -a

ifconfig configures or displays the interface parameters of a TCP/IP network; here it lists every network interface.
Evaluate the output:

    lo0: flags=1000849
            inet 127.0.0.1 netmask ff000000
    eri0: flags=1000843
            inet 192.168.1.40 netmask ffffff00 broadcast 192.168.1.255
            ether 0:3:ba:2b:76:4b

An interface whose flags include UP is normal. Also check that the IP address and netmask are correct.

- Routing table
Log in to the host as root:

    # netstat -rn
    Routing Table: IPv4
      Destination           Gateway           Flags  Ref   Use   Interface
    -------------------- -------------------- ----- ----- ------ ---------
    192.168.3.1          192.168.1.71         UGH       1    144
    192.168.1.0          192.168.1.40         U         11819764 eri0
    224.0.0.0            192.168.1.40         U         1      0 eri0
    default              192.168.1.40         UG        1      0
    127.0.0.1            127.0.0.1            UH        2 601020 lo0

- Network connectivity test
Log in to the host as root:

    # ping [ip address]

Ping the IP addresses listed in /etc/hosts to verify that the network is reachable. If an address cannot be reached, a time-out message is shown.

    # ping 172.16.1.86
    PING 172.16.1.86: 64 byte packets
    64 bytes from 172.16.1.86: icmp_seq=0. time=0. ms
    64 bytes from 172.16.1.86: icmp_seq=1. time=0. ms
    64 bytes from 172.16.1.86: icmp_seq=2. time=0. ms

4) PDC firmware version
The PDC firmware version can be obtained in any of the following ways:
- Interrupt the 10-second boot countdown when the machine restarts to enter the PDC interface.
- Run the pdcinfo command and read the PDC version item.
- In cstm or mstm, view the Information output for the CPU device. Using cstm as the example:

    # cstm
    cstm>map
    Dev                                                 Last        Last Op
    Num  Path                 Product                   Active Tool Status
    ===  ==================== ========================= =========== =============
      1  system               system ()                 Information Successful
      2  0                    Bus Adapter (582)         Information Successful
      3  0/0                  PCI Bus Adapter (782)     Information Successful
      4  0/0/0/0              Core PCI 100BT Interface  Information Successful
      5  0/0/1/0              PCI SCSI Interface (10000 Information Successful
      6  0/0/1/0.3.0          SCSI Tape (HPC1537A)      Information Successful
      7  0/0/1/1              PCI SCSI Interface (10000 Information Successful
      8  0/0/1/1.15.0         SCSI Disk (SEAGATEST31840 Information Successful
      9  0/0/2/0              PCI SCSI Interface (10000 Information Successful
     10  0/0/2/0.3.0          SCSI Disk (HPDVD-ROM)     Information Successful
     11  0/0/2/1              PCI SCSI Interface (10000 Information Successful
     12  0/0/2/1.15.0         SCSI Disk (HP36.4GATLAS10 Information Successful
     13  0/0/4/0              RS-232 Interface (103c104 Information Successful
     14  0/0/5/0              RS-232 Interface (103c104 Information Successful
     15  8                    MEMORY (9b)               Information Successful
     16  160                  CPU (5cb)                 Information Successful
    cstm>select dev 16
    cstm>info
    -- Updating Map --
    Updating Map...
    cstm>il
    Hardware path: 160
    Product ID: CPU
    Module Type: 0
    Hardware Model: 0x5cb
    Software Model: 0x4
    Hardware Revision: 0
    Software Revision: 0
    Hardware ID: 0
    Software ID: 566770598
    Boot ID: 0x1
    Software Option: 0x91
    Processor Number: 0
    Path: 160
    Hard Physical Address: 0xfffffffffffa0000
    Soft Physical Address: 0
    Slot Number: 8
    Software Capability: 0x100000f0
    PDC Firmware Revision: 41.18
    IODC Revision: 0
    Instruction Cache [Kbyte]: 512
    Processor Speed: N/A
    Processor State: N/A
    Monarch: Yes
    Active: Yes
    Data Cache [Kbyte]: 1024

5) Fan operation
- Confirm that every fan in the system is turning. There are usually power supply fans, system cooling fans, and so on.

5. System log and messages
1) Check whether the log contains hardware fault records that must be handled
- View the log file:

    # more /var/adm/syslog/syslog.log

Look through the most recent entries for error keywords such as warning, error, IO error, scsi_reset, EMS, and temperature.

6. Disk array check
1) AutoRAID 12H storage inspection

    # arraydsp -i

The output looks like this:

    Arrays known to the ARMServer:
    Array with S/N: 0000000E00CA

    # arraydsp -a 0000000E00CA | more

The output looks like this:

    Vendor ID = HP
    Product ID = C5447A
    Array serial number = 0000000E00CA
    ----------------------------------------------------
    Array State = READY
    Server name = SMCP2
    Array type = 3
    Mfg. Product Code = IJMTU00004
    --- Disk space usage --------------------
    Total physical = 34733 MB *
    Serial number = 098800001039
    Firmware revision = HP01
    Drive ID number = FFFFFFFD10AF0C0F
    Volume set serial no. = 16A57 15
    Total capacity of all installed physical disks = 34733 MB *
    ………………
    ----------------------------------------------------
    Overall State of Array = READY
    Array configuration:
    Active Hot Spare Desired = DISABLED
    Auto Include = ENABLED
    Auto Rebuild = ENABLED
    Rebuild Priority = HIGH
    ………………
    ----------------------------------------------------
    Fan F1 = GOOD
    Fan F2 = GOOD
    Fan F3 = GOOD
    Power supply PS1 = GOOD
    Power supply PS2 = GOOD
    Power supply PS3 = GOOD
    Controller X:
    Overall state = GOOD
    Battery #0 state = GOOD
    DRAM #0 state = GOOD
    NVRAM #0 state = GOOD
    NVRAM #1 state = GOOD
    Controller Y:
    Overall state = GOOD
    Battery #0 state = GOOD
    Battery #1 state = GOOD
    DRAM #0 state = GOOD
    NVRAM #0 state = GOOD
    NVRAM #1 state = GOOD

Check whether "Overall State of Array" is READY. If it is not, read further down the output for any component reported as unknown or failed; such a component must be reported for repair. The fault is also visible as an amber indicator on the front panel of the disk array.
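
The array check described above can be sketched as a small shell filter. This is only an illustrative sketch, not part of the original procedure: the function name check_array_state is hypothetical, and the code assumes that the arraydsp -a output is piped in with one "name = value" pair per line, as in the sample listing. Real output may be laid out differently, so treat the patterns as a starting point.

```shell
#!/bin/sh
# Hypothetical helper: reads `arraydsp -a <serial>` output on stdin.
# Assumption: each state is reported as "name = value" on its own line.
check_array_state() {
    awk -F' = ' '
        BEGIN { bad = 0 }
        # The overall array state is expected to be READY.
        /Overall State of Array/ && $2 !~ /^READY/ {
            print "ARRAY NOT READY: " $0
            bad = 1
        }
        # Any value that mentions unknown or failed needs a repair call.
        tolower($2) ~ /unknown|fail/ {
            print "COMPONENT FAULT: " $0
            bad = 1
        }
        # Exit status 1 signals that something needs attention.
        END { exit bad }
    '
}
```

It could be used as arraydsp -a 0000000E00CA | check_array_state, where a non-zero exit status means the output should be read by hand and the front panel checked.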