Problem
On your Oracle Linux 5 box, you are experiencing a system slowdown. The system responds very slowly, and its performance degrades to unacceptable levels.
You run vmstat to check the system state:
[root@wordpress ~]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 3 0 1433712 38344 382980 0 0 0 1416 1333 278 0 2 0 98 0
0 3 0 1433712 38344 382980 0 0 0 1196 1278 269 0 2 0 97 0
0 3 0 1433712 38344 382980 0 0 0 2132 1544 323 1 4 0 96 0
0 3 0 1433712 38344 382980 0 0 0 128 1019 257 0 1 0 99 0
0 2 0 1433712 38344 382980 0 0 0 2628 1522 336 0 3 0 98 0
0 3 0 1433712 38344 382980 0 0 0 148 1057 339 1 1 46 53 0
0 3 0 1433712 38344 382980 0 0 0 1108 1266 315 0 2 0 99 0
There are excessive I/O waits (the wa column is above 90% in almost every sample), processes are blocked waiting for I/O (the b column), and something is writing to disk (the bo column).
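To pick out the high-I/O-wait samples from a longer vmstat run, a small awk filter can help. This is a minimal sketch, assuming the column layout shown above; the function name high_iowait and the default threshold of 90 are illustrative choices, not part of vmstat.

```shell
# Hypothetical helper: read vmstat output on stdin and print only the samples
# whose "wa" (I/O wait) value exceeds a threshold (default 90), together with
# the "bo" (blocks written out) value. Columns are located by name from the
# header line, so their positions are not hard-coded.
high_iowait() {
    awk -v t="${1:-90}" '
        # Header line: remember which fields hold "wa" and "bo".
        $1 == "r" {
            for (i = 1; i <= NF; i++) {
                if ($i == "wa") wa = i
                if ($i == "bo") bo = i
            }
            next
        }
        # Data lines start with a number; report those above the threshold.
        wa && $1 ~ /^[0-9]+$/ && $wa + 0 > t + 0 { print "wa=" $wa " bo=" $bo }
    '
}
```

For example, piping `vmstat 1 30` into `high_iowait 90` prints only the samples in which I/O wait exceeded 90%.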
Solution
The vmstat analysis shows excessive writes to disk. The iotop utility makes it easy to identify the top writers, but if iotop is not available on your system, you can use the manual approach below to identify the processes writing most to disk.
The /proc/<PID>/io pseudo-file contains I/O statistics for the process identified by PID. For example, the process with PID 157 shows the following statistics:
[root@wordpress ~]# cat /proc/157/io
rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 0
write_bytes: 57344
cancelled_write_bytes: 0
The write_bytes field counts the bytes that the process has caused to be sent to the storage layer. So, by finding the processes whose write_bytes value is high and still growing, you can identify the top writers.
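The "high and still growing" check can be sketched for a single process by reading its io file twice. This is an illustrative sketch: the helper name write_growth and its arguments (the io file path and an interval in seconds, default 5) are assumptions, not an existing tool.

```shell
# Hypothetical helper: sample the write_bytes counter of one io file twice,
# $2 seconds apart (default 5), and print the total and the growth in bytes.
# Example argument: /proc/157/io
write_growth() {
    f=$1
    interval=${2:-5}
    # Read the counter; fail if the file is gone (the process exited).
    before=$(awk '/^write_bytes:/ { print $2 }' "$f") || return 1
    sleep "$interval"
    after=$(awk '/^write_bytes:/ { print $2 }' "$f") || return 1
    echo "total=$after growth=$((after - before))"
}
```

A process that is still writing shows a positive growth; one that only wrote in the past shows growth=0.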
To find the top writers follow these steps:
1) Identify the io pseudo-files of all running processes. The grep filter drops the per-thread copies found under /proc/<PID>/task/:
[root@wordpress ~]# find /proc/ -name "io" | grep -v "task"
/proc/1/io
/proc/2/io
.
.
.
/proc/5327/io
/proc/5328/io
Note that the list is sorted by PID, which keeps successive snapshots in the same order and makes them easy to compare with diff.
2) Next, extract the write_bytes lines, optionally filtering out the zero values:
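As an alternative to find plus grep -v, a shell glob over the numeric /proc entries lists only the per-process io files, because the per-thread copies live one level deeper under task/. The following is a sketch; the helper name list_io_files and its optional root argument (used only to make it testable against a fake directory tree) are assumptions.

```shell
# Hypothetical helper: print <root>/<PID>/io for every process, sorted by PID.
# $1 optionally replaces /proc (handy for testing on a fake directory tree).
list_io_files() {
    root=${1:-/proc}
    for f in "$root"/[0-9]*/io; do
        # Skip the unexpanded pattern (no match) and processes that already exited.
        [ -e "$f" ] || continue
        pid=${f%/io}
        pid=${pid##*/}
        echo "$pid $f"
    done | sort -n | cut -d' ' -f2-
}
```

Called without an argument, it scans the real /proc and prints the same kind of list as step 1.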
[root@wordpress ~]# find /proc/ -name "io" | grep -v "task" | xargs grep "^write_bytes" | grep -v "write_bytes: 0"
grep: /proc/5336/io: No such file or directory
grep: /proc/5337/io: No such file or directory
/proc/1/io:write_bytes: 9453568
/proc/157/io:write_bytes: 57344
/proc/417/io:write_bytes: 33939456
.
.
.
/proc/3970/io:write_bytes: 12288
/proc/4914/io:write_bytes: 4096
/proc/4917/io:write_bytes: 348160
Ignore any "No such file or directory" errors. Processes can start and exit between the moment find lists their io file and the moment grep tries to read it, so some inconsistencies are expected and can safely be ignored. A process that keeps writing to disk will still be alive.
3) Take two snapshots of the write statistics by redirecting the output of the command above to a file:
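The step 2 pipeline can be made a little more robust: redirecting stderr hides the errors from processes that have already exited, grep -H keeps the file name prefix even if xargs hands a single file to one grep invocation, and grep -v '/task/' matches only the per-thread directories rather than any path containing "task". This is a sketch under those assumptions; the helper name write_stats and its optional root argument are illustrative.

```shell
# Hypothetical helper: print "<root>/<PID>/io:write_bytes: N" for every process
# with a non-zero write_bytes counter. $1 optionally replaces /proc.
write_stats() {
    root=${1:-/proc}
    # 2>/dev/null discards errors for processes that exit mid-scan;
    # -H keeps the file name prefix even when grep receives a single file;
    # -r stops GNU xargs from running grep at all when find prints nothing.
    find "$root" -name io 2>/dev/null | grep -v '/task/' |
        xargs -r grep -H '^write_bytes' 2>/dev/null |
        grep -v 'write_bytes: 0$'
}
```

Redirecting its output to snap1, waiting a few seconds, and redirecting it again to snap2 yields snapshots in the same format as step 3.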
find /proc/ -name "io" | grep -v "task" | xargs grep "^write_bytes" | grep -v "write_bytes: 0" > snap1
Wait a few seconds, then take the second snapshot:
find /proc/ -name "io" | grep -v "task" | xargs grep "^write_bytes" | grep -v "write_bytes: 0" > snap2
4) Now compare the two snapshots with the diff utility:
[root@wordpress ~]# diff snap1 snap2
2c2
< /proc/157/io:write_bytes: 53248
---
> /proc/157/io:write_bytes: 57344
40c40
< /proc/3684/io:write_bytes: 196755456
---
> /proc/3684/io:write_bytes: 196771840
41a42
> /proc/3737/io:write_bytes: 4096
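The comparison shows that the process with PID 3684 has written about 196 MB so far and is still writing: its write_bytes counter grew by 16384 bytes (196771840 - 196755456) between the two snapshots.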
Taking more snapshots and comparing them might provide more detailed information.
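Reading a long diff by eye gets tedious when many processes write. A short awk script can instead subtract the counters of the two snapshot files and rank the processes by how much they wrote in between. This is a sketch assuming the snap1/snap2 format shown above; the helper name top_writers is hypothetical, and processes missing from the first snapshot are counted from zero.

```shell
# Hypothetical helper: compare two snapshots in the
# "/proc/<PID>/io:write_bytes: N" format and print "PID bytes_written",
# largest first. Only processes whose counter grew are listed.
top_writers() {
    awk -F'[/:]' '
        # Fields: $3 is the PID, $NF is the byte count (its leading blank is
        # ignored when awk converts it to a number).
        FILENAME == ARGV[1] { before[$3] = $NF + 0; next }
        { delta = $NF - before[$3]; if (delta > 0) print $3, delta }
    ' "$1" "$2" | sort -k2,2nr
}
```

With the snapshots from step 3, `top_writers snap1 snap2` lists PID 3684 first with 16384 bytes, followed by PIDs 157 and 3737 with 4096 bytes each. Running it over successive pairs of snapshots shows whether a process keeps writing.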
5) Once you have the PIDs of the top writers, identify the processes with the ps utility:
[root@wordpress ~]# ps -ef | grep 3684
root 3684 3318 3 10:32 ? 00:00:31 /usr/bin/python -tt /usr/libexec/yum-updatesd-helper --check --dbus
root 3925 3737 0 10:47 pts/1 00:00:00 grep 3684
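Staying with the manual /proc approach, the same information can be read directly from /proc/<PID>/status and /proc/<PID>/cmdline, which also avoids the extra grep line that ps -ef | grep prints. A sketch follows; the helper name describe_pid is hypothetical.

```shell
# Hypothetical helper: print "PID PPID command line" for each PID given,
# reading /proc directly instead of calling ps.
describe_pid() {
    for pid in "$@"; do
        if [ -r "/proc/$pid/cmdline" ]; then
            # The parent PID is the "PPid:" line of the status file.
            ppid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status")
            # cmdline separates the arguments with NUL bytes.
            cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
            echo "$pid $ppid $cmd"
        else
            echo "$pid (exited)"
        fi
    done
}
```

With ps itself, `ps -fp 3684` likewise limits the output to the given PID.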