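Understanding the characteristics of your I/O is essential for tuning system performance. Whether the I/O is sequential or random, whether it consists of reads or writes, the read/write ratio, and the I/O block size are all key factors. Many storage devices are tuned for specific I/O patterns and post impressive scores with general-purpose benchmark tools, yet the differences show up in real environments: under the same application workload, different devices can behave very differently. I have seen devices of a similar class from different vendors where the one with the higher benchmark score had I/O response times ten times slower in production. A device that benchmarks well is not necessarily the right one for your application.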
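Being able to simulate an application's I/O pattern helps a great deal with reproducing problems and even with choosing hardware. Understanding the I/O pattern comes first, and that is not easy. Tools such as iostat show only averages, whereas an application's I/O requests may arrive in waves, rising and falling within a single second, and the average I/O latency may be low while its variation is large. Block sizes can vary as well: today's big-data applications may use a wide range of block sizes, unlike the transactional databases that used to be common, whose block size is mostly fixed.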
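There seems to be no ready-made tool for profiling the I/O pattern of a production system, but we can build one ourselves with blktrace. blktrace records every I/O at the kernel's block layer, which gives us the raw material for analysis. In the default output of blkparse, each record consists of the device (major,minor), the CPU, a sequence number, the timestamp in seconds, the PID, the event, the RWBS flags, the start sector, a `+`, the number of sectors, and the process name in brackets.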
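To make the column positions concrete, here is a minimal sketch, not part of the original tooling, that keeps only the columns used later in this article: timestamp, event, RWBS, start sector and length. The function name `blk_fields` is made up for illustration, and the column numbers assume the default blkparse output format, that is, no `-f` format override.

```shell
# Hypothetical helper: reduce default-format blkparse output (on stdin)
# to tab-separated "timestamp event rwbs sector length" lines.
blk_fields() {
    # Only records shaped like "... sector + length ..." are kept, so the
    # blkparse summary and events without a sector range are skipped.
    awk '$9 == "+" && $10 ~ /^[0-9]+$/ {
        printf "%s\t%s\t%s\t%s\t%s\n", $4, $6, $7, $8, $10
    }'
}
```

For example, `blkparse -i sdb | blk_fields` would list one line per traced I/O event, where `sdb` stands for whatever device was traced.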
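Below is a simplified example. It relies mainly on the events "Q" and "C", which mark the start and the completion of an I/O. The time between the two corresponds to the await reported by iostat, except that blktrace gives it for each individual I/O: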
    #!/bin/bash
    # Trace one block device for a few seconds, then report per-I/O latency
    # (time from event Q to event C) and the request-size distribution.
    if [ $# -ne 1 ]; then
        echo "Usage: $0 <block_device_name>"
        exit 1
    fi
    if [ ! -b "$1" ]; then
        echo "could not find block device $1"
        exit 1
    fi
    duration=10
    echo "running blktrace for $duration seconds to collect data..."
    # blktrace writes its per-CPU trace files into the current directory
    timeout "$duration" blktrace -d "$1" >/dev/null 2>&1
    DEVNAME=$(basename "$1")
    echo "parsing blktrace data..."
    # Sort by sector, length and timestamp so that the Q and C records of
    # the same I/O become adjacent lines.
    blkparse -i "$DEVNAME" | sort -g -k8,8 -k10,10 -k4,4 | awk '
    BEGIN {
        total_read=0;
        total_write=0;
        maxwait_read=0;
        maxwait_write=0;
    }
    {
        # Fields: $4 timestamp (s), $6 event, $7 RWBS, $8 sector, $10 length
        if ($6=="Q") {
            queue_ts=$4;
            block=$8;
            nblock=$10;
            rw=$7;
        };
        if ($6=="C" && $8==block && $10==nblock && $7==rw) {
            await=$4-queue_ts;
            # RWBS can carry extra flags (for example S for sync), so match
            # the leading letters instead of comparing the whole field.
            if (rw ~ /^R/) {
                if (await>maxwait_read) maxwait_read=await;
                total_read++;
                read_count_block[nblock]++;
                if (await>0.001) read_count1++;
                if (await>0.01) read_count10++;
                if (await>0.02) read_count20++;
                if (await>0.03) read_count30++;
            } else if (rw ~ /^F?W/) {
                if (await>maxwait_write) maxwait_write=await;
                total_write++;
                write_count_block[nblock]++;
                if (await>0.001) write_count1++;
                if (await>0.01) write_count10++;
                if (await>0.02) write_count20++;
                if (await>0.03) write_count30++;
            }
        }
    } END {
        printf("========\nsummary:\n========\n");
        printf("total number of reads: %d\n", total_read);
        printf("total number of writes: %d\n", total_write);
        printf("slowest read : %.6f second\n", maxwait_read);
        printf("slowest write: %.6f second\n", maxwait_write);
        printf("reads\n> 1ms: %d\n>10ms: %d\n>20ms: %d\n>30ms: %d\n", read_count1, read_count10, read_count20, read_count30);
        printf("writes\n> 1ms: %d\n>10ms: %d\n>20ms: %d\n>30ms: %d\n", write_count1, write_count10, write_count20, write_count30);
        # "block size" is the request length in 512-byte sectors
        printf("\nblock size:%16s\n","Read Count");
        for (i in read_count_block)
            printf("%10d:%16d\n", i, read_count_block[i]);
        printf("\nblock size:%16s\n","Write Count");
        for (i in write_count_block)
            printf("%10d:%16d\n", i, write_count_block[i]);
    }'
Sample output:
    ========
    summary:
    ========
    total number of reads: 1081513
    total number of writes: 0
    slowest read : 0.032560 second
    slowest write: 0.000000 second
    reads
    > 1ms: 18253
    >10ms: 17058
    >20ms: 17045
    >30ms: 780
    writes
    > 1ms: 0
    >10ms: 0
    >20ms: 0
    >30ms: 0

    block size:      Read Count
           256:           93756
           248:            1538
            64:           98084
            56:            7475
             8:          101218
            48:           15889
           240:            1637
           232:            1651
           224:            1942
            40:           21693
           216:            1811
            32:          197893
           208:            1907
            24:           37787
           128:           97382
            16:          399850
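This example reports the number of reads and writes, the maximum latency, the latency distribution (each threshold counts every I/O slower than it, so the buckets are cumulative), and the number of I/Os per block size. Block sizes are given in 512-byte sectors, so 16 means 8 KiB. This is far more specific than what iostat shows and helps in understanding the system's I/O pattern. The blktrace data has much more to offer: for instance, you could use the timestamps to count the I/Os in each millisecond, which gives a finer-grained view of how the request rate fluctuates.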
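The per-millisecond count suggested above could be sketched as follows, together with a rough sequential-versus-random estimate, since that was one of the factors named at the start. Both functions read default-format blkparse output on standard input and look only at "Q" events. The names `per_ms_io_count` and `seq_fraction` are hypothetical, and treating an I/O as sequential when it starts exactly where the previously queued I/O ended is a simplifying assumption that ignores interleaving between processes.

```shell
# Hypothetical helper: count queued I/Os per millisecond.
# Output: one "millisecond count" pair per line, in time order.
per_ms_io_count() {
    awk '$6 == "Q" && $9 == "+" {
        # The timestamp looks like seconds.nanoseconds; build the
        # millisecond index from the digits to avoid float rounding.
        split($4, t, "[.]")
        ms = t[1] * 1000 + substr(t[2] "000", 1, 3)
        count[ms]++
    }
    END {
        for (ms in count) printf "%d %d\n", ms, count[ms]
    }' | sort -n
}

# Hypothetical helper: estimate how many queued I/Os are sequential.
# Assumption: sequential means "starts at the sector where the
# previously queued I/O ended".
seq_fraction() {
    awk '$6 == "Q" && $9 == "+" {
        total++
        if (have_prev && $8 == next_sector) seq++
        next_sector = $8 + $10
        have_prev = 1
    }
    END {
        if (total)
            printf "%d of %d queued I/Os were sequential (%.1f%%)\n", seq, total, 100 * seq / total
    }'
}
```

With a trace of device `sdb` in the current directory, `blkparse -i sdb | per_ms_io_count` prints the time series; milliseconds without any I/O are simply absent from the output. `blkparse -i sdb | seq_fraction` prints a single summary line.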
Reference:
利用 BLKTRACE 分析 IO 性能 (Analyzing I/O performance with blktrace) http://linuxperf.com/?p=161
Reposted from:
http://linuxperf.com/?cat=11
Source: http://www.bubuko.com/infodetail-3105397.html