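This is my first attempt at translating a foreign-language article. Understanding a text and translating it well are two different things. What you are reading is version 3.0…
Thanks to 依云 for the primer on 信雅达 (faithfulness, expressiveness, elegance) and the many annotations, to 依云 and 传奇老师 for the final proofreading, and to H 老师 for sharing the article.
If you find anything in this article translated poorly, corrections are very welcome. Thank you! (///▽///)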
What's the simplest Unix command you know?
There's `echo`, which prints a string to stdout, and `true`, which always terminates with an exit code of 0.
Among the many simple Unix commands, there's also `yes`. If you run `yes` without arguments, you get an endless stream of y's, separated by newlines:
- y
- y
- y
- y
- (...you get the idea)
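What seems pointless at first turns out to be pretty useful:
- yes | sh boring_installation.sh
Ever installed a program that required you to type "y" and hit Enter to keep going? `yes` to the rescue! It will dutifully keep the installer going, so you can keep watching Pootie Tang (a musical comedy film).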
Here's a basic version of `yes` in... um... BASIC:
- 10 PRINT "y"
- 20 GOTO 10
And here's the same thing in Python:
- while True:
-     print("y")
Simple, eh? Not so quick! It turns out that this program is quite slow.
- python yes.py | pv -r > /dev/null
- [4.17MiB/s]
Compare that with the built-in version on my Mac:
- yes | pv -r > /dev/null
- [34.2MiB/s]
So I tried to write a quicker version in Rust. Here's my first attempt:
- use std::env;
- fn main() {
-     let expletive = env::args().nth(1).unwrap_or("y".into());
-     loop {
-         println!("{}", expletive);
-     }
- }
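Some explanations:
- The string we want to print in a loop is the first command-line argument and is named `expletive`. I learned this word from the `yes` man page.
- `unwrap_or` gets the `expletive` from the arguments. If no argument is given, it falls back to the default, "y".
- `into()` converts the default from a string slice (`&str`) into an owned `String` on the heap.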
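The argument handling can be looked at in isolation. The following is a minimal sketch, not code from the post: `pick_expletive` is a hypothetical helper introduced only for illustration, and it takes the arguments after the program name so that it can be called without a real command line.

```rust
use std::env;

/// Hypothetical helper (not part of the original program): returns the
/// string to repeat, given the arguments that follow the program name.
fn pick_expletive<I: Iterator<Item = String>>(mut args: I) -> String {
    // `next()` yields an Option<String>. `unwrap_or` supplies the default,
    // and `into()` turns the `&str` literal into an owned `String`.
    args.next().unwrap_or("y".into())
}

fn main() {
    // `skip(1)` drops the program name, so `next()` inside the helper
    // sees the same argument that `nth(1)` selects in the post.
    let expletive = pick_expletive(env::args().skip(1));
    println!("{}", expletive);
}
```

Note that `unwrap_or` builds its default eagerly, even when an argument is present. For a one-character string that cost is negligible; `unwrap_or_else` would defer it.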
Let's test it:
- cargo run --release | pv -r > /dev/null
- Compiling yes v0.1.0
- Finished release [optimized] target(s) in 1.0 secs
- Running `target/release/yes`
- [2.35MiB/s]
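Whoops, that doesn't look any better. It's even slower than the Python version! That surprised me, so I went looking for the source code of a C implementation.
Here's the very first version of the program, released with Version 7 Unix and authored by Ken Thompson on Jan 10, 1979: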
- main(argc, argv)
- char **argv;
- {
- for (;;)
- printf("%s\n", argc>1? argv[1]: "y");
- }
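No magic here.
Compare that to the 128-line version from GNU coreutils, which is mirrored on GitHub. After 25 years, it is still under active development! The last code change happened around a year ago. It is quite fast: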
- # brew install coreutils
- gyes | pv -r > /dev/null
- [854MiB/s]
The important part is at the end:
- /* Repeatedly output the buffer until there is a write error; then fail. */
- while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
- continue;
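Aha! So they simply use a buffer to make write operations faster. The buffer size is defined by a constant named `BUFSIZ`, which is chosen on each system to make I/O efficient (see [Controlling Buffering](https://www.gnu.org/software/libc/manual/html_node/Controlling-Buffering.html) in the glibc manual). On my system it was defined as 1024 bytes. I actually got better performance with 8192 bytes.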
I've extended my Rust program:
- use std::env;
- use std::io::{self, BufWriter, Write};
- const BUFSIZE: usize = 8192;
- fn main() {
-     let expletive = env::args().nth(1).unwrap_or("y".into());
-     let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
-     loop {
-         writeln!(writer, "{}", expletive).unwrap();
-     }
- }
The important part is that the buffer size is a multiple of four, to ensure memory alignment.
Running that gave me 51.3MiB/s. That is faster than the version that ships with my system, but still far slower than the results in a Reddit post I found, whose author reports 10.2GiB/s.
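Once again, the Rust community did not disappoint.
As soon as this post hit the Rust subreddit, user nwydo pointed out a previous discussion on the same topic. Here's the optimized code from that discussion, which breaks the 3GB/s mark on my machine: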
- use std::env;
- use std::io::{self, Write};
- use std::process;
- use std::borrow::Cow;
- use std::ffi::OsString;
- pub const BUFFER_CAPACITY: usize = 64 * 1024;
- pub fn to_bytes(os_str: OsString) -> Vec<u8> {
-     use std::os::unix::ffi::OsStringExt;
-     os_str.into_vec()
- }
- fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
-     if output.len() > buffer.len() / 2 {
-         return output;
-     }
-     let mut buffer_size = output.len();
-     buffer[..buffer_size].clone_from_slice(output);
-     while buffer_size < buffer.len() / 2 {
-         let (left, right) = buffer.split_at_mut(buffer_size);
-         right[..buffer_size].clone_from_slice(left);
-         buffer_size *= 2;
-     }
-     &buffer[..buffer_size]
- }
- fn write(output: &[u8]) {
-     let stdout = io::stdout();
-     let mut locked = stdout.lock();
-     let mut buffer = [0u8; BUFFER_CAPACITY];
-     let filled = fill_up_buffer(&mut buffer, output);
-     while locked.write_all(filled).is_ok() {}
- }
- fn main() {
-     write(&env::args_os().nth(1).map(to_bytes).map_or(
-         Cow::Borrowed(&b"y\n"[..]),
-         |mut arg| {
-             arg.push(b'\n');
-             Cow::Owned(arg)
-         },
-     ));
-     process::exit(1);
- }
Now that's a whole different ballgame!
The only thing I could contribute was removing an unnecessary `mut`.
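The `Cow` in `main` above is what lets the default case run without allocating. Here is a small sketch of that idea on its own, with the caveat that `payload` is a hypothetical function name and that taking an `Option<Vec<u8>>` instead of reading `env::args_os()` is a simplification made so the example can be run anywhere:

```rust
use std::borrow::Cow;

/// Hypothetical helper: builds the bytes to repeat. The default borrows a
/// static literal; a user-supplied argument reuses its own allocation.
fn payload(arg: Option<Vec<u8>>) -> Cow<'static, [u8]> {
    match arg {
        // No argument: point at the static "y\n", nothing is allocated.
        None => Cow::Borrowed(&b"y\n"[..]),
        // Argument given: append the newline to the existing Vec.
        Some(mut bytes) => {
            bytes.push(b'\n');
            Cow::Owned(bytes)
        }
    }
}

fn main() {
    let default = payload(None);
    let custom = payload(Some(b"no".to_vec()));
    // Both values dereference to a plain byte slice.
    println!("{} bytes, {} bytes", default.len(), custom.len());
}
```

Either variant dereferences to `&[u8]`, which is why the optimized program can pass `&Cow` straight to a function expecting a byte slice.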
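The seemingly trivial `yes` program turned out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun, and it makes me appreciate the nifty tricks that make our computers fast.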
`yes` Unix Command
What's the simplest Unix command you know? There's `echo`, which prints a string to stdout, and `true`, which always terminates with an exit code of 0.
Among the rows of simple Unix commands, there's also `yes`. If you run it without arguments, you get an infinite stream of y's, separated by newlines:
- y
- y
- y
- y
- (...you get the idea)
What seems to be pointless in the beginning turns out to be pretty helpful:
- yes | sh boring_installation.sh
Ever installed a program that required you to type "y" and hit enter to keep going? `yes` to the rescue! It will carefully fulfill this duty, so you can keep watching Pootie Tang.
Here's a basic version in... uhm... BASIC.
- 10 PRINT "y"
- 20 GOTO 10
And here's the same thing in Python:
- while True:
-     print("y")
Simple, eh? Not so quick! Turns out, that program is quite slow.
- python yes.py | pv -r > /dev/null
- [4.17MiB/s]
Compare that with the built-in version on my Mac:
- yes | pv -r > /dev/null
- [34.2MiB/s]
So I tried to write a quicker version in Rust. Here's my first attempt:
- use std::env;
- fn main() {
-     let expletive = env::args().nth(1).unwrap_or("y".into());
-     loop {
-         println!("{}", expletive);
-     }
- }
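Some explanations:
- The string we want to print in a loop is the first command-line argument and is named `expletive`. I learned this word from the `yes` man page.
- I use `unwrap_or` to get the `expletive` from the arguments. If no argument is given, "y" is used as the default.
- The default gets converted from a string slice (`&str`) into an owned `String` on the heap using `into()`.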
Let's test it.
- cargo run --release | pv -r > /dev/null
- Compiling yes v0.1.0
- Finished release [optimized] target(s) in 1.0 secs
- Running `target/release/yes`
- [2.35MiB/s]
Whoops, that doesn't look any better. It's even slower than the Python version! That caught my attention, so I looked around for the source code of a C implementation.
Here's the very first version of the program, released with Version 7 Unix and famously authored by Ken Thompson on Jan 10, 1979:
- main(argc, argv)
- char **argv;
- {
- for (;;)
- printf("%s\n", argc>1? argv[1]: "y");
- }
No magic here.
Compare that to the 128-line version from GNU coreutils, which is mirrored on GitHub. After 25 years, it is still under active development! The last code change happened around a year ago. It is quite fast:
- # brew install coreutils
- gyes | pv -r > /dev/null
- [854MiB/s]
The important part is at the end:
- /* Repeatedly output the buffer until there is a write error; then fail. */
- while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
- continue;
Aha! So they simply use a buffer to make write operations faster. The buffer size is defined by a constant named `BUFSIZ`, which gets chosen on each system so as to make I/O efficient (see here). On my system, that was defined as 1024 bytes. I actually had better performance with 8192 bytes.
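The effect of buffering can be made visible without a benchmark. One likely reason the `println!` version is slow is that Rust's standard output is line-buffered, so every line is handed to the operating system on its own. The sketch below is not from the post and makes some assumptions: `CountingSink`, `writes_line_buffered` and `writes_block_buffered` are hypothetical names, and the sink only counts calls to `write` as a stand-in for real write system calls.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink that discards data and counts how often `write`
/// is called, as a stand-in for the number of write system calls.
struct CountingSink {
    writes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Sends `lines` copies of "y\n" through a line-buffered writer,
/// which forwards data to the sink whenever it sees a newline.
fn writes_line_buffered(lines: usize) -> usize {
    let mut writer = LineWriter::new(CountingSink { writes: 0 });
    for _ in 0..lines {
        writeln!(writer, "y").unwrap();
    }
    writer.flush().unwrap();
    writer.get_ref().writes
}

/// Sends the same data through a block buffer of `capacity` bytes,
/// which forwards data only when the buffer fills up or is flushed.
fn writes_block_buffered(lines: usize, capacity: usize) -> usize {
    let mut writer = BufWriter::with_capacity(capacity, CountingSink { writes: 0 });
    for _ in 0..lines {
        writeln!(writer, "y").unwrap();
    }
    writer.flush().unwrap();
    writer.get_ref().writes
}

fn main() {
    println!("line-buffered:  {} writes", writes_line_buffered(1000));
    println!("block-buffered: {} writes", writes_block_buffered(1000, 8192));
}
```

With a buffer larger than the total output, the block-buffered writer reaches the sink only on the final flush, while the line-buffered one reaches it at least once per line.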
I've extended my Rust program:
- use std::env;
- use std::io::{self, BufWriter, Write};
- const BUFSIZE: usize = 8192;
- fn main() {
- let expletive = env::args().nth(1).unwrap_or("y".into());
- let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
- loop {
- writeln!(writer, "{}", expletive).unwrap();
- }
- }
The important part is that the buffer size is a multiple of four, to ensure memory alignment.
Running that gave me 51.3MiB/s. That is faster than the version that comes with my system, but still way slower than the results from a Reddit post I found, where the author talks about 10.2GiB/s.
#### Update
Once again, the Rust community did not disappoint. As soon as this post hit the Rust subreddit, user nwydo pointed out a previous discussion on the same topic. Here's their optimized code, which breaks the 3GB/s mark on my machine:
- use std::env;
- use std::io::{self, Write};
- use std::process;
- use std::borrow::Cow;
- use std::ffi::OsString;
- pub const BUFFER_CAPACITY: usize = 64 * 1024;
- pub fn to_bytes(os_str: OsString) -> Vec<u8> {
-     use std::os::unix::ffi::OsStringExt;
-     os_str.into_vec()
- }
- fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
-     if output.len() > buffer.len() / 2 {
-         return output;
-     }
-     let mut buffer_size = output.len();
-     buffer[..buffer_size].clone_from_slice(output);
-     while buffer_size < buffer.len() / 2 {
-         let (left, right) = buffer.split_at_mut(buffer_size);
-         right[..buffer_size].clone_from_slice(left);
-         buffer_size *= 2;
-     }
-     &buffer[..buffer_size]
- }
- fn write(output: &[u8]) {
-     let stdout = io::stdout();
-     let mut locked = stdout.lock();
-     let mut buffer = [0u8; BUFFER_CAPACITY];
-     let filled = fill_up_buffer(&mut buffer, output);
-     while locked.write_all(filled).is_ok() {}
- }
- fn main() {
-     write(&env::args_os().nth(1).map(to_bytes).map_or(
-         Cow::Borrowed(&b"y\n"[..]),
-         |mut arg| {
-             arg.push(b'\n');
-             Cow::Owned(arg)
-         },
-     ));
-     process::exit(1);
- }
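Now that's a whole different ballgame!
- The code prepares a filled buffer once and reuses it for every write in the loop.
- Stdout is protected by a lock, so instead of acquiring and releasing it for each write, the code holds the lock the whole time.
- It uses `std::ffi::OsString` and `std::borrow::Cow` to avoid unnecessary allocations.
The only thing that I could contribute was removing an unnecessary `mut`.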
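The buffer-filling step is the least obvious part of the optimized code, so here is the doubling idea on its own. This is an illustrative sketch under stated assumptions, not the code from the discussion: `repeat_by_doubling` is a hypothetical name, and it returns a `Vec<u8>` instead of filling a caller-provided slice.

```rust
/// Hypothetical helper mirroring the doubling idea of `fill_up_buffer`:
/// repeats `pattern` by copying the filled part onto itself until
/// at least half of `capacity` is used.
fn repeat_by_doubling(pattern: &[u8], capacity: usize) -> Vec<u8> {
    // An empty pattern would never grow, and a pattern larger than half
    // the capacity cannot be duplicated even once: return it unchanged.
    if pattern.is_empty() || pattern.len() > capacity / 2 {
        return pattern.to_vec();
    }

    let mut out = Vec::with_capacity(capacity);
    out.extend_from_slice(pattern);

    // Each pass appends a copy of everything written so far, so the
    // filled length doubles and only a handful of copies are needed.
    while out.len() < capacity / 2 {
        out.extend_from_within(..);
    }
    out
}

fn main() {
    let filled = repeat_by_doubling(b"y\n", 64 * 1024);
    println!("filled {} bytes", filled.len());
}
```

Because the length doubles each round, filling a 64 KiB buffer from a two-byte pattern takes about a dozen copies rather than thousands of small ones. The loop condition is the same as in the code above, so the result is always shorter than `capacity`.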
Source: https://juejin.im/post/5a3133b86fb9a0451171214c