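Full code: https://paste.ubuntu.com/26257776/
- quit: exit the current shell
- jobs: list all jobs running in the background
- bg <job>: send SIGCONT to the job identified by <job> and run it in the background; <job> can be either a PID or a JID.
- fg <job>: send SIGCONT to the job identified by <job> and run it in the foreground; <job> can be either a PID or a JID.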
-                 +----------+
-                 |   Bash   |
-                 +----+-----+
-                      |
- +-----------------------------------------+
- |                    v                    |
- |               +----+-----+  foreground  |
- |               |   tsh    |  group       |
- |               +----+-----+              |
- |                    |                    |
- |         +--------------------+          |
- |         |          |         |          |
- |         v          v         v          |
- |      /bin/ls   /bin/sleep  xxxxx        |
- |                                         |
- |                                         |
- +-----------------------------------------+
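So when we type ctrl-c (ctrl-z) in the terminal, SIGINT (SIGTSTP) is delivered to every process in the foreground process group, including the programs that tsh itself considers background jobs. One fix is for the child to call setpgid(0, 0) after fork and before execve, which moves it into a new process group whose pgid equals the child's pid. When tsh then receives SIGINT or SIGTSTP, it should forward the signal to what tsh regards as the real "foreground process group" (every process in it).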
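I'll start by listing the six guidelines for writing safe signal handlers from the book (section 8.5.5; see the book for the full explanations). Keep them in mind while coding:
- G0. Keep handlers as simple as possible.
- G1. Call only async-signal-safe functions in your handlers.
- G2. Save and restore errno.
- G3. Protect accesses to shared global data structures by blocking all signals.
- G4. Declare global variables with volatile.
- G5. Declare flags with sig_atomic_t.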
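Below are a few things to watch out for in the 7 functions the lab asks us to implement; the comments in the code explain some of them as well: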
- /* Here are the functions that you will implement */
- void eval(char *cmdline);
- int builtin_cmd(char **argv, char *cmdline);
- void do_bgfg(char **argv, char *cmdline);
- void waitfg(pid_t pid);
- void sigchld_handler(int sig);
- void sigtstp_handler(int sig);
- void sigint_handler(int sig);
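After calling parseline to parse the input, we first decide whether this is a built-in command (implemented by the shell) or a program (a local file). If it is a built-in, we go into builtin_cmd(argv, cmdline); otherwise we create a child process and add it to the job list. Note that we use access to check that the file exists before calling fork, because otherwise the child cannot be reaped properly after the fork. There is also a race to watch out for: after fork the shell adds the job to the job list, and the signal handler sigchld_handler deletes it from the list once the process has been reaped. If the signal arrives early, the delete can happen before the add, and the job then never disappears from the list (a leak). So we block SIGCHLD before the fork and restore the mask only after the job has been added.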
- /*
- * eval - Evaluate the command line that the user has just typed in
- *
- * If the user has requested a built-in command (quit, jobs, bg or fg)
- * then execute it immediately. Otherwise, fork a child process and
- * run the job in the context of the child. If the job is running in
- * the foreground, wait for it to terminate and then return. Note:
- * each child process must have a unique process group ID so that our
- * background children don't receive SIGINT (SIGTSTP) from the kernel
- * when we type ctrl-c (ctrl-z) at the keyboard.
- */
- void eval(char * cmdline) {
- char * argv[MAXARGS];
- int bg_flag;
- bg_flag = parseline(cmdline, argv);
- /* true if the user has requested a BG job, false if the user has requested a FG job. */
- if (builtin_cmd(argv, cmdline))
- /* built-in command */
- {
- return;
- } else
- /* program (file) */
- {
- if (access(argv[0], F_OK))
- /* do not fork and addset! */
- {
- fprintf(stderr, "%s: Command not found\n", argv[0]);
- return;
- }
- pid_t pid;
- sigset_t mask, prev;
- sigemptyset(&mask);
- sigaddset(&mask, SIGCHLD);
- sigprocmask(SIG_BLOCK, &mask, &prev);
- /* block SIG_CHLD */
- if ((pid = fork()) == 0)
- /* child */
- {
- sigprocmask(SIG_SETMASK, &prev, NULL);
- /* unblock SIG_CHLD */
- if (!setpgid(0, 0)) {
- if (execve(argv[0], argv, environ)) {
- fprintf(stderr, "%s: Failed to execve\n", argv[0]);
- exit(1); /* execve failed: exit so the child does not keep running as a second tsh */
- }
- /* context changed */
- } else {
- unix_error("Failed to invoke setpgid(0, 0)");
- }
- } else if (pid > 0)
- /* tsh */
- {
- if (!bg_flag)
- /* exec foreground */
- {
- fg_pid = pid;
- fg_pid_reap = 0;
- addjob(jobs, pid, FG, cmdline);
- sigprocmask(SIG_SETMASK, &prev, NULL);
- /* unblock SIG_CHLD */
- waitfg(pid);
- } else
- /* exec background */
- {
- addjob(jobs, pid, BG, cmdline);
- sigprocmask(SIG_SETMASK, &prev, NULL);
- /* unblock SIG_CHLD */
- printf("[%d] (%d) %s", maxjid(jobs), pid, cmdline);
- }
- return;
- } else {
- unix_error("Failed to fork child");
- }
- }
- return;
- }
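This function checks which built-in command was typed. Note that if the user just presses Enter, the first element of argv after parsing is a null pointer, and calling strcmp with that null pointer causes a segmentation fault.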
- /*
- * builtin_cmd - If the user has typed a built-in command then execute
- * it immediately.
- */
- int builtin_cmd(char **argv, char *cmdline)
- {
- char *first_arg = argv[0];
- if (first_arg == NULL) /* an empty line ('\n') read in main leaves argv[0] NULL;
- passing it to strcmp would cause a segmentation fault */
- {
- return 1;
- }
- if (!strcmp(first_arg, "quit"))
- {
- exit(0);
- }
- else if (!strcmp(first_arg, "jobs"))
- {
- listjobs(jobs);
- return 1;
- }
- else if (!strcmp(first_arg, "bg") || !strcmp(first_arg, "fg"))
- {
- do_bgfg(argv, cmdline);
- return 1;
- }
- return 0;
- }
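This function handles the two built-in commands bg and fg. Note that fg has two cases: 1. The background job is stopped; we set the related variables and then send the continue signal. 2. The process is already running; we only need to change the job's state and set the related variables. In both cases we then enter waitfg and wait for the new foreground process to finish.
Writing this function also turned up a compatibility issue that cost me several hours of debugging:
In man 7 signal, SIGCHLD is described as follows: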
- SIGCHLD 20,17,18 Ign Child stopped or terminated
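In other words, when a child terminates or stops, this signal is sent to the parent, and the parent enters the sigchld_handler signal handler to reap the child or print a notice. On my machine, however, the parent also receives this signal when a child goes from stopped to running (after receiving SIGCONT). This causes a problem: when we resume a stopped background process, a SIGCHLD is sent to the parent (the shell), so the parent enters sigchld_handler and tries to reap the child (it is not stopped), but the child has not finished either, so the handler keeps waiting for it to finish and the shell appears to hang.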
After a long debugging session I found that some POSIX standard defines the SIGCHLD signal as follows:
SIGCHLD
The SIGCHLD signal is sent to a process when a child process terminates, is interrupted, or resumes after being interrupted. One common usage of the signal is to instruct the operating system to clean up the resources used by a child process after its termination without an explicit call to the wait system call.
When I got to "or resumes after being interrupted." I nearly coughed up blood...
To confirm my suspicion, I looked at the manual on FreeBSD 11.1:
It says "changed", so it looks like my machine follows some POSIX standard here.
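My workaround is a global pid_t variable, stopped_resume_child, that records the stopped process we are about to resume. On entering the signal handler we first check whether this variable is greater than zero, and if it is, we return immediately without doing anything. (There is actually a race with other processes here; time was limited, so I left it as it is.)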
- /*
- * do_bgfg - Execute the builtin bg and fg commands
- */
- void do_bgfg(char **argv, char *cmdline)
- {
- char *first_arg = argv[0];
- if (!strcmp(first_arg, "bg"))
- {
- if (argv[1] == NULL)
- {
- fprintf(stderr, "bg command requires PID or %%jobid argument\n");
- return;
- }
- if (argv[1][0] == '%') /* JID */
- {
- int jid = atoi(argv[1] + 1);
- if (jid)
- {
- struct job_t *job_tmp = getjobjid(jobs, jid);
- if (job_tmp != NULL)
- {
- job_tmp->state = BG;
- printf("[%d] (%d) %s", jid, job_tmp->pid, job_tmp->cmdline);
- stopped_resume_child = job_tmp->pid;
- killpg(job_tmp->pid, SIGCONT);
- return;
- }
- else
- {
- fprintf(stderr, "%%%s: No such job\n", argv[1] + 1);
- }
- }
- else
- {
- fprintf(stderr, "%%%s: No such job\n", argv[1] + 1);
- }
- }
- else /* PID */
- {
- pid_t pid = atoi(argv[1]);
- if(pid)
- {
- struct job_t *job_tmp = getjobpid(jobs, pid);
- if (job_tmp != NULL)
- {
- job_tmp->state = BG;
- printf("[%d] (%d) %s", job_tmp->jid, pid, job_tmp->cmdline);
- stopped_resume_child = job_tmp->pid;
- killpg(pid, SIGCONT);
- return;
- }
- else
- {
- fprintf(stderr, "(%s): No such process\n", argv[1]);
- }
- }
- else
- {
- fprintf(stderr, "bg: argument must be a PID or %%jobid\n");
- }
- }
- }
- else
- {
- /* there are two case when using fg:
- 1. the job stopped
- 2. the job is running
- */
- if (argv[1] == NULL)
- {
- fprintf(stderr, "fg command requires PID or %%jobid argument\n");
- return;
- }
- if (argv[1][0] == '%') /* JID */
- {
- int jid = atoi(argv[1] + 1);
- if (jid)
- {
- struct job_t *job_tmp = getjobjid(jobs, jid);
- if (job_tmp != NULL)
- {
- int state = job_tmp->state;
- fg_pid = job_tmp->pid; /* this is the new foreground process */
- fg_pid_reap = 0;
- job_tmp->state = FG;
- if (state == ST)
- {
- stopped_resume_child = job_tmp->pid; /* set the global var in case of wait in SIGCHLD handler */
- killpg(job_tmp->pid, SIGCONT);
- }
- waitfg(job_tmp->pid); /* wait until the foreground terminate/stop */
- return;
- }
- else
- {
- fprintf(stderr, "%%%s: No such job\n", argv[1] + 1);
- }
- }
- else
- {
- fprintf(stderr, "%%%s: No such job\n", argv[1] + 1);
- }
- }
- else /* PID */
- {
- pid_t pid = atoi(argv[1]);
- if(pid)
- {
- struct job_t *job_tmp = getjobpid(jobs, pid);
- if (job_tmp != NULL)
- {
- int state = job_tmp->state;
- fg_pid = job_tmp->pid; /* this is the new foreground process */
- fg_pid_reap = 0;
- job_tmp->state = FG;
- if (state == ST)
- {
- stopped_resume_child = job_tmp->pid; /* set the global var in case of wait in SIGCHLD handler */
- killpg(pid, SIGCONT);
- }
- waitfg(job_tmp->pid); /* wait until the foreground terminate/stop */
- return;
- }
- else
- {
- fprintf(stderr, "(%s): No such process\n", argv[1]);
- }
- }
- else
- {
- fprintf(stderr, "fg: argument must be a PID or %%jobid\n");
- }
- }
- }
- return;
- }
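Earlier I declared a global variable fg_pid_reap of type volatile sig_atomic_t. As soon as the signal handler reaps the foreground process it sets fg_pid_reap to 1, so waitfg returns and the shell goes on to read the user's next input. The busy loop around sleep adds some latency, but the lab handout asks for this implementation, so there's nothing I can do ;)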
- /*
- * waitfg - Block until process pid is no longer the foreground process
- */
- void waitfg(pid_t pid)
- {
- while (!fg_pid_reap)
- {
- sleep(1);
- }
- fg_pid_reap = 0;
- return;
- }
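Remember to save and restore errno.
Note that we cannot use a while loop around this blocking waitpid to reap children: there may still be running processes in the background, and waitpid would then block until one of them finishes. Of course, reaping only once with an if can run into the problem of coalesced signals, for example when several background programs finish at the same time, but the lab handout asks for this implementation, so there's nothing I can do ;)
Note that if the program was stopped (SIGTSTP, ctrl-z), we neither reap it nor delete its node from the job list.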
- /*
- * sigchld_handler - The kernel sends a SIGCHLD to the shell whenever
- * a child job terminates (becomes a zombie), or stops because it
- * received a SIGSTOP or SIGTSTP signal. The handler reaps all
- * available zombie children, but doesn't wait for any other
- * currently running children to terminate.
- */
- void sigchld_handler(int sig)
- /* When a child process stops or terminates, SIGCHLD is sent to the parent process. */
- {
- int olderrno = errno;
- if (stopped_resume_child) {
- stopped_resume_child = 0;
- return;
- }
- int status;
- pid_t pid;
- if ((pid = waitpid(-1, &status, WUNTRACED)) > 0)
- /* don't use while! */
- {
- if (pid == fg_pid) {
- fg_pid_reap = 1;
- }
- if (WIFEXITED(status))
- /* returns true if the child terminated normally */
- {
- deletejob(jobs, pid);
- } else if (WIFSIGNALED(status))
- /* returns true if the child process was terminated by a signal. */
- /* since job start from zero, we add it one */
- {
- printf("Job [%d] (%d) terminated by signal %d\n", pid2jid(pid), pid, WTERMSIG(status));
- deletejob(jobs, pid);
- } else
- /* SIGTSTP */
- {
- /* don't delete job */
- struct job_t *p = getjobpid(jobs, pid);
- if (p != NULL) /* guard against a pid that is not in the job list */
- p->state = ST;
- /* Stopped */
- printf("Job [%d] (%d) stopped by signal %d\n", pid2jid(pid), pid, WSTOPSIG(status));
- }
- }
- errno = olderrno;
- return;
- }
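The handler above makes a single blocking waitpid call, as the handout asks. A common alternative, shown here only as a sketch and not as the handout's approach, is to pass WNOHANG | WUNTRACED inside a loop: waitpid then returns 0 immediately when no child has terminated or stopped, so the loop drains every pending state change without ever waiting on a job that is still running (including one that has just been resumed). The function name reap_ready_children is made up; in a real shell its body would sit inside sigchld_handler and update the job list where the comments indicate.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Collects every child that has already terminated or stopped, without
 * blocking. Returns how many terminated children were reaped. */
int reap_ready_children(void)
{
    int olderrno = errno;       /* waitpid may set errno (e.g. ECHILD) */
    int status;
    int reaped = 0;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        if (WIFSTOPPED(status)) {
            /* stopped: a shell would mark the job as ST here */
        } else {
            /* exited or killed: a shell would delete the job here */
            reaped++;
        }
    }

    errno = olderrno;
    return reaped;
}
```

With this shape, several children that exit at nearly the same time are all collected by one handler invocation, even if only one SIGCHLD was delivered.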
Note that the signal has to go to the whole group, i.e. use killpg; sending it to a single process is not enough.
- /*
- * sigint_handler - The kernel sends a SIGINT to the shell whenever the
- * user types ctrl-c at the keyboard. Catch it and send it along
- * to the foreground job.
- */
- void sigint_handler(int sig)
- {
- int olderrno = errno;
- pid_t pgid = fgpid(jobs);
- if (pgid)
- {
- killpg(pgid, SIGINT);
- }
- errno = olderrno;
- return;
- }
No explanation needed.
- /*
- * sigtstp_handler - The kernel sends a SIGTSTP to the shell whenever
- * the user types ctrl-z at the keyboard. Catch it and suspend the
- * foreground job by sending it a SIGTSTP.
- */
- void sigtstp_handler(int sig)
- {
- int olderrno = errno;
- pid_t pgid = fgpid(jobs);
- if (pgid)
- {
- killpg(pgid, SIGTSTP);
- }
- errno = olderrno;
- return;
- }
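To make checking the results easier, I wrote a bash script that compares the output of my tsh with that of tshref, the reference program shipped with the lab (the test cases are trace01.txt to trace16.txt):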
- frank@under:~/tmp/shlab-handout$ cat test.sh
- #! /bin/bash
- for file in $(ls trace*)
- do
- ./sdriver.pl -t $file -s ./tshref > tshref_$file
- ./sdriver.pl -t $file -s ./tsh > tsh_$file
- done
- for file in $(ls trace*)
- do
- diff tsh_$file tshref_$file > diff_$file
- done
- for file in $(ls diff_trace*)
- do
- echo $file " :"
- cat $file
- echo -e "-------------------------------------\n"
- done
The full output is too long to print, so here are the last few:
- frank@under:~/tmp/shlab-handout$ ./test.sh
- #.............................
- #.............................
- #.............................
- diff_trace13.txt :
- 5c5
- < tsh> Job [1] (6173) stopped by signal 20
- ---
- > tsh> Job [1] (6162) stopped by signal 20
- 7c7
- < tsh> [1] (6173) Stopped ./mysplit 4
- ---
- > tsh> [1] (6162) Stopped ./mysplit 4
- 20,24c20,24
- < 6170 pts/5 S+ 0:00 /usr/bin/perl ./sdriver.pl -t trace13.txt -s ./tsh
- < 6171 pts/5 S+ 0:00 ./tsh
- < 6173 pts/5 T 0:00 ./mysplit 4
- < 6174 pts/5 T 0:00 ./mysplit 4
- < 6177 pts/5 R 0:00 /bin/ps a
- ---
- > 6159 pts/5 S+ 0:00 /usr/bin/perl ./sdriver.pl -t trace13.txt -s ./tshref
- > 6160 pts/5 S+ 0:00 ./tshref
- > 6162 pts/5 T 0:00 ./mysplit 4
- > 6163 pts/5 T 0:00 ./mysplit 4
- > 6166 pts/5 R 0:00 /bin/ps a
- 41c41
- < 1303 tty7 Ssl+ 21:49 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
- ---
- > 1303 tty7 Ssl+ 21:48 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
- 51,53c51,53
- < 6170 pts/5 S+ 0:00 /usr/bin/perl ./sdriver.pl -t trace13.txt -s ./tsh
- < 6171 pts/5 S+ 0:00 ./tsh
- < 6182 pts/5 R 0:00 /bin/ps a
- ---
- > 6159 pts/5 S+ 0:00 /usr/bin/perl ./sdriver.pl -t trace13.txt -s ./tshref
- > 6160 pts/5 S+ 0:00 ./tshref
- > 6169 pts/5 R 0:00 /bin/ps a
- -------------------------------------
- diff_trace14.txt :
- 7c7
- < tsh> [1] (6207) ./myspin 4 &
- ---
- > tsh> [1] (6188) ./myspin 4 &
- 23c23
- < tsh> Job [1] (6207) stopped by signal 20
- ---
- > tsh> Job [1] (6188) stopped by signal 20
- 27c27
- < tsh> [1] (6207) ./myspin 4 &
- ---
- > tsh> [1] (6188) ./myspin 4 &
- 29c29
- < tsh> [1] (6207) Running ./myspin 4 &
- ---
- > tsh> [1] (6188) Running ./myspin 4 &
- -------------------------------------
- diff_trace15.txt :
- 7c7
- < tsh> Job [1] (6241) terminated by signal 2
- ---
- > tsh> Job [1] (6224) terminated by signal 2
- 9c9
- < tsh> [1] (6244) ./myspin 3 &
- ---
- > tsh> [1] (6226) ./myspin 3 &
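Apart from the PIDs (and one CPU-time value in the ps output), everything matches, which suggests that tsh is implemented correctly.
The biggest lesson from this lab is not to trust documentation completely; implementing and verifying things yourself matters too. I also learned a bit about the races that concurrency produces.
Also, amusingly, before starting the lab I noticed this in the lab handout: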
– In waitfg, use a busy loop around the sleep function.
– In sigchld handler, use exactly one call to waitpid.
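At the time I wondered whether using sleep and calling waitpid only once in the handler was unsafe or just silly. Then I looked on github and found that not only does everyone do exactly this, their code is also very unsafe (the six safety guidelines mentioned above are ignored entirely, and return values and error cases go unchecked, e.g. an if with no else), so I figured my version would be much better than theirs.
As it turned out... following these safety rules causes a lot of trouble, and with limited time I implemented only the ones that were easy to implement. Two remain unimplemented: "block signals before accessing global data structures" and "use only async-signal-safe functions in signal handlers".
Finally, to sum up this lab, here is an adaptation of a line from the author of the Mutt E-Mail Client: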
All code about this ShellLab on github suck. This one just sucks less ;)
Source: https://www.cnblogs.com/liqiuhao/p/8120617.html