Process Control
0x00 Outline
- Overview
- Process creation
- Process termination
- Program execution
0x01 Overview
Process Identifiers
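- Every process has a unique ID (pid)
- A pid is a non-negative integer
- Once a process terminates, its pid can be reused
- init process
  - The first program the operating system starts
  - Becomes the parent of every orphaned process
  - Never dies
- The ps, top, and htop commands list running processes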
Process Relationships
- The pstree command lists processes as a tree
- init sits at the root of the process tree, usually with pid 1
Retrieve Process Identifiers
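    #include <unistd.h>

    pid_t getpid(void);     /* process ID of the calling process */
    pid_t getppid(void);    /* parent process ID of the calling process */
    uid_t getuid(void);     /* real user ID */
    uid_t geteuid(void);    /* effective user ID */
    gid_t getgid(void);     /* real group ID */
    gid_t getegid(void);    /* effective group ID */
    /* None of these functions has an error return */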
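As a small illustration of the calls above, the following sketch formats the caller's identifiers into a buffer. The helper name `format_ids` is made up for this example and is not part of any library.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes the caller's IDs into buf and returns
 * whatever snprintf returns (the length the full text needs). */
int format_ids(char *buf, size_t len)
{
    return snprintf(buf, len,
                    "pid=%ld ppid=%ld uid=%ld euid=%ld gid=%ld egid=%ld",
                    (long)getpid(), (long)getppid(),
                    (long)getuid(), (long)geteuid(),
                    (long)getgid(), (long)getegid());
}
```

Calling `format_ids(line, sizeof line)` and printing `line` gives one summary line per process, which is handy for comparing the output of a parent and its child.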
0x02 Process Creation
The fork function
    #include <unistd.h>

    pid_t fork(void);
    /* Returns: 0 in child, process ID of child in parent, -1 on error */
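- Creates a new (child) process
- Returns 0 in the child, the child's pid in the parent, and -1 on error
- Both parent and child continue executing from the point where fork() returns
- The child gets a copy of the parent's data space, heap, and stack; these are copies, so the memory is not shared
- The child and parent share the text segment
- Copy-on-write (COW): the technique most current implementations use. Copying the parent's data, stack, and heap in full is expensive, so the kernel lets parent and child share those regions read-only and copies a page only when one of them modifies it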
    #include "apue.h"

    int globvar = 6;    /* external variable in initialized data */
    char buf[] = "a write to stdout\n";

    int main(void)
    {
        int var;        /* automatic variable on the stack */
        pid_t pid;

        var = 88;
        if (write(STDOUT_FILENO, buf, sizeof(buf)-1) != sizeof(buf)-1)
            err_sys("write error");
        printf("before fork\n");    /* we don't flush stdout */

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {      /* child */
            globvar++;              /* modify variables */
            var++;
        } else {                    /* parent */
            sleep(2);
        }

        printf("pid = %ld, glob = %d, var = %d\n", (long)getpid(), globvar, var);
        exit(0);
    }
    $ ./a.out
    a write to stdout
    before fork
    pid = 430, glob = 7, var = 89    # child's variables were changed
    pid = 429, glob = 6, var = 88    # parent's copy was not changed
    $ ./a.out > temp.out
    $ cat temp.out
    a write to stdout
    before fork
    pid = 432, glob = 7, var = 89
    before fork
    pid = 431, glob = 6, var = 88
- "a write to stdout" appears only once in both runs: the write function is not buffered and is called before fork, so the string reaches standard output at the moment of the call
- "before fork": printf from the standard I/O library is buffered
  - In the first case (running the program interactively), standard output is line buffered, so the buffer is flushed at the newline, before fork
  - In the second case (stdout redirected to a file), standard output is fully buffered. printf is called before fork, but the string is still in the buffer when fork copies the parent's data space to the child. Each process flushes its own copy of the buffer at exit, so the string is written twice
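The duplicated line can be avoided by flushing before fork. The sketch below reproduces the effect on a temporary file instead of stdout so that it can count the result; `lines_written_around_fork` is a name invented for this example, and the child calls `fflush` followed by `_exit` to stand in for the flush that `exit` would perform.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes one line through a fully buffered stream, forks, and returns how
 * many lines end up in the file (-1 on error). */
int lines_written_around_fork(int flush_before_fork)
{
    FILE *fp = tmpfile();           /* regular file: fully buffered */
    if (fp == NULL)
        return -1;

    fprintf(fp, "before fork\n");   /* stays in the stdio buffer */
    if (flush_before_fork)
        fflush(fp);                 /* nothing pending is left to be copied */

    pid_t pid = fork();
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {
        fflush(fp);                 /* the flush exit() would do for this stream */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) != pid) {
        fclose(fp);
        return -1;
    }
    fflush(fp);                     /* flush the parent's copy of the buffer */

    /* Count the newline characters that reached the file. */
    rewind(fp);
    int lines = 0, c;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;
    fclose(fp);
    return lines;
}
```

Without the early flush both processes write their copy of the buffer and the file holds two lines; with it, only one line is written.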
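Fork and File Sharing
Handling File Descriptors after fork
- The parent waits for the child to complete
  - Right after fork, each of the child's file descriptors has the same offset as the parent's, because both refer to the same file table entry; any offset the child advances is therefore already updated when the parent resumes
- Both the parent and the child go their own ways
  - Parent and child each close the file descriptors they do not need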
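The shared offset can be observed directly. In this sketch (the function name and the temporary file path are assumptions made for the example) the child writes five bytes and the parent, which never wrote anything itself, then asks for its own current offset.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's file offset after the child has written 5 bytes,
 * or -1 on error. */
long parent_offset_after_child_write(void)
{
    char path[] = "/tmp/fork_offset_XXXXXX";   /* assumed writable location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                   /* the file disappears once fd is closed */

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {                 /* child: advance the shared offset */
        ssize_t n = write(fd, "hello", 5);
        _exit(n == 5 ? 0 : 1);
    }
    if (waitpid(pid, NULL, 0) != pid) {
        close(fd);
        return -1;
    }
    off_t off = lseek(fd, 0, SEEK_CUR);   /* parent's view of the offset */
    close(fd);
    return (long)off;
}
```

The function returns 5 rather than 0, because the descriptor in the parent and its duplicate in the child refer to the same file table entry.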
Other Properties Inherited by the Child
- real uid, real gid, effective uid, effective gid, supplementary group IDs
- controlling terminal
- set-user-ID and set-group-ID flags
- current working directory
- file mode creation mask
- signal mask and dispositions
- the close-on-exec flag for any open file descriptors
- environment variables
Use of fork
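- When a process wants to duplicate itself
  - Parent and child can then execute different sections of the same program
  - Common in network servers: the parent waits for a client connection, forks when one arrives so that the child handles the request, and goes back to waiting for the next connection
- When a process wants to execute a different program
  - Common in shells: when a command is entered, the shell forks a child, and the child runs the command (typically via exec)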
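The "duplicate itself" pattern can be sketched without any networking: the parent hands each job to a forked child and immediately goes on to the next job. The job here is a stand-in (the child simply exits with `job * 2`), and both function names are hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job handler: the child "handles" a job by exiting with job * 2. */
static void handle_job(int job)
{
    _exit(job * 2);
}

/* Forks one child per job, then reaps them all and returns the sum of
 * their exit statuses. */
int dispatch_jobs(int njobs)
{
    int started = 0, total = 0;

    for (int job = 0; job < njobs; job++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                  /* stop dispatching on error */
        if (pid == 0)
            handle_job(job);        /* child: never returns */
        started++;                  /* parent: back to dispatching */
    }

    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status))
            total += WEXITSTATUS(status);
    }
    return total;
}
```

A real server would replace the loop counter with `accept` and would reap children as they finish instead of at the end.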
Variants of fork
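- vfork:
  - Creates the child without copying the parent's address space; until it calls exec or exit, the child runs in the parent's address space
  - Typically used when the child calls exec right away: nothing has to be copied, so it is more efficient than fork
- clone:
  - The Linux system call underlying fork and vfork; the caller chooses what parent and child share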
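function | Characteristics |
---|---|
fork | Parent and child run independently, in no guaranteed order; their variables are separate, so they communicate through dedicated mechanisms such as pipes; copy-on-write on Linux reduces the cost |
vfork | Parent and child share variables; the parent is blocked until the child calls exit or exec |
clone | The caller chooses what parent and child share |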
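A minimal vfork sketch, assuming a `true` program can be found on PATH. Because the child borrows the parent's address space, it does nothing except call exec or `_exit`. The function name is invented for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs "true" in a vfork'ed child and returns its exit status (-1 on error). */
int run_true_with_vfork(void)
{
    int status;
    pid_t pid = vfork();            /* parent is suspended until exec/_exit */

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: touch nothing in memory, only exec or _exit. */
        execlp("true", "true", (char *)0);
        _exit(127);                 /* exec failed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With copy-on-write fork the saving is usually small, so plain fork is the safer default unless profiling says otherwise.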
0x03 Process Termination
Child process terminate
- zombie process: when a child terminates, its parent is expected to read its exit status. If the parent ignores it, the child remains a zombie process until the parent calls wait to fetch its termination status
  - The resources the child held are released, but its pid and termination status stay in the kernel
- A parent process is guaranteed to exist
  - If the parent terminates before the child, the init process takes over as the parent of those children, and each child's ppid becomes 1
- When a child terminates, normally or abnormally, the kernel sends the SIGCHLD signal to the parent
  - Child termination is an asynchronous event; it can happen at any point while the parent is running
  - The signal the kernel sends to the parent is asynchronous as well
  - The parent can ignore the signal or provide a function that handles it
wait and waitpid function
    #include <sys/types.h>
    #include <sys/wait.h>

    pid_t wait(int *status);
    pid_t waitpid(pid_t pid, int *status, int options);
- A parent process calls wait or waitpid to obtain a child's termination status
- Both functions:
  - Block if all children are still running
  - Return immediately with the child's termination status if a child has terminated and is waiting for its status to be fetched
  - Return immediately with an error if the caller has no children
- Differences between wait and waitpid
  - Block or not block
    - wait blocks the caller until a child process terminates
    - waitpid has an option that prevents it from blocking
  - Process termination order
    - wait returns for whichever child terminates first
    - waitpid does not have to wait for the child that terminates first; its pid argument and options control which process it waits for
- waitpid function
  - pid argument:
    - < -1: waits for any child whose process group ID equals the absolute value of pid
    - == -1: waits for any child process; in this respect waitpid is equivalent to wait
    - == 0: waits for any child whose process group ID equals that of the calling process
    - > 0: waits for the child whose process ID equals pid
  - options argument:
    - WNOHANG: waitpid does not block if a child specified by pid is not immediately available; in that case the return value is 0
    - WUNTRACED: also return for a child that has stopped and whose status has not yet been reported
    - WCONTINUED: also return for a stopped child that has been continued and whose status has not yet been reported
    #include "apue.h"       /* also declares pr_exit() */
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid;
        int status;

        if ((pid = fork()) < 0)
            err_sys("fork error");
        else if (pid == 0)          /* child */
            exit(7);
        if (wait(&status) != pid)   /* wait for child */
            err_sys("wait error");
        pr_exit(status);            /* and print its status */

        if ((pid = fork()) < 0)
            err_sys("fork error");
        else if (pid == 0)          /* child */
            abort();                /* generates SIGABRT */
        if (wait(&status) != pid)
            err_sys("wait error");
        pr_exit(status);

        if ((pid = fork()) < 0)
            err_sys("fork error");
        else if (pid == 0)          /* child */
            status /= 0;            /* divide by 0 generates SIGFPE */
        if (wait(&status) != pid)
            err_sys("wait error");
        pr_exit(status);

        exit(0);
    }

    void pr_exit(int status)
    {
        if (WIFEXITED(status))
            printf("normal termination, exit status = %d\n", WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            printf("abnormal termination, signal number = %d%s\n", WTERMSIG(status),
                   WCOREDUMP(status) ? " (core file generated)" : "");
        else if (WIFSTOPPED(status))
            printf("child stopped, signal number = %d\n", WSTOPSIG(status));
    }
    normal termination, exit status = 7
    abnormal termination, signal number = 6
    abnormal termination, signal number = 8
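The WNOHANG option lets a parent check on a child without blocking. In this sketch the child sleeps briefly and exits with status 3 while the parent polls; the function name and the sleep durations are arbitrary choices for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Polls a short-lived child with WNOHANG. Returns the child's exit status
 * (-1 on error) and stores the number of "not yet" polls in *polls. */
int poll_child_exit(int *polls)
{
    struct timespec child_nap = { 0, 50 * 1000 * 1000 };   /* 50 ms */
    struct timespec poll_gap  = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    int status;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        nanosleep(&child_nap, NULL);
        _exit(3);
    }

    *polls = 0;
    for (;;) {
        pid_t done = waitpid(pid, &status, WNOHANG);
        if (done == pid)
            break;                  /* child has terminated */
        if (done < 0)
            return -1;
        (*polls)++;                 /* done == 0: still running */
        nanosleep(&poll_gap, NULL); /* the parent's own work would go here */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

How many polls happen before the child terminates depends on scheduling, so only the exit status is predictable.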
Macros to Interpret Exit Status
- WIFEXITED(status): true if the child terminated normally. In that case WEXITSTATUS(status) fetches the low-order 8 bits of the argument the child passed to exit, _exit, or _Exit
- WIFSIGNALED(status): true if the child terminated abnormally, usually by receiving a signal it did not handle. In that case WTERMSIG(status) fetches the number of the signal that caused the termination. Some systems also implement WCOREDUMP(status), which is true if a core file was generated when the child terminated
- WIFSTOPPED(status): true if the child is currently stopped. In that case WSTOPSIG(status) fetches the number of the signal that stopped the child
- WIFCONTINUED(status): true if status was returned for a child that has been continued after a job control stop
Avoid Zombies by Calling fork Twice
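After fork the parent may have work of its own and may not want to block in wait for the child. By forking twice and letting the first child exit immediately, the parent only has to wait briefly for the first child; the second child is adopted by init as soon as the first child exits and is independent of the original parent from then on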
    #include "apue.h"
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid;

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {      /* first child */
            if ((pid = fork()) < 0)
                err_sys("fork error");
            else if (pid > 0)
                exit(0);            /* parent from second fork == first child */

            /*
             * We're the second child; our parent becomes init as soon
             * as our real parent calls exit() in the statement above.
             * Here's where we'd continue executing, knowing that when
             * we're done, init will reap our status.
             */
            sleep(2);
            printf("second child, parent pid = %ld\n", (long)getppid());
            exit(0);
        }

        if (waitpid(pid, NULL, 0) != pid)   /* wait for first child */
            err_sys("waitpid error");

        /*
         * We're the parent (the original process); we continue executing,
         * knowing that we're not the parent of the second child.
         */
        exit(0);
    }
    $ ./a.out
    $ second child, parent pid = 1
Note that the shell prompt appears before "second child, parent pid = 1": the shell prints its prompt as soon as the original parent terminates, and the second child prints its line only afterwards
Race Conditions
After fork there is no guarantee whether the parent or the child runs first
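    #include "apue.h"

    static void charatatime(char *);

    int main(void)
    {
        pid_t pid;

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {
            charatatime("output from child\n");
        } else {
            charatatime("output from parent\n");
        }
        exit(0);
    }

    /* Write one character at a time so the two outputs can interleave */
    static void charatatime(char *str)
    {
        char *ptr;
        int c;

        setbuf(stdout, NULL);       /* set unbuffered */
        for (ptr = str; (c = *ptr++) != 0; )
            putc(c, stdout);
    }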
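Solution 1:
- If the child must run before the parent, the parent can call wait/waitpid to wait for the child to finish
- If the parent must run before the child, the child can poll with getppid(): once the parent terminates the child is adopted by init, so its ppid becomes 1
    while (getppid() != 1)
        sleep(1);
- The drawback is that in either case one process has to stop until the other has terminated, and polling wastes CPU time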
Solution 2:
- Communicate through IPC
  - TELL_WAIT(): initializes the mechanism
  - WAIT_PARENT(): blocks until the parent signals
  - TELL_CHILD(pid): tells the child that the parent has finished
  - WAIT_CHILD(): blocks until the child signals
  - TELL_PARENT(ppid): tells the parent that the child has finished
    #include "apue.h"

    TELL_WAIT();    /* set things up for TELL_xxx & WAIT_xxx */

    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {          /* child */
        /* child does whatever is necessary ... */
        TELL_PARENT(getppid());     /* tell parent we're done */
        WAIT_PARENT();              /* and wait for parent */
        /* and the child continues on its way ... */
        exit(0);
    }

    /* parent does whatever is necessary ... */
    TELL_CHILD(pid);                /* tell child we're done */
    WAIT_CHILD();                   /* and wait for child */
    /* and the parent continues on its way ... */
    exit(0);
- parent_go_first
    } else if (pid == 0) {
        WAIT_PARENT();              /* parent goes first */
        charatatime("output from child\n");
    } else {
        charatatime("output from parent\n");
        TELL_CHILD(pid);
    }
- child_go_first
    } else if (pid == 0) {
        charatatime("output from child\n");
        TELL_PARENT(getppid());
    } else {
        WAIT_CHILD();               /* child goes first */
        charatatime("output from parent\n");
    }
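The TELL/WAIT routines can be built on any IPC primitive. The following sketch uses two pipes; the lowercase names are stand-ins written for this example, not the routines from apue.h, and `parent_goes_first` is a small demo that records the order in a temporary file (the path is an assumption).

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int to_child[2];     /* parent writes, child reads */
static int to_parent[2];    /* child writes, parent reads */

/* Create both pipes before fork so each side inherits all four ends. */
int tell_wait(void)
{
    return (pipe(to_child) < 0 || pipe(to_parent) < 0) ? -1 : 0;
}

int tell_child(void)  { return write(to_child[1], "p", 1) == 1 ? 0 : -1; }
int tell_parent(void) { return write(to_parent[1], "c", 1) == 1 ? 0 : -1; }

/* read() blocks until the other side has written its byte. */
int wait_parent(void) { char c; return read(to_child[0], &c, 1) == 1 ? 0 : -1; }
int wait_child(void)  { char c; return read(to_parent[0], &c, 1) == 1 ? 0 : -1; }

/* Demo: returns 1 if the parent's byte reached the file before the child's. */
int parent_goes_first(void)
{
    char path[] = "/tmp/tell_wait_XXXXXX";   /* assumed writable location */
    char buf[3] = { 0 };
    int ok = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (tell_wait() < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {                     /* child */
        if (wait_parent() < 0)          /* parent goes first */
            _exit(1);
        _exit(write(fd, "C", 1) == 1 ? 0 : 1);
    }
    if (pid > 0) {                      /* parent */
        if (write(fd, "P", 1) == 1 && tell_child() == 0 &&
            waitpid(pid, NULL, 0) == pid &&
            pread(fd, buf, 2, 0) == 2)
            ok = (strcmp(buf, "PC") == 0);
    }

    close(fd);
    close(to_child[0]);  close(to_child[1]);
    close(to_parent[0]); close(to_parent[1]);
    return ok;
}
```

Unlike polling with getppid(), neither process has to terminate for the other to proceed: the waiting side simply sleeps in `read` until the byte arrives.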
0x04 Process Execution
exec Function
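- When a process calls one of the exec functions, its contents are completely replaced by the new program, which starts executing at its main function
- The pid does not change across exec: no new process is created, only the contents of the existing process are replaced
- fork creates new processes
- exec functions initiate new programs
- exit handles termination
- wait functions handle waiting for termination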
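That exec keeps the pid can be checked with a short experiment: the child execs a shell that prints its own pid (`$$`) into a pipe, and the parent compares that number with the value fork returned. This is a sketch that assumes `/bin/sh` exists; the function name is made up.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the program started by exec reports the same pid that
 * fork returned for the child, 0 if not, -1 on error. */
int exec_keeps_pid(void)
{
    int fds[2];
    char buf[32] = { 0 };

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                         /* child */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);        /* shell output goes to the pipe */
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)0);
        _exit(127);                         /* exec failed */
    }

    close(fds[1]);                          /* parent only reads */
    ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n <= 0)
        return -1;
    return atol(buf) == (long)pid;
}
```

The shell is a different program from the one that called fork, yet it runs under the pid the child was given.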
    #include <unistd.h>

    extern char **environ;

    int execl(const char *pathname, const char *arg0, ... /* (char *)0 */);
    int execlp(const char *filename, const char *arg0, ... /* (char *)0 */);
    int execle(const char *pathname, const char *arg0, ...
               /* (char *)0, char *const envp[] */);
    int execv(const char *pathname, char *const argv[]);
    int execvp(const char *filename, char *const argv[]);
    int execve(const char *pathname, char *const argv[], char *const envp[]);
    int fexecve(int fd, char *const argv[], char *const envp[]);
    /* All seven return: -1 on error, no return on success */
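- pathname must be an absolute or relative path
- filename: if it contains no slash (/), exec searches the directories listed in the PATH environment variable for it; these functions have a p in their name
- arg list: the arguments are passed to exec one by one; these functions have an l in their name
- argv[]: the arguments are stored in an array that is passed to exec; these functions have a v in their name
- environ: the external variable holding the caller's environment, which the functions without an e pass on to the new program
- envp[]: the environment for the new program is passed to exec as an array; these functions have an e in their name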
    #include "apue.h"
    #include <sys/wait.h>

    char *env_init[] = { "USER=unknown", "PATH=/tmp", NULL };

    int main(void)
    {
        pid_t pid;

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {      /* specify pathname, specify environment */
            if (execle("/home/sar/bin/echoall", "echoall", "myarg1",
                       "MY ARG2", (char *)0, env_init) < 0)
                err_sys("execle error");
        }

        if (waitpid(pid, NULL, 0) < 0)
            err_sys("wait error");

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {      /* specify filename, inherit environment */
            if (execlp("echoall", "echoall", "only 1 arg", (char *)0) < 0)
                err_sys("execlp error");
        }

        exit(0);
    }
- echoall.c
    #include "apue.h"

    int main(int argc, char *argv[])
    {
        int i;
        char **ptr;
        extern char **environ;

        for (i = 0; i < argc; i++)      /* echo all command-line args */
            printf("argv[%d]: %s\n", i, argv[i]);

        for (ptr = environ; *ptr != 0; ptr++)   /* and all env strings */
            printf("%s\n", *ptr);

        exit(0);
    }
    $ ./a.out
    argv[0]: echoall
    argv[1]: myarg1
    argv[2]: MY ARG2
    USER=unknown
    PATH=/tmp
    $ argv[0]: echoall
    argv[1]: only 1 arg
    USER=sar
    LOGNAME=sar
    SHELL=/bin/bash
    ...
    HOME=/home/sar
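The example above uses the l variants. For comparison, here is a sketch of the v and e combination, execve, where both the arguments and the environment are passed as NULL-terminated arrays. The wrapper name `run_with_env` is hypothetical, and the usage below assumes `/bin/sh` exists.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs path with the given argument and environment arrays and returns
 * the program's exit status (-1 on error or abnormal termination). */
int run_with_env(const char *path, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);   /* 'v': argv array, 'e': explicit environment */
        _exit(127);                 /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, with `argv = { "sh", "-c", "test \"$GREETING\" = hello", NULL }` and `envp = { "GREETING=hello", NULL }`, the shell sees only the variables listed in `envp`, and the call returns 0.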
Interpreter Files
Almost all current UNIX systems support interpreter files, and exec can execute them. An interpreter file is a text file whose first line has the form:
#! pathname [ optional-argument ]
    #include "apue.h"
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid;

        if ((pid = fork()) < 0) {
            err_sys("fork error");
        } else if (pid == 0) {      /* child */
            if (execl("/home/sar/bin/testinterp", "testinterp", "myarg1",
                      "MY ARG2", (char *)0) < 0)
                err_sys("execl error");
        }

        if (waitpid(pid, NULL, 0) < 0)  /* parent */
            err_sys("waitpid error");
        exit(0);
    }
    $ cat /home/sar/bin/testinterp
    #!/home/sar/bin/echoarg foo
    $ ./a.out
    argv[0]: /home/sar/bin/echoarg
    argv[1]: foo
    argv[2]: /home/sar/bin/testinterp
    argv[3]: myarg1
    argv[4]: MY ARG2
system Function
    #include <stdlib.h>

    int system(const char *cmdstring);
A simplified implementation of system, without signal handling
- It mainly consists of fork(), exec(), and waitpid()
    #include <sys/wait.h>
    #include <errno.h>
    #include <unistd.h>

    int system(const char *cmdstring)   /* version without signal handling */
    {
        pid_t pid;
        int status;

        if (cmdstring == NULL)
            return(1);      /* always a command processor with UNIX */

        if ((pid = fork()) < 0) {
            status = -1;    /* probably out of processes */
        } else if (pid == 0) {              /* child */
            execl("/bin/sh", "sh", "-c", cmdstring, (char *)0);
            _exit(127);     /* execl error */
        } else {                            /* parent */
            while (waitpid(pid, &status, 0) < 0) {
                if (errno != EINTR) {
                    status = -1;    /* error other than EINTR from waitpid() */
                    break;
                }
            }
        }

        return(status);
    }
system and suid/sgid Programs
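If a set-user-ID or set-group-ID process calls system, the command that system runs has the same privileges as the calling process, so calling system directly from a suid/sgid program is generally discouraged
The recommended approach is to fork a new process, reset its privileges with seteuid and setegid, and call exec only after confirming that the privileges are correct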
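The recommendation above might look like the following sketch. The function name and the exit codes 126/127 are choices made for this example; in a process that is not set-user-ID the two set calls simply succeed without changing anything.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, resets the child's effective IDs to the real IDs, verifies the
 * result, and only then execs path. Returns the program's exit status. */
int run_unprivileged(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Reset the group first: after the effective uid is reset the
         * process may no longer be allowed to change its gid. */
        if (setegid(getgid()) < 0 || seteuid(getuid()) < 0)
            _exit(126);
        /* Confirm the privileges before running anything. */
        if (geteuid() != getuid() || getegid() != getgid())
            _exit(126);
        execv(path, argv);
        _exit(127);                 /* exec failed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing an explicit path to execv, rather than a command string to a shell, also keeps PATH and shell parsing out of the picture.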
User Identification
    #include <sys/types.h>
    #include <pwd.h>

    struct passwd *getpwnam(const char *name);
    struct passwd *getpwuid(uid_t uid);
    int getpwnam_r(const char *name, struct passwd *pwd,
                   char *buf, size_t buflen, struct passwd **result);
    int getpwuid_r(uid_t uid, struct passwd *pwd,
                   char *buf, size_t buflen, struct passwd **result);

    struct passwd {
        char *pw_name;      /* username */
        char *pw_passwd;    /* user password */
        uid_t pw_uid;       /* user ID */
        gid_t pw_gid;       /* group ID */
        char *pw_gecos;     /* user information */
        char *pw_dir;       /* home directory */
        char *pw_shell;     /* shell program */
    };
- getpwnam and getpwuid return the entry of the passwd database (/etc/passwd) that matches the given user name or uid
    #include <unistd.h>

    char *getlogin(void);
    int getlogin_r(char *buf, size_t bufsize);
- getlogin() returns a pointer to a string containing the name of the user logged in on the controlling terminal of the process, or a NULL pointer if this information cannot be determined
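As a usage sketch for the passwd lookups, the helper below copies the user name for a uid into a caller-supplied buffer using the reentrant getpwuid_r. The helper name and the 4096-byte scratch buffer are assumptions made for the example.

```c
#include <pwd.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: looks up uid and copies its user name into name.
 * Returns 0 on success, -1 if there is no entry or the buffer is too small. */
int user_name_for_uid(uid_t uid, char *name, size_t len)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    char scratch[4096];     /* assumed large enough for one passwd entry */

    /* getpwuid_r returns 0 and sets result to NULL when no entry matches. */
    if (getpwuid_r(uid, &pwd, scratch, sizeof scratch, &result) != 0 ||
        result == NULL)
        return -1;
    if (strlen(pwd.pw_name) >= len)
        return -1;
    strcpy(name, pwd.pw_name);
    return 0;
}
```

Calling it with `getuid()` gives the name of the real user, which still works when the process has no controlling terminal and getlogin() returns NULL.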