A fork() in the road
ABSTRACT
The received wisdom suggests that Unix's unusual combination of fork() and exec() for process creation was an inspired design. In this paper, we argue that fork was a clever hack for machines and programs of the 1970s that has long outlived its usefulness and is now a liability. We catalog the ways in which fork is a terrible abstraction for the modern programmer to use, describe how it compromises OS implementations, and propose alternatives.
As the designers and implementers of operating systems, we should acknowledge that fork's continued existence as a first-class OS primitive holds back systems research, and deprecate it. As educators, we should teach fork as a historical artifact, and not the first process creation mechanism students encounter.
1 INTRODUCTION
When the designers of Unix needed a mechanism to create processes, they added a peculiar new system call: fork(). As every undergraduate now learns, fork creates a new process identical to its parent (the caller of fork), with the exception of the system call's return value. The Unix idiom of fork() followed by exec() to execute a different program in the child is now well understood, but still stands in stark contrast to process creation in systems developed independently of Unix.
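For readers who have not seen the idiom recently, it can be sketched as follows. This is a minimal illustration, not code from any system discussed in this paper: the helper name run_with_fork_exec is hypothetical, and error handling is reduced to returning -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): run the program argv[0] with
 * arguments argv in a child process and return its exit status,
 * or -1 if the child could not be created or did not exit normally. */
int run_with_fork_exec(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed; no child exists */

    if (pid == 0) {
        /* Child: fork returned 0. Replace this copy of the parent
         * with the requested program. */
        execvp(argv[0], argv);
        _exit(127);                 /* reached only if exec failed */
    }

    /* Parent: fork returned the child's PID. Wait for it to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The only difference between the two processes immediately after fork returns is the value of pid, which is why the code branches on it.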
Fifty years later, fork remains the default process creation API on POSIX: Atlidakis et al. found 1304 Ubuntu packages (7.2% of the total) calling fork, compared to only 41 uses of the more modern posix_spawn(). Fork is used by almost every Unix shell, major web and database servers (e.g., Apache, PostgreSQL, and Oracle), Google Chrome, the Redis key-value store, and even Node.js. The received wisdom appears to hold that fork is a good design. Every OS textbook we reviewed covered fork in uncritical or positive terms, often noting its "simplicity" compared to alternatives. Students today are taught that "the fork system call is one of Unix's great ideas" and "there are lots of ways to design APIs for process creation; however, the combination of fork() and exec() are simple and immensely powerful ... the Unix designers simply got it right".
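For comparison, the same task can be expressed without duplicating the caller. The sketch below is illustrative only: the helper name run_with_posix_spawn is hypothetical, and it uses posix_spawnp(), the PATH-searching variant of posix_spawn(), with default file actions and attributes.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;              /* the caller's environment */

/* Hypothetical helper (illustration only): run argv[0] in a new process
 * and return its exit status, or -1 on failure. */
int run_with_posix_spawn(char *const argv[])
{
    pid_t pid;

    /* NULL file actions and NULL attributes request the defaults.
     * posix_spawnp returns 0 on success and an error number otherwise. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Here the program to run is named in the creation call itself, so no code executes in a copy of the parent between creation and exec.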
Our goal is to set the record straight. Fork is an anachronism: a relic from another era that is out of place in modern systems where it has a pernicious and detrimental impact. As a community, our familiarity with fork can blind us to its faults. Generally acknowledged problems with fork include that it is not thread-safe, it is inefficient and unscalable, and it introduces security concerns. Beyond these limitations, fork has lost its classic simplicity; it today impacts all the other operating system abstractions with which it was once orthogonal. Moreover, a fundamental challenge with fork is that, since it conflates the process and the address space in which it runs, fork is hostile to user-mode implementation of OS functionality, breaking everything from buffered I/O to kernel-bypass networking. Perhaps most problematically, fork doesn't compose: every layer of a system from the kernel to the smallest user-mode library must support it.
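The buffered I/O case is easy to reproduce. The following sketch is our own illustration under stated assumptions: the helper name count_copies_after_fork and the temporary file path are hypothetical. It writes one line into a user-space stdio buffer, forks, and counts how many copies of that line reach the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): returns how many times one
 * logical write ends up in the file, or -1 on error. */
int count_copies_after_fork(int flush_before_fork)
{
    char path[] = "/tmp/fork_stdio_XXXXXX";   /* assumed writable location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    FILE *out = fdopen(fd, "w");    /* regular file: fully buffered */
    if (out == NULL) {
        close(fd);
        unlink(path);
        return -1;
    }

    fputs("hello\n", out);          /* stays in the user-space buffer */
    if (flush_before_fork)
        fflush(out);                /* empty the buffer before it is copied */

    pid_t pid = fork();             /* duplicates the buffer along with
                                     * the rest of the address space */
    if (pid < 0) {
        fclose(out);
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        fclose(out);                /* flushes the child's copy */
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    fclose(out);                    /* flushes the parent's copy */

    /* Count the lines that actually reached the file. */
    FILE *in = fopen(path, "r");
    if (in == NULL) {
        unlink(path);
        return -1;
    }
    int lines = 0, c;
    while ((c = fgetc(in)) != EOF)
        if (c == '\n')
            lines++;
    fclose(in);
    unlink(path);
    return lines;
}
```

Calling count_copies_after_fork(0) yields 2, because the unflushed buffer is copied by fork and later flushed by both processes; count_copies_after_fork(1) yields 1. The library is correct in isolation, yet every caller of fork must know to flush it first.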
We illustrate the havoc fork wreaks on OS implementations using our experiences with prior research systems. Fork limits the ability of OS researchers and developers to innovate because any new abstraction must be special-cased for it. Systems that support fork and exec efficiently are forced to duplicate per-process state lazily. This encourages the centralisation of state, a major problem for systems not structured using monolithic kernels. On the other hand, research systems that avoid implementing fork are unable to run the enormous body of software that uses it.
We end with a discussion of alternatives and a call to action: fork should be removed as a first-class primitive of our systems, and replaced with good-enough emulation for legacy applications. It is not enough to add new primitives to the OS; fork must be removed from the kernel.