Friday, 21 January 2011

A history of copy-on-write memory management

In virtual memory operating systems, copy on write (COW) is a well known technique to save time and space. The idea is that several processes have logically separate writable pages of data that happen to have the same contents, because they were initialized from the same source. So long as nobody writes into a page, all the processes can share the same copy. If a process does write into it, the operating system makes a separate copy of the the changed page. The best known application for COW is the Unix fork() primitive, although it also is useful for writable program data that is initialized from a file, a common situation in modern operating systems with shared libraries.


The implementation is straightforward. COW pages are mapped read-only into each process. A write attempt causes a page fault which the system handles by making a read/write copy of the page and mapping that into the process in place of the read-only page.

How far back does copy on write go? The oldest paging hardware for which I have documenation is the IBM 360/67. Although the /67 handled paging and page faults, the only facility for making memory read-only is the 360's cumbersome storage keys, and the Functional Characteristics manual says that a storage key violation causes an imprecise exception, which means that the system can't resume the program. The 1971 S/370 Principles of Operation says that the 370's VM didn't do write faults either. ESA/390 in the mid 90s finally added suppression on protection, which in connection with access lists, a segmented address extension hack, provided the needed hardware support. The ESA/390 PofO notes that AIX/ESA uses it for copy-on-write.

The storage keys do have change monitoring, a bit in the key that is set if a page has been written. In about 1976 VM/370 R3 did copy on write by looking at the change bits on each context switch to see whether a page had been changed and if so marked it as private, with other processes using a copy refreshed from the disk. This was slow and interacted poorly with the VM assist feature.

IBM workstation hardware from the ROMP in the RT/PC through the Power PC use an inverted page table which I discuss below.

The earliest widely used copy on write implementation I'm aware of was Tenex in 1972. One of the papers about Tenex says COW was adapted from BBN Lisp on the SDS 940. Tenex ran on a PDP-10 modified to add paged address hardware and larger physical addressing. The main use of copy on write was to initialize writable data from program files, for almost but not quite reentrant code, and large data areas that were changed infrequently. (Tenex had a fork primitive which it used to create new processes, although its use was somewhat different from Unix's.) Due to the cost of core memory, the Tenex hardware was designed to make page sharing as easy as possible to minimize the number of physical pages needed. Tenex evolved into TOPS-20, which was widely used on DEC's 36 bit hardware until they cancelled the product line in 1983.

Unix systems first got virtual memory in 3.0BSD in 1979, but copy on write didn't come along until much later. 3BSD added the vfork() call, a hack to avoid the memory copying in a normal fork() by lending the old process' address space to the new process, on the assumption that the new process won't need it for long. This was good enough to forestall demand for COW for several years.

The VAX, which the most popular Unix platform in the 1980s, had paging hardware sufficient to support COW, although there were reported to be hardware bugs on the VAX-11/750 (aka Comet) that made COW in the stack segment, the place it's most useful, not work. DEC's operating system VMS reportedly had COW from its earliest days in 1979.

In 1983, AIX for the RT/PC ran atop an extended virtual machine monitor called the VRM which provided the segmented virtual memory that AIX used. The RT's inverted page tables made sharing a page between two segments slow, since the page tables had to be fixed up on each context switch. The VRM provided copy-on-touch (COT) instead, in which a process got its own copy of a page the first time it referenced it after a fork. For the application of sharing data after a fork, this was nearly as good as COW, with lower internal overhead. (COT would have worked on the Comet, but it didn't occur to anyone to try it.) I don't know whether AIX on the PowerPC, which also uses inverted page tables, uses COW or COT.

In 1986, the CMU developed the Mach kernel, a new design intended to replace the kernel of 4.3BSD.  Mach became very popular, partly because it worked well, partly because it was free of the AT&T copyright that still applied to most Unix systems. It was incorporated into commercial systems such as Nextstep and OSF/1, and the COW code was incorporated into 4.4BSD, the final unencumbered BSD release in 1994, that was the predecessor of modern BSD systems such as FreeBSD, NetBSD, and OpenBSD.

Since the mid-1990s, COW has been a standard feature of Unix-like systems including Linux and AIX.