
Complete Virtual Memory Systems

  • VAX/VMS operating system
    • 1970s and early 1980s
  • Linux
    • Linux is a widely used system
    • runs effectively on systems as small and underpowered as phones
  • VAX/VMS Virtual Memory
    • Digital Equipment Corporation
      • massive player
      • unfortunately, a series of bad decisions and the advent of the PC slowly led to the company's demise
    • The VAX-11 provided a 32-bit virtual address space per process, divided into 512-byte pages. Thus, a virtual address consisted of a 23-bit VPN and a 9-bit offset
    • Upper two bits → segment
    • A hybrid of paging and segmentation
    • The lower half of the address space was known as the “process space” and is unique to each process. In the first half of process space (known as P0), the user program is found, and a heap grows downward. In the second half of process space (P1), we find the stack, which grows upwards. The upper half of the address space is known as system space (S), although only half of it is used. Protected OS code and data reside here, and the OS is in this way shared across processes.
    • Page size is small (512 bytes) → page tables could become excessively large
      • The system reduced the pressure page tables placed on memory in two ways. First, by segmenting the user address space into two regions (P0 and P1)
        • a separate page table for each region, per process
        • no page-table space is needed for the unused portion of the address space between the heap and the stack
      • Second, the OS reduces memory pressure even further by placing user page tables in kernel virtual memory (segment S), so the page tables themselves can be swapped to disk when memory is tight
      • the code segment never begins at page 0. This page, instead, is marked inaccessible, in order to provide some support for detecting null-pointer accesses
    • On a context switch
      • the OS changes the P0 and P1 registers to point to the appropriate page tables of the soon-to-be-run process
      • but it does not change the S base and bound registers → the same kernel structures are mapped into each user address space
      • This construction makes life easier for the kernel
      • the kernel appears almost as a library to applications, albeit a protected one
    • PTE in VAX
      • valid bit
      • protection bits (4)
      • modify (dirty) bit
      • fields reserved for OS use (5)
      • PFN
      • no reference bit
      • Memory hogs → programs that use a lot of memory and make it hard for other programs to run
      • Segmented FIFO
        • each process has a maximum number of pages it can keep in memory, known as its resident set size (RSS).
      • VMS introduced two second-chance lists to improve FIFO performance
        • when a process P exceeds its RSS, the first-in page is removed from its per-process FIFO
        • if clean → clean-page list
        • if dirty → dirty-page list
      • If another process Q needs a free page, it takes the first free page off of the global clean list.
        • But if the original process P faults on that page before it is reclaimed, P reclaims it from the clean (or dirty) list, avoiding a costly disk access
      • Another optimization used in VMS also helps overcome its small page size
        • clustering → groups large batches of pages together from the global dirty list and writes them to disk in one operation, making the writes more efficient
      • Lazy optimizations
        • demand zeroing of pages
        • With demand zeroing, the OS instead does very little work when the page is added to your address space; it puts an entry in the page table that marks the page inaccessible → when the process later reads or writes the page, a trap into the OS takes place, and only then does the OS find a physical page, zero it, and map it in
        • copy-on-write
          • instead of copying, the OS can map the page into the target address space and mark it read-only in both address spaces
          • the page is copied only when one of the processes actually tries to write to it: the write traps into the OS, which then allocates a new page and copies the data into it
          • the process then continues and now has its own private copy of the page
  • Linux Virtual Memory System
    • Like those other systems, upon a context switch, the user portion of the currently-running address space changes; the kernel portion is the same across processes
    • Kernel Virtual Space
      • kernel logical addresses → obtained via kmalloc; directly mapped to physical memory, and thus suitable for I/O transfers to and from devices via direct memory access (DMA)
      • kernel virtual addresses → obtained via vmalloc, which returns a pointer to a virtually (but not necessarily physically) contiguous region of the desired size
    • kernel virtual addresses enable the (32-bit) kernel to address more than (roughly) 1 GB of memory
    • 64-bit Linux → the kernel is not confined to only the last 1 GB of the virtual address space
  • Page Table Structure
    • OS simply sets up mappings in its memory, points a privileged register at the start of the page directory, and the hardware handles the rest
    • 64-bit
      • The full 64-bit nature of the virtual address space is not yet in use, however, rather only the bottom 48 bits
      • the top 16 bits of a virtual address are unused (and thus play no role in translation), the bottom 12 bits (due to the 4-KB page size) are used as the offset (and hence just used directly, and not translated), leaving the middle 36 bits of virtual address to take part in the translation.
  • Large page support
    • Specifically, recent designs support 2-MB and even 1-GB pages in hardware. Thus, over time, Linux has evolved to allow applications to utilize these huge pages
    • if those translations are for 4-KB pages, only a small amount of total memory can be accessed without inducing TLB misses
    • Huge pages allow a process to access a large tract of memory without TLB misses, by using fewer slots in the TLB; this reduced TLB pressure is their main advantage
    • One interesting aspect of Linux support for huge pages is how it was done incrementally
    • Linux developers have added transparent huge page support. When this feature is enabled, the operating system automatically looks for opportunities to allocate huge pages
    • Huge pages are not without their costs. The biggest potential cost is internal fragmentation
  • The Page Cache
    • The Linux page cache is unified, keeping pages in memory from three primary sources: memory-mapped files, file data and metadata from devices, and the heap and stack pages of each process
    • the heap and stack pages are called anonymous memory, because there is no named file underneath them
    • These entities are kept in a page cache hash table, allowing for quick lookup when said data is needed
    • The page cache tracks if entries are clean (read but not updated) or dirty (a.k.a., modified)
    • 2Q replacement
    • When accessed for the first time, a page is placed on one queue (called A1 in the original paper, but the inactive list in Linux); when it is re-referenced, the page is promoted to the other queue (called Am in the original, but the active list in Linux).
    • Thus, as with many OSes, an approximation of LRU (similar to clock replacement) is used
  • Security and Buffer overflow
    • One major threat is found in buffer overflow attacks, which can be used against normal user programs and even the kernel itself
    • if successful upon the operating system itself, the attack can access even more resources and is a form of what is called privilege escalation
    • NX bit (for No-eXecute), introduced by AMD into their version of x86
    • return-oriented programming (ROP)
    • address space layout randomization (ASLR) → the OS randomizes the placement of code, stack, and heap
    • kernel address space layout randomization (KASLR)
    • Meltdown
    • Spectre
    • speculative execution, in which the CPU guesses which instructions will soon be executed in the future, and starts executing them ahead of time
      • speculation tends to leave traces of its execution in various parts of the system, such as processor caches and branch predictors
      • this leftover state can be probed to reveal the contents of memory that should be protected
    • kernel page table isolation
      • the idea was to remove as much of the kernel address space as possible from each user process and instead use a separate kernel page table for most kernel data
