Operating Systems: Three Easy Pieces Ch. 19
Paging: Faster Translations
- TLB: Translation-lookaside buffer
- Part of MMU: Memory-management unit
- hardware cache: unlike memory cache, it is visible
- Hardware-managed TLB
- extract virtual page number from virtual address and check TLB holds a translation
- TLB Hit : TLB hold the translation
- TLB Miss : TLB doesn’t have the translatinon
- In the case, update the TLB with the translation
- It is costly since it has more memory access
- avoid TLB miss is important
- extract virtual page number from virtual address and check TLB holds a translation
- Example
- In the case of array
- Since the array has continous memory, if we load memory unit by memory size, TLB hit occurs often
- Spatial locality
- The element of the array are packed tightly into pages
- Temporal locality
- Quick re-referencing of memory items in time
- In the case of array
- Handling TLB miss
- CISC (complex-instruction set computers), the old day system, engineer doesn’t believe OS
- Therefore, hardware know exactly where the page tables located via a page-table base register
- Hardware-managed TLB, uses fixed multi-level page table
- RISC (reduced-instrution set computers)
- TLB miss occurs, hardware simply raise the exception
- Software-managed TLB
- On the TLB miss → exception → raises privileged level to kernel mode, and jumps to a trap handler
- Important details
- return-from-trap instructions needs to be a little different
- It resume the execution from the instruction “cause” the TLB miss because it now will occur TLB hit
- TLB miss-handling, need to extra careful not to cause an infinite chain of TLB miss
- Many solutions → keep TLB miss handler in physical memory
- reserve some entries in the TLB for permanenetly-valid translations and use some of those permananent translation slots for the handler code it self → wired translation
- return-from-trap instructions needs to be a little different
- Flexibility → OS can use any data structure it wants to implement page table
- Simplicity → the hardware doesn’t have to do much on a miss
- CISC (complex-instruction set computers), the old day system, engineer doesn’t believe OS
- TLB Content
- TLB might have 32, 64, 128 entries with fully associative
-
VPN PFN other bits - Other bits
- valid bit - whether the entry has a valid translation or not
- protection bit - how can a page can be accessed
- Address-space identifier
- dirty bit
- Context switches
- the TLB contains virtual-to-physical translations that are only valid for the currently running process; these translations are not meaningful for other processes.
- Need to be careful not to be use other TLB
- If there is same vpn, the hardware can’t distinguish which entry is meant for which process
- Solution
- TLB flush
- flush the TLB on context switch, emptying TLB before running the new process
- On a software-based system, this can be accomplished with an explicit (and privileged) hardware instruction; with a hardware-managed TLB, the flush could be enacted when the page-table base register is changed (note the OS must change the PTBR on a context switch anyhow)
- There is a cost, it must incur TLB misses at it touches its data and code pages
- flush the TLB on context switch, emptying TLB before running the new process
- ASID
- address space identifier
- TLB can hold same VPN with ASID without confusion
- Two processes can share a page with this method
- TLB flush
- Replacement policy
- main issue: cache replacement; how can we replace?
- LRU(Least-recently-used)
- take advantage of locality in the memory-reference stream
- Random policy
- The real TLB
- Page 193
- The use of TLBs in virtual memory systems can significantly improve address translation performance by caching frequently accessed page table entries on-chip, reducing the need to access main memory. This results in performance similar to that of non-virtualized memory. However, TLBs may be less effective for programs that access a large number of pages in a short time, leading to TLB misses and decreased performance. One solution to this issue is to use larger page sizes to increase TLB coverage, but TLB access can also become a bottleneck in the CPU pipeline, particularly with physically-indexed caches, and virtually-indexed caches introduce new hardware design issues.