With 5-level paging, we effectively incur five additional memory dereferences per pointer dereference (one memory read per level of the page-table walk). This is incredibly expensive!
Observation: memory access locality.
Idea: cache VPN->PFN mappings! The MMU employs a fast-lookup hardware cache called “associative memory” or translation lookaside buffer (TLB).
When a virtual address is dereferenced, the CPU looks up the VPN in the TLB. If there is a mapping (TLB hit), you don’t go through the page tables at all; you already have the PFN! Access physical memory using the PFN and the offset.
If the VPN isn’t in the TLB (a TLB miss), the hardware performs a page table walk. Once the PFN is derived, the CPU installs the VPN->PFN mapping into the TLB and then restarts the memory dereference so that it becomes a TLB hit.
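This hit/walk/install/retry loop can be sketched in software. The following is only a toy model of what the MMU does transparently in hardware; the direct-mapped TLB organization, the names, and the fake page_table_walk() are illustrative assumptions, not how any real CPU exposes this:

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TLB_ENTRIES 16
    #define PAGE_SHIFT  12                  /* 4 KiB pages: offset = low 12 bits */

    struct tlb_entry { bool valid; uint64_t vpn; uint64_t pfn; };
    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Stand-in for the multi-level page table walk (5 memory reads with
     * 5-level paging); here we just fabricate a PFN so the example runs. */
    static uint64_t page_table_walk(uint64_t vpn) { return vpn ^ 0xABCDE; }

    static uint64_t translate(uint64_t vaddr)
    {
        uint64_t vpn    = vaddr >> PAGE_SHIFT;
        uint64_t offset = vaddr & ((1ull << PAGE_SHIFT) - 1);
        struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];   /* direct-mapped TLB */

        if (e->valid && e->vpn == vpn)                   /* TLB hit */
            return (e->pfn << PAGE_SHIFT) | offset;

        /* TLB miss: walk the page tables, install the mapping, retry. */
        *e = (struct tlb_entry){ .valid = true, .vpn = vpn,
                                 .pfn = page_table_walk(vpn) };
        return translate(vaddr);                         /* now a TLB hit */
    }

    int main(void)
    {
        uint64_t va1 = 0x7f00dead1000ULL, va2 = 0x7f00dead1234ULL;
        printf("0x%" PRIx64 " -> 0x%" PRIx64 " (miss, then hit)\n", va1, translate(va1));
        printf("0x%" PRIx64 " -> 0x%" PRIx64 " (hit: same page)\n", va2, translate(va2));
        return 0;
    }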
Assume that:
- a memory cycle takes 1 time unit
- a TLB lookup takes e time units
- the TLB hit ratio (fraction of accesses whose VPN is found in the TLB) is a
Compute effective access time (EAT) as follows:
EAT = (1 + e) a + (2 + e)(1 - a)
- If TLB hit, we incur just the TLB lookup and one memory cycle: (1 + e)
- If TLB miss, we incur the TLB lookup and two memory cycles, one to read the page table entry (assuming a one-level table here) and one for the access itself: (2 + e)
EAT = a + ea + 2 + e - ea - 2a
EAT = 2 + e - a
- Assuming a high TLB-hit ratio and a low TLB lookup time, EAT approaches the cost of 1 memory cycle (worth it!)
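For example, with illustrative numbers a = 0.99 and e = 0.2 memory cycles: EAT = (1 + 0.2)(0.99) + (2 + 0.2)(0.01) = 2 + 0.2 - 0.99 = 1.21 memory cycles, i.e., only about 21% slower than if no translation were needed at all.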
What should we do with the TLB contents on a context switch? The PTBR gets swapped out during the context switch, so the VPN->PFN mappings sitting in the TLB no longer make sense.
Option 1: flush the entire TLB
- x86: the load cr3 instruction loads the page table base and flushes the TLB (sketched below)
Option 2: attach an ID (an address-space identifier) to TLB entries, so entries from different processes can coexist and no flush is needed
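A minimal sketch of Option 1 from the kernel’s side, assuming x86-64 and GCC-style inline assembly (write_cr3 is an illustrative name, not any particular kernel’s API):

    #include <stdint.h>

    /* Writing CR3 installs the new page table root; as a side effect the
     * hardware flushes the TLB (global entries and PCIDs aside), so stale
     * VPN->PFN mappings from the previous process disappear. */
    static inline void write_cr3(uint64_t page_table_root)
    {
        asm volatile("mov %0, %%cr3" : : "r"(page_table_root) : "memory");
    }

A context-switch path would call something like this right after switching to the next process’s address space.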
x86 also has an INVLPG addr instruction, which invalidates a single TLB entry. This is handy when, e.g., after munmap(), a region is no longer mapped and its stale translations must be dropped.
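For targeted invalidation, a similarly hedged sketch (again assuming x86-64 and GCC-style inline assembly; flush_range is a hypothetical helper, not a real kernel API) of what an OS might do while tearing down a munmap()’d region:

    #include <stddef.h>
    #include <stdint.h>

    /* Invalidate the single TLB entry that covers vaddr. */
    static inline void invlpg(void *vaddr)
    {
        asm volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
    }

    /* Hypothetical helper: drop translations for each page of an unmapped
     * region instead of flushing the whole TLB. Assumes 4 KiB pages and a
     * page-aligned region. */
    static void flush_range(uintptr_t start, size_t len)
    {
        for (uintptr_t va = start; va < start + len; va += 4096)
            invlpg((void *)va);
    }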