COMS W4118 Operating Systems I

Journaling

The Consistent Update Problem

Motivation: non-atomic updates

Recall FFS disk layout:

[Figure: ffs-layout]

Writes require several steps:

What if the system crashes between any of these steps? The disk only guarantees atomic writes one sector at a time, and the data structures being updated may not all live in the same sector. This is like a race condition in a concurrent program, except that we can’t keep a failure out with a lock :(

Example: ext2 file creation

ext2 is modeled on FFS. Say that we currently have a small ext2 partition with just one empty directory in it.

To create a new empty file foo in the filesystem, we read three pieces of the filesystem into memory and modify them:

[Figure: ext2-foo]

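Let’s analyze possible crash scenarios. Define B, I, and D as the inode bitmap, the inode for foo, and the directory’s data block (which holds its dentries), respectively.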

And define the contents for these updates as follows:

B = 01000   ---> B' = 01010
I = garbage ---> I' = initialized
D = {., ..} ---> D' = {., .., foo}

Any subset of these three updates may have reached disk when the crash hits!

B   I   D  --->  Consistent (new data lost)

B'  I   D  --->  Inconsistency! Bitmap says I was allocated,
                 but no one is using it (leak)

B   I'  D  --->  As if nothing happened! We wrote to the inode,
                 but the bitmap still says it's free

B   I   D' --->  SERIOUS PROBLEMS: dentry exists, but points to garbage inode.
                 bitmap says that inode is free, can be taken by another file.

B'  I'  D  --->  Inconsistency! Bitmap says I was allocated, and we wrote to I,
                 but no one uses I.

B'  I   D' --->  MOST SERIOUS PROBLEM! FS is consistent according to bitmap and
                 dentry, but inode has garbage data.

B   I'  D' --->  Inconsistency! Dentry refers to valid I, but bitmap says I is free.
                 I can be taken by another file.

B'  I'  D' --->  Consistent (new data persisted)

In these crash scenarios, data loss isn’t the primary concern – we care more about filesystem consistency. Ruining data structures makes the fs unusable!
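To double-check the table above, here is a minimal Python sketch that enumerates all eight subsets and runs a toy consistency check on each. The rules in check() and the assumption that foo's inode is the bit at index 3 (the one that flips between B and B') are illustrative, not ext2's actual on-disk logic.

```python
from itertools import product

# Old and new contents, copied from the definitions above.
OLD = {"B": "01000", "I": "garbage", "D": {".", ".."}}
NEW = {"B": "01010", "I": "initialized", "D": {".", "..", "foo"}}

FOO_INODE = 3  # assumption: foo's inode is the bit that flips in the bitmap


def check(state):
    """Return a list of inconsistencies found in one on-disk state."""
    allocated = state["B"][FOO_INODE] == "1"
    initialized = state["I"] == "initialized"
    referenced = "foo" in state["D"]
    problems = []
    if allocated and not referenced:
        problems.append("leak: inode allocated but no dentry uses it")
    if referenced and not allocated:
        problems.append("dentry points to an inode the bitmap says is free")
    if referenced and not initialized:
        problems.append("dentry points to a garbage inode")
    return problems


def crash_states():
    """Yield (label, state) for every subset of {B, I, D} reaching disk."""
    for written in product([False, True], repeat=3):
        state, label = {}, []
        for name, is_new in zip("BID", written):
            state[name] = NEW[name] if is_new else OLD[name]
            label.append(name + ("'" if is_new else ""))
        yield " ".join(label), state


results = {label: check(state) for label, state in crash_states()}
for label, problems in results.items():
    print(f"{label:10} -> {'; '.join(problems) or 'consistent'}")
```

Note that this toy checker can spot the garbage inode in the B' I D' case only because the model labels it as garbage. On a real disk, the bitmap and dentry agree in that case, which is what makes it so dangerous.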

fsck: file system consistency check

Journaling

Concept: write-ahead logging

Persistently write intent to log/journal, then update filesystem

Better than fsck:

Example: ext3 physical journaling

Let’s first write all of the block updates to the journal and then update the file system:

  1. Write dirty blocks to journal as one transaction
  2. Write commit record (finalize journal entry)
  3. Write dirty blocks to real file system
  4. Reclaim journal space for transaction (we don’t need it anymore)

[Figure: ext3-journal]

If you crash after committing to the log, just replay changes from the log!
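The four steps and the replay rule can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not ext3's implementation: the Disk class, the crash_at knob, and the crash point names are all hypothetical, and a Python dict stands in for disk blocks.

```python
class Crash(Exception):
    """Raised to simulate power loss at a chosen point."""


class Disk:
    def __init__(self, blocks):
        self.fs = dict(blocks)  # block name -> contents (the real FS)
        self.journal = []       # persistent log entries

    def update(self, dirty, crash_at=None):
        """Apply `dirty` (block name -> new contents) via the four steps."""
        def maybe_crash(point):
            if crash_at == point:
                raise Crash(point)

        self.journal.append(("blocks", dict(dirty)))  # 1. log dirty blocks
        maybe_crash("after_log")
        self.journal.append(("commit",))              # 2. commit record
        maybe_crash("after_commit")
        for name, contents in dirty.items():          # 3. write to real FS
            self.fs[name] = contents
            maybe_crash("mid_fs_write")               # dies after first block
        self.journal.clear()                          # 4. reclaim journal

    def recover(self):
        """Replay committed transactions; discard uncommitted ones."""
        pending = None
        for entry in self.journal:
            if entry[0] == "blocks":
                pending = entry[1]
            elif entry[0] == "commit" and pending is not None:
                self.fs.update(pending)
                pending = None
        self.journal.clear()


OLD = {"B": "01000", "I": "garbage", "D": "{., ..}"}
NEW = {"B": "01010", "I": "initialized", "D": "{., .., foo}"}


def run(crash_at):
    """Run one update with an optional crash, then recover as at next mount."""
    disk = Disk(OLD)
    try:
        disk.update(NEW, crash_at=crash_at)
    except Crash:
        pass
    disk.recover()
    return disk.fs


for point in (None, "after_log", "after_commit", "mid_fs_write"):
    print(point, "->", run(point))
```

Whatever the crash point, recovery leaves the FS entirely old or entirely new, never one of the mixed states from the ext2 example.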

This was an example of physical journaling, as opposed to logical journaling, where we write a logical record of the operation to the log:
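As a rough illustration of the difference, the sketch below contrasts what each kind of journal entry might hold for the foo example. The record layouts, the 4096-byte block size, the block numbers, and the operation name are all hypothetical, chosen only to show the shape of the data.

```python
from dataclasses import dataclass


@dataclass
class PhysicalRecord:
    """Full new contents of one block; replay just copies it into place."""
    block_no: int
    contents: bytes


@dataclass
class LogicalRecord:
    """A description of the operation; replay must re-execute it."""
    op: str
    args: dict


BLOCK_SIZE = 4096  # assumed block size, for illustration only

# Creating foo dirties three blocks (B', I', D'), so a physical journal
# logs three whole blocks. Block numbers 5, 6, 7 are made up.
physical = [
    PhysicalRecord(block_no=n, contents=bytes(BLOCK_SIZE)) for n in (5, 6, 7)
]

# A logical journal logs one compact description of the same operation.
logical = [LogicalRecord(op="create", args={"dir_inode": 2, "name": "foo"})]

physical_bytes = sum(len(r.contents) for r in physical)
print("physical journal payload:", physical_bytes, "bytes")
print("logical journal entry:", logical[0])
```

The logical entry is far smaller, but recovery then has to know how to redo a "create", whereas physical replay is a plain block copy.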

Journaling Write Orders

  1. Journal writes, then FS writes
    • otherwise, crash will leave FS inconsistent but no journal record to patch it up
  2. FS writes, then reclaim journal space
    • otherwise, if you crash before you finish the FS write, the journal record to patch it up will already be gone!
  3. Journal writes, then commit record, then FS writes
    • we need the commit record to tell us that we journaled the entirety of the change. Otherwise, the journal may have garbage in it!
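The three ordering rules can be sketched in user space, with ordinary files standing in for the journal region and an FS block. This is an analogy under stated assumptions, not how ext3 issues its writes: the file names, the BEGIN/COMMIT record format, and the helper functions are hypothetical. os.fsync is used as the barrier between steps.

```python
import os
import tempfile


def write_durably(path, data, flags):
    """Write `data` to `path`, then fsync before returning."""
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # ask the OS to flush this write before we continue
    finally:
        os.close(fd)


def journaled_update(journal_path, fs_path, new_contents):
    append = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    overwrite = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # Journal writes come first ...
    write_durably(journal_path, b"BEGIN\n" + new_contents + b"\n", append)
    # ... then the commit record, only once the journal data is flushed ...
    write_durably(journal_path, b"COMMIT\n", append)
    # ... then the FS write ...
    write_durably(fs_path, new_contents, overwrite)
    # ... and only then do we reclaim the journal space.
    write_durably(journal_path, b"", overwrite)


with tempfile.TemporaryDirectory() as d:
    journal = os.path.join(d, "journal")  # stand-in for the journal region
    fs_file = os.path.join(d, "block")    # stand-in for one FS block
    journaled_update(journal, fs_file, b"initialized")
    with open(fs_file, "rb") as f:
        final = f.read()
    journal_size = os.path.getsize(journal)

print(final, journal_size)
```

Each fsync sits exactly where one of the ordering rules applies: dropping any of them would let a later step reach disk before an earlier one.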

ext3 Journaling Modes

Motivation: journaling is expensive. Every FS write turns into two disk writes (one to the journal, one to the real FS location) and two seeks. Balance consistency and performance…


Last updated: 2023-04-08