Recall FFS disk layout:
Writes require several steps:
What if the system crashes between any of these steps? Disk only provides atomic writes one sector at a time… the data structures may not all be within the same sector. Like race conditions in concurrent programs, but can’t lock out a failure with a lock :(
ext2 is a port of FFS. Say that we currently have a small ext2 partition with just one empty directory in it.
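In order to create a new empty file foo in the filesystem, we read three pieces of the filesystem into memory and mutate them:
- the inode bitmap, to mark an inode as allocated for foo
- the inode for foo in the inode array
- the dentry for foo in the directory's data

Nothing has been written to disk for foo yet.

Let's analyze possible crash scenarios. Define B, I, D as follows: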
- update the inode bitmap (B)
- initialize the inode for foo (I)
- add a dentry for foo to dir data block (D)

And define the contents for these updates as follows:
B = 01000 ---> B' = 01010
I = garbage ---> I' = initialized
D = {., ..} ---> D' = {., .., foo}
Any subset can be written out to disk!
B I D ---> Consistent (new data lost)
B' I D ---> Inconsistency! Bitmap says I was allocated,
              but no one is using it (leak)
B I' D ---> As if nothing happened! We wrote to the inode,
              but the bitmap still says it's garbage
B I D' ---> SERIOUS PROBLEMS: dentry exists, but points to garbage inode.
              Bitmap says that inode is free, so it can be taken by another file.
B' I' D ---> Inconsistency! Bitmap says I was allocated, and we wrote to I,
              but no one uses I.
B' I D' ---> MOST SERIOUS PROBLEM! FS is consistent according to bitmap and
              dentry, but inode has garbage data.
B I' D' ---> Inconsistency! Dentry refers to valid I, but bitmap says I is free.
              I can be taken by another file.
B' I' D' ---> Consistent (new data persisted)
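
The eight cases above can also be enumerated mechanically. The following is a minimal sketch, not part of the original notes: `classify`, `label` and the verdict strings are hypothetical names chosen for illustration, and each flag simply records whether the new version (B', I', D') reached disk before the crash.

```python
from itertools import product

def classify(b_new: bool, i_new: bool, d_new: bool) -> str:
    """Classify the on-disk state left by a crash (illustrative verdicts only)."""
    if not d_new:
        # No dentry on disk, so foo is not reachable from the directory.
        if b_new:
            return "leak"                      # B' I D and B' I' D
        return "consistent (foo lost)"         # B I D and B I' D
    if not b_new:
        # Dentry exists, but the bitmap still says the inode is free.
        return "dentry to free inode" if i_new else "dentry to free garbage inode"
    # Dentry and bitmap agree; only the inode contents decide.
    return "consistent (foo created)" if i_new else "garbage inode looks valid"

def label(b_new: bool, i_new: bool, d_new: bool) -> str:
    """Render a state in the notation used above, e.g. B' I D'."""
    flags = (b_new, i_new, d_new)
    return " ".join(name + ("'" if new else "") for name, new in zip("BID", flags))

# All 2^3 subsets of the three updates that may have reached disk.
outcomes = {label(*s): classify(*s) for s in product((False, True), repeat=3)}
for state, verdict in outcomes.items():
    print(f"{state:9} ---> {verdict}")
```

Only two of the eight states are ones we actually want; the rest depend on which writes happened to land before the crash.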
In these crash scenarios, data loss isn’t the primary concern – we care more about filesystem consistency. Ruining data structures makes the fs unusable!
fsck: file system consistency check
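- Cannot catch every inconsistency (e.g. B' I D').
- Not always obvious how to repair one (what about B I D'?)

Journaling: persistently write intent to log/journal, then update filesystem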
Better than fsck:
Let’s first write all of the block updates to the journal and then update the file system:
If you crash after committing to the log, just replay changes from the log!
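
A toy model of this write-then-replay idea, as a sketch under stated assumptions: the disk is a plain dict, the journal is a list, and the record names `"begin"`, `"block"` and `"commit"` are made up for illustration (a real journal has an on-disk format with sector-level concerns this ignores).

```python
def journal_write(journal, txid, updates):
    """Append one transaction: begin marker, full block images, then commit."""
    journal.append(("begin", txid))
    for block, data in updates.items():
        journal.append(("block", txid, block, data))
    journal.append(("commit", txid))  # the transaction counts only once this lands

def replay(journal, disk):
    """Redo every committed transaction; skip any that never committed."""
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "block" and rec[1] in committed:
            _, _, block, data = rec
            disk[block] = data  # overwriting with the same image is harmless
    return disk

original = {"B": "01000", "I": "garbage", "D": [".", ".."]}
journal = []
journal_write(journal, 1, {"B": "01010", "I": "initialized", "D": [".", "..", "foo"]})

# Crash while updating the file system: only B reached its home location.
crashed = dict(original, B="01010")
recovered = replay(journal, dict(crashed))

# Crash before the commit record was written: replay must change nothing.
torn = journal[:-1]
untouched = replay(torn, dict(original))
```

Here `recovered` ends up in the B' I' D' state, while `untouched` stays at B I D, so recovery never leaves one of the mixed states behind.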
This was an example of physical journaling, as opposed to logical journaling, where we write a logical record of the operation to the log:
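
For a rough sense of what a logical record might look like, here is a hedged sketch. The record fields (`op`, `name`, `inode`) and the helper `apply_create` are hypothetical, and inode number 3 is an assumption read off the bitmap example above (01000 to 01010, counting bit positions from 0 on the left).

```python
def apply_create(fs, record):
    """Redo a logical 'create' record; safe to apply more than once."""
    ino, name = record["inode"], record["name"]
    bits = list(fs["B"])
    bits[ino] = "1"                   # B: mark the inode allocated
    fs["B"] = "".join(bits)
    fs["I"][ino] = "initialized"      # I: initialize the inode
    if name not in fs["D"]:           # D: add the dentry only once
        fs["D"].append(name)
    return fs

# One small record describes the operation instead of three block images.
logical_record = {"op": "create", "name": "foo", "inode": 3}

fs = {"B": "01000", "I": {3: "garbage"}, "D": [".", ".."]}
apply_create(fs, logical_record)
apply_create(fs, logical_record)  # replaying twice changes nothing further
```

The record is much smaller than the physical block images, but recovery has to re-execute the operation rather than copy blocks back.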
Motivation: journaling is expensive. Every FS write requires two disk writes, two seeks. Balance consistency and performance…
Last updated: 2023-04-08