“You sound like a broken record”
is what we say when someone keeps repeating the same concepts over and over. But even broken disks can sometimes be useful.
DISCLAIMER: No filesystems or devices were harmed in the making of this experiment 😉
Image credits: Mick Haupt
In this article I would like to explore the powerful tools Linux gives us to simulate broken disks, that is, drives that report errors more or less at random. Why is this important? Because by simulating errors that will sooner or later happen in the real world, we can build software that is more robust and can withstand problems in the underlying infrastructure.
Setup
To avoid putting our development system at risk, and to keep the process as portable as possible, we start by creating a dummy 1 GiB disk backed by a loop device.
# dd if=/dev/zero of=/myfakedisk.bin bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.446898 s, 2.4 GB/s
# losetup /dev/loop0 /myfakedisk.bin
Now we can use the loop device just like any other block device: we can create a filesystem on it and mount it.
# mkfs.ext4 /dev/loop0
mke2fs 1.46.4 (18-Aug-2021)
Discarding device blocks: done
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: bcba505c-54fa-49e5-852c-b5ea3faa53d0
Superblock backups stored on blocks:
32768, 98304, 163840, 229376
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
# mkdir /mnt/good && mount /dev/loop0 /mnt/good && echo "test" > /mnt/good/test.txt && umount /mnt/good
Our working “virtual disk” is ready; now we can create a faulty one using Linux’s device mapper features.
What’s device mapper?
Image credits: Monstera Production
The Linux Device Mapper is a kernel-level framework that enables the creation of virtual block devices by mapping physical storage devices or logical volumes to these virtual devices. It operates within the Linux kernel, providing a layer for creating, managing, and manipulating storage devices through various mapping techniques such as mirroring, striping, encryption, and snapshots. This framework allows for the implementation of advanced storage features like volume management, RAID, and thin provisioning, offering greater flexibility, scalability, and reliability in managing storage resources within the Linux operating system.
Basically we are going to create a “map” between our working device and a “new” one, with this rough schema:
- from sector 0 to 2047, get the data from the underlying device (because we don’t want to mess with partition table and metadata)
- from sector 2048 to roughly half the disk size, use a “flakey” region that alternates between working normally for 4 seconds and returning I/O errors for 1 second, so it fails about 20% of the time
- from there to the end, return the data from the underlying device again
Disk size can be found with a simple check:
# cat /sys/block/loop0/size
2097152
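The sector numbers used in the mapping can be derived from this size instead of being computed by hand. The following is a minimal sketch: the function name make_table, its argument order and its defaults are our own choices for illustration, not part of dmsetup. It only prints the table, so it can be run without root.

```shell
#!/bin/sh
# Print a three-segment device-mapper table for a device of a given size.
# make_table and its parameters are illustrative names, not a dmsetup feature.
make_table() {
    dev=$1            # backing device, e.g. /dev/loop0
    size=$2           # total size in 512-byte sectors
    up=${3:-4}        # seconds the flakey segment works normally
    down=${4:-1}      # seconds the flakey segment returns errors
    head_len=2048     # sectors passed through untouched at the start

    # Split what is left after the head into two equal segments.
    flakey_len=$(( (size - head_len) / 2 ))
    tail_start=$(( head_len + flakey_len ))
    tail_len=$(( size - tail_start ))

    echo "0 $head_len linear $dev 0"
    echo "$head_len $flakey_len flakey $dev $head_len $up $down"
    echo "$tail_start $tail_len linear $dev $tail_start"
}

make_table /dev/loop0 2097152
```

For our 2097152-sector disk this prints a 2048-sector linear head, a 1047552-sector flakey segment and a 1047552-sector linear tail. Since dmsetup create reads the table from standard input when none is given on the command line, the output could be piped into it as root.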
This kind of mapping is expressed with the dmsetup create command:
# dmsetup create bad_disk << EOF
0 2048 linear /dev/loop0 0
2048 1047552 flakey /dev/loop0 2048 4 1
1049600 1047552 linear /dev/loop0 1049600
EOF
# ls -l /dev/mapper/bad_disk
lrwxrwxrwx 1 root root 7 Nov 19 17:51 /dev/mapper/bad_disk -> ../dm-0
For each table entry, we need to specify:
- start sector/offset of mapping
- size of the mapping
- which mapper is being used
- options of the mapper (for details refer to the documentation)
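For the flakey line in our table, the options after the device and offset are the “up” interval and the “down” interval, both in seconds: the segment behaves normally during the first and returns errors during the second. The small sketch below works out which share of the time the segment is failing; the helper name flakey_down_percent is hypothetical and the result is rounded down to a whole percent.

```shell
#!/bin/sh
# Share of time (in whole percent) that a flakey segment returns errors,
# given its up and down intervals in seconds.
# flakey_down_percent is an illustrative helper, not a dmsetup command.
flakey_down_percent() {
    up=$1
    down=$2
    echo $(( 100 * down / (up + down) ))
}

# Our table uses "4 1": 4 seconds up, 1 second down.
flakey_down_percent 4 1
```

With the values from our table this prints 20, which is where the 20% figure comes from. Note that the failures come in one-second bursts rather than being spread randomly over individual requests.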
In this setup we are using the linear mapper and the flakey one. Other useful targets are delay, which simulates very slow disks, and dust, which emulates bad sectors at arbitrary locations and lets you switch the failure emulation on at an arbitrary time.
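As a rough sketch of what tables for those two targets could look like, the helpers below print one-line tables covering a whole device. The function names and the 500 ms and 4096-byte values are assumptions chosen for illustration; the parameter order follows our reading of the kernel documentation for dm-delay (device, offset, delay in milliseconds) and dm-dust (device, offset, block size in bytes), so check it against your kernel version before relying on it.

```shell
#!/bin/sh
# Print single-segment tables for the delay and dust targets.
# delay_table and dust_table are illustrative helpers, not dmsetup commands.

# delay_table DEV SIZE_SECTORS DELAY_MS
delay_table() {
    echo "0 $2 delay $1 0 $3"
}

# dust_table DEV SIZE_SECTORS BLOCK_SIZE_BYTES
dust_table() {
    echo "0 $2 dust $1 0 $3"
}

delay_table /dev/loop0 2097152 500
dust_table /dev/loop0 2097152 4096

# As root, these could be fed to dmsetup, for example:
#   delay_table /dev/loop0 2097152 500 | dmsetup create slow_disk
# With dust, bad blocks are added and enabled later through messages,
# e.g. (per the dm-dust documentation):
#   dmsetup message dusty_disk 0 addbadblock 60
#   dmsetup message dusty_disk 0 enable
```

The names slow_disk and dusty_disk are placeholders. Printing the table first, rather than creating the device directly, makes it easy to review the mapping before handing it to dmsetup.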
Let’s try it out
Our backing disk is already formatted, so it’s time to try out the bad one by mounting it and writing some data:
# mkdir /mnt/bad && mount /dev/mapper/bad_disk /mnt/bad && cd /mnt/bad
# df -h | grep -E '(^Filesystem|bad)'
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/bad_disk 974M 28K 907M 1% /mnt/bad
# while sleep 1 ; do dd if=/dev/zero of=trytowrite.bin bs=1M count=500 ; done
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.595353 s, 881 MB/s
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.637194 s, 823 MB/s
Message from syslogd@localhost at Nov 19 18:09:15 ...
kernel:[ 8017.117593][T23594] EXT4-fs (dm-0): failed to convert unwritten extents to written extents -- potential data loss! (inode 13, error -30)
Message from syslogd@localhost at Nov 19 18:09:15 ...
kernel:[ 8017.118445][T23976] EXT4-fs (dm-0): failed to convert unwritten extents to written extents -- potential data loss! (inode 13, error -30)
dd: error writing 'trytowrite.bin': Read-only file system
481+0 records in
480+0 records out
503865344 bytes (504 MB, 481 MiB) copied, 0.549939 s, 916 MB/s
dd: failed to open 'trytowrite.bin': Read-only file system
dd: failed to open 'trytowrite.bin': Read-only file system
dd: failed to open 'trytowrite.bin': Read-only file system
dd: failed to open 'trytowrite.bin': Read-only file system
dd: failed to open 'trytowrite.bin': Read-only file system
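Once the filesystem has gone read-only, the loop keeps retrying a write that can no longer succeed. A script that wants to react more gracefully could check the mount options before trying again. The sketch below is one possible approach: the helper name mount_is_readonly is ours, and it assumes that the kernel reports the read-only state as an ro option in /proc/mounts.

```shell
#!/bin/sh
# mount_is_readonly MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed with the "ro" option.
# MOUNTS_FILE defaults to /proc/mounts; the optional argument exists only
# so the function can be tried against a sample file.
mount_is_readonly() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") ro = 1
        }
        END { exit ro ? 0 : 1 }
    ' "${2:-/proc/mounts}"
}

if mount_is_readonly /mnt/bad; then
    echo "/mnt/bad is read-only, giving up on writes" >&2
fi
```

In the write loop, a check like this could replace the blind sleep-and-retry and let the script stop or raise an alert instead.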
Disk failure is a success!
As we can see, at first some I/O operations succeed, then the disk fails; the dmesg log gives more details:
[ 7962.645178] EXT4-fs (dm-0): error loading journal
[ 7979.334186] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 8016.759602] EXT4-fs warning (device dm-0): ext4_end_bio:347: I/O error 10 writing to inode 13 starting block 129024)
[ 8016.759641] EXT4-fs warning (device dm-0): ext4_end_bio:347: I/O error 10 writing to inode 13 starting block 129280)
[ 8016.759685] EXT4-fs warning (device dm-0): ext4_end_bio:347: I/O error 10 writing to inode 13 starting block 129536)
[ 8016.759802] EXT4-fs warning (device dm-0): ext4_end_bio:347: I/O error 10 writing to inode 13 starting block 129870)
[ 8016.760119] EXT4-fs warning (device dm-0): ext4_end_bio:347: I/O error 10 writing to inode 13 starting block 130625)
[ 8016.760122] Buffer I/O error on device dm-0, logical block 130625
[ 8016.760129] Buffer I/O error on device dm-0, logical block 130626
[ 8016.760131] Buffer I/O error on device dm-0, logical block 130627
[ 8016.760132] Buffer I/O error on device dm-0, logical block 130628
[ 8016.760133] Buffer I/O error on device dm-0, logical block 130629
[ 8016.760134] Buffer I/O error on device dm-0, logical block 130630
[ 8016.760135] Buffer I/O error on device dm-0, logical block 130631
[ 8016.760136] Buffer I/O error on device dm-0, logical block 130632
[ 8016.760137] Buffer I/O error on device dm-0, logical block 130633
[ 8016.760138] Buffer I/O error on device dm-0, logical block 130634
[ 8016.923667] EXT4-fs warning (device dm-0): ext4_end_bio:347: I/O error 10 writing to inode 13 starting block 54272)
[ 8016.923731] EXT4-fs warning (device dm-0): ext4_end_bio:347: I/O error 10 writing to inode 13 starting block 54783)
[ 8016.924020] EXT4-fs warning (device dm-0): ext4_end_bio:347: I/O error 10 writing to inode 13 starting block 55296)
[ 8016.924335] EXT4-fs warning (device dm-0): ext4_end_bio:347: I/O error 10 writing to inode 13 starting block 60416)
[ 8016.924394] EXT4-fs warning (device dm-0): ext4_end_bio:347: I/O error 10 writing to inode 13 starting block 61803)
[ 8016.961108] Buffer I/O error on dev dm-0, logical block 131103, lost sync page write
[ 8016.961125] Aborting journal on device dm-0-8.
[ 8016.961127] Buffer I/O error on dev dm-0, logical block 131072, lost sync page write
[ 8016.961128] JBD2: Error -5 detected when updating journal superblock for dm-0-8.
[ 8016.961142] EXT4-fs error (device dm-0): ext4_journal_check_start:83: comm kworker/u2:3: Detected aborted journal
[ 8016.966200] EXT4-fs error (device dm-0): ext4_journal_check_start:83: comm dd: Detected aborted journal
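We can cross-check the block numbers in this log against our mapping. The filesystem uses 4k blocks and sits directly on the mapped device, so a filesystem block number multiplied by 8 gives the 512-byte sector it starts at. The sketch below does the conversion for a few blocks taken from the log and tests them against the flakey segment (sectors 2048 to 1049599); the helper names are ours.

```shell
#!/bin/sh
# Map ext4 block numbers from the log to sectors and check whether they
# fall inside the flakey segment of our table.
SECTORS_PER_BLOCK=$(( 4096 / 512 ))    # 4k blocks, 512-byte sectors
FLAKEY_START=2048
FLAKEY_END=$(( 2048 + 1047552 ))       # first sector after the flakey segment

block_to_sector() {
    echo $(( $1 * SECTORS_PER_BLOCK ))
}

in_flakey_range() {
    sector=$(block_to_sector "$1")
    [ "$sector" -ge "$FLAKEY_START" ] && [ "$sector" -lt "$FLAKEY_END" ]
}

for block in 129024 130625 54272 131072; do
    if in_flakey_range "$block"; then
        where="flakey segment"
    else
        where="linear segment"
    fi
    echo "block $block -> sector $(block_to_sector "$block"): $where"
done
```

All four blocks land in the flakey segment. Block 131072, where the log reports the failed journal superblock update, starts at sector 1048576, which is still below the end of the flakey segment at 1049600; that is consistent with the journal being aborted.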
In a more general sense, these concepts fall under the principle of “chaos engineering”. This can also be good practice for junior sysadmins who want to learn how to cope with a damaged filesystem and try to recover data.
Cleanup
To remove all traces of our experiment, it is enough to leave the mount point, unmount the “bad” disk, remove the mapping and detach the loop device from the backing file.
# cd / && umount /mnt/bad && rmdir /mnt/bad
# dmsetup remove bad_disk
# losetup -d /dev/loop0