Undelete on ext3? Maybe…

…but not bloody likely.

The first thing to remember about ext3 file recovery is that it isn’t possible, so start thinking about a backup strategy so you won’t ever have to go through this.

My next bit of advice is this: when you rm -rf a directory, make sure that it doesn’t happen to be named ~<your username>, which, as we all know, the shell expands to one’s home directory. When I typed rm -rf ~bdf to remove a directory named ~bdf that was not my home directory, I expected it to take only a second or so. It took a couple of seconds of disk thrashing before I realized what was happening and frantically hit ^C while silently cursing my own stupidity. Fortunately, only directories and files with names starting with A through H were deleted.
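
For anyone wondering why that command was so destructive: an unquoted word beginning with ~name is replaced by that user's home directory before rm ever sees it. Python's `os.path.expanduser` follows the same convention, so it makes a convenient way to illustrate the trap and one way around it. This is only a sketch, and the helper name `literal_path` is made up for the illustration.

```python
import os.path


def literal_path(name):
    """Prefix './' so a name starting with '~' is never treated as a home directory."""
    if name.startswith("~"):
        return "./" + name
    return name


# '~bdf' expands to bdf's home directory if such a user exists;
# './~bdf' always refers to an entry in the current directory.
risky = os.path.expanduser("~bdf")
safe = os.path.expanduser(literal_path("~bdf"))
print("risky:", risky)
print("safe: ", safe)
```

In the shell, the equivalent precaution is to write rm -rf ./~bdf or to quote the name.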

Unfortunately that included my GnuCash save files containing my financial records, saved under “gnucash” and last backed up three months previously. I immediately embarked upon a quest to recover the GnuCash save file.

The first thing I did was shut down the computer, so that nothing more could be written over the freed blocks, and reboot with a handy Knoppix CD. The volume with the deleted files was sda3. Without mounting it, I copied sda3 to an image file on a separate disk and made the image file read-only for good measure:
dd if=/dev/sda3 of=/mnt/hdc1/sda3_img.dat; chmod a-w /mnt/hdc1/sda3_img.dat
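
As a rough illustration of what that one-liner does, here is a minimal Python sketch that copies a source to an image file in chunks and then clears the write bits. The function name `image_device` and the chunk size are my own choices, and the SHA-256 digest is an extra that the dd command above does not compute; it is there so the image can be checked against the source later.

```python
import hashlib
import os
import stat


def image_device(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in chunks and return the SHA-256 hex digest of the data."""
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            digest.update(chunk)
    # Equivalent of chmod a-w: clear every write bit on the image file.
    mode = os.stat(dst).st_mode
    os.chmod(dst, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return digest.hexdigest()
```

Called as image_device("/dev/sda3", "/mnt/hdc1/sda3_img.dat"), it would need root privileges to read the partition, just as dd does.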

Then I rebooted back into the original system and started the recovery process. The first set of recovery tools I installed was The Sleuth Kit, an excellent forensic analysis toolkit. The Sleuth Kit contains dls, a utility to extract unallocated space in a disk image file to another image file:
dls -f ext -v sda3_img.dat > sda3_unalloc.dat

The unallocated space consists of all the blocks in the disk image that are not currently allocated to any file. With luck, this includes the blocks that belonged to the files that were just deleted, provided nothing has overwritten them since.
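
To make the idea concrete, here is a toy Python sketch of pulling unallocated blocks out of an image. It assumes a single allocation bitmap in which a set bit means "block in use" and the lowest bit of each byte comes first. A real ext3 volume keeps one such bitmap per block group, so this is a simplification and not how dls is actually implemented. The function names are invented for the example.

```python
def free_blocks(bitmap, total_blocks):
    """Yield indices of blocks whose bitmap bit is 0 (unallocated)."""
    for block in range(total_blocks):
        byte = bitmap[block // 8]
        # Lowest bit of each byte describes the lowest-numbered block.
        if not (byte >> (block % 8)) & 1:
            yield block


def extract_unallocated(image, bitmap, block_size):
    """Concatenate the contents of all unallocated blocks in image (bytes)."""
    total_blocks = len(image) // block_size
    return b"".join(
        image[b * block_size:(b + 1) * block_size]
        for b in free_blocks(bitmap, total_blocks)
    )


# Four 4-byte blocks; the bitmap marks blocks 0 and 2 as in use.
image = b"AAAABBBBCCCCDDDD"
bitmap = bytes([0b00000101])
print(extract_unallocated(image, bitmap, 4))
```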

Next I installed foremost [a more up to date alternative is Scalpel], a “data carving” utility. (Foremost was written by a couple of special agents at the U.S. Air Force Office of Special Investigations! Neat!) Data carving refers to the use of header, footer and other structural information to assist in recovering files. GnuCash data files are gzipped XML files, so I supplied the gzip header bytes “\x1f\x8b\x08” for foremost to use to detect the start of a file. There isn’t an easy way to detect the end of a gzip file, so I just specified a maximum size.
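
Header-based carving with a maximum size is simple enough to sketch in a few lines of Python. This is only an illustration of the technique, not foremost's implementation: it holds the whole input in memory, and the names `find_headers` and `carve` are made up for the example.

```python
GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID bytes followed by the "deflate" method byte


def find_headers(data, magic=GZIP_MAGIC):
    """Return every offset in data at which magic occurs."""
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(magic, pos + 1)
    return offsets


def carve(data, max_size, magic=GZIP_MAGIC):
    """Return one candidate per header: the bytes from the header up to max_size."""
    return [data[off:off + max_size] for off in find_headers(data, magic)]
```

Because there is no footer to stop at, every candidate runs to the maximum size (or to the end of the input), so most of them end with bytes that do not belong to the original file.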

To use foremost to recover the files:
foremost -i sda3_unalloc.dat -o recovery

This process yielded 6931 files. I wrote a quick (and dirty!) Perl script to check whether each one was a valid gzip file.

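[code lang="perl"]
# pipe "ls" to this script
while (<>) {
    chomp;    # strip the trailing newline from the file name
    # gunzip exits with 0 on success and 2 on a warning (e.g. trailing garbage)
    my $ret = system('gunzip', '--test', $_) >> 8;
    if (($ret == 0) || ($ret == 2)) {
        system('mv', $_, 'good/');
    }
}
[/code]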
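
The same test can be done without shelling out to gunzip. Below is a minimal Python sketch using the standard zlib module. The function name `gzip_status` and its three labels are my own invention. Candidates carved with a maximum size usually carry extra bytes after the end of the real gzip stream, which is why the script above treats exit status 2 as a pass; here those bytes show up as unused data.

```python
import zlib


def gzip_status(candidate):
    """Classify bytes as 'ok', 'trailing garbage', or 'invalid'."""
    # Adding 16 to the window size tells zlib to expect gzip framing.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        decomp.decompress(candidate)
    except zlib.error:
        return "invalid"
    if not decomp.eof:
        return "invalid"  # input ended before the gzip trailer was complete
    return "trailing garbage" if decomp.unused_data else "ok"
```

Note that this decompresses each candidate fully in memory, which is fine for small save files but would want a streaming loop for anything large.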

Because of the max-size file ending criterion, the script gave lots of warnings about “trailing garbage ignored”, but now I had 3014 valid gzip files in good/. Now let’s see which ones contain XML by running “file” on each one with another nasty Perl script:

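[code lang="perl"]
while (<>) {
    chomp;    # strip the trailing newline from the file name
    # decompress to stdout and let "file" identify the contents
    my $file = `gunzip < $_ 2>/dev/null | file -`;
    if ($file =~ m/XML 1\.0 document/) {
        system('mv', $_, 'xml/');
    }
}
[/code]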

This yielded 19 XML files in xml/. I gunzipped them and grepped for a characteristic GnuCash XML element and…

… nothing. I didn’t find a single GnuCash file. By this time my parallel strategy of downloading my historical transaction data from my online banking accounts was yielding results so I abandoned my quixotic attempts at undeletion.
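
For the record, the last two steps (checking for XML and searching for a GnuCash element) can be rolled into one test. The sketch below is in Python and makes one loud assumption: I never said above which element I grepped for, and the marker `<gnc-v2` used here is simply the root element that GnuCash XML files normally start with. The function names and the 64 KB limit are likewise arbitrary choices for the illustration.

```python
import zlib


def gunzip_prefix(candidate, limit=65536):
    """Decompress at most limit bytes of a gzip stream, ignoring trailing garbage."""
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 selects gzip framing
    try:
        return decomp.decompress(candidate, limit)
    except zlib.error:
        return b""


def looks_like_gnucash(candidate, marker=b"<gnc-v2"):
    """True if the data starts with an XML declaration and contains marker.

    The default marker is an assumption, not something taken from the post.
    """
    head = gunzip_prefix(candidate)
    return head.lstrip().startswith(b"<?xml") and marker in head
```

Only the first part of each file is decompressed, since both the XML declaration and the root element sit at the very top of the document.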

There were several other things I could have tried, though. For one thing, I had purposely ignored fragmentation, as I thought it would be tricky to match up non-sequential blocks. From the looks of things, this may be possible.

Still, this was a learning experience. What did I learn? Make frequent backups! Preferably automatic backups.

[Update: Looks like I wasn't sufficiently motivated, unlike this guy.]
