The joys of btrfs and OpenSuSE – or “no space left on device”
At one point during the installation of OpenSuSE 12.1 on my Thinkpad I got a little adventurous and selected “btrfs” as the file system of choice. I wanted one large partition, but I usually like to keep /home separate from the rest so that I can keep all my data through a reinstall, upgrade or whatever. btrfs seemed a great way to combine the two, as it supports “subvolumes”, which can be handled almost like separate file systems. But alas, the YaST installer kindly noted that a “home” subvolume for “/” was not yet supported – so there I go again: how do you split up a rather small, albeit very fast, 128 GB SSD? I went for 20 GB for “/” and about 100 GB for “/home”, plus swap and an ext2 “/boot” (grub supposedly can boot a kernel from btrfs, but that was not really recommended, and a separate “/boot” would also let me encrypt both “/” and “/home” later on if I wanted to). This is where the trouble actually started, although I was unaware of it.
No space left on device – What GNU tools say
Together with btrfs, OpenSuSE installs a tool called “snapper”, which manages another btrfs feature: snapshots. While this is actually great, it really sucks if you don’t know that your system is taking them, and it got me into very deep sh!@$&. After working with the system for about 3 months and really enjoying it, I noticed my root partition filling up. I thought that was normal, since I kept adding software, but once Gnome keeps telling you that there is less than 1 GB left in “/”, you try to do something. Luckily the pop-up with the warning also offers an inspection of the file system, which showed that “/usr” was using over 8 GB of disk space. Hmm, ok – well, that’s where the software lives… But why on earth is the next largest directory, “/usr/share”, using only 3 GB? Where are those other 5 GB?!? First thought: well, btrfs handles inodes differently, since they are part of the normal file system metadata and not set aside in advance as in other file systems. A “df -i” actually looks like this:
juckel@denkbrett:~> df -i
Filesystem      Inodes IUsed   IFree IUse% Mounted on
rootfs               0     0       0     - /
devtmpfs       1012789   515 1012274    1% /dev
tmpfs          1014799    18 1014781    1% /dev/shm
tmpfs          1014799   507 1014292    1% /run
/dev/sda3            0     0       0     - /
tmpfs          1014799    12 1014787    1% /sys/fs/cgroup
tmpfs          1014799     1 1014798    1% /media
tmpfs          1014799   507 1014292    1% /var/lock
tmpfs          1014799   507 1014292    1% /var/run
/dev/sda1        38000    43   37957    1% /boot
/dev/sda4            0     0       0     - /home
But then again, 5 GB is a LOT of blocks to be used for metadata… But the whole thing was still working, and I did not have time to worry about it until I was actually pushed into it yesterday. The whole session froze and only a very hard reboot was possible. After that it took forever to boot, and once the X login appeared, the keyboard did not work. Well, a mouse is nice and an on-screen keyboard would also do the trick, but then again, no thanks. I tried to power the whole thing off, and there was my new friend for the next hour: “No space left on device”. In this case “Could not load xklavier keyboard support: No space left on device” – well, that at least explains my inability to type. I need to fix this NOW! Let’s add a “1” to the grub command line to get into rescue mode. Shit is really piling up when your runlevel 1 refuses to boot because “Cannot insmod XYZ: No space left on device”. Luckily, the second try got me a shell, and I tried to free up some breathing room to fix my system.
No space left on device – What’s really happening
This is where things got even more interesting. “df -h” said that I had about 350 MB left on “/”. So far so good, but then why does everything I try result in “no space left on device”? Weird… To top it all off, as I tried to get rid of some unpacked installation sources in “/root”, I got “cannot rm XYZ: No space left on device”. Now come on, are you kidding me? You are supposed to delete the stuff, not add to it – somebody must be watching me right now and laughing his or her ass off. At some point I gave up and turned to the “btrfs” command, which allows some modifications to the btrfs file system. It is actually an awesome tool, since it lets you work on a mounted file system, but it is still incomplete. What got me to the bottom of all this was the file system overview from “btrfs filesystem show”.
denkbrett:~ # btrfs filesystem show
...
Label: none  uuid: 40507634-cb2c-4678-8b3d-d014e1b88d78
  Total devices 1 FS bytes used 20.00GB
  devid 1 size 20.00GB used 20.00GB path /dev/sda3
Hmm – somebody is not telling the truth. “df” insists that there is some space left… But since everything I see (“no space left on device”) indicates that “btrfs” is right and “df” might be wrong, let’s go down this path for a while. I finally manage to remove a large tarball lying around – that should have freed up some room, but why is it not showing up in “btrfs filesystem show”? And wait a second, why is my home file system also showing two different numbers for usage?
denkbrett:~ # btrfs filesystem show
Label: none  uuid: b3b42cba-c08e-4401-9382-6db379176a1f
  Total devices 1 FS bytes used 90.21GB
  devid 1 size 100.00GB used 85.29GB path /dev/sda4
...
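One plausible way to read the two outputs above, assuming the per-device “used” figure counts space btrfs has already handed out to data and metadata chunks (rather than bytes actually stored): whatever remains between “size” and that “used” value is still unallocated. On /dev/sda3 that is nothing at all, which would leave btrfs no room for new metadata chunks even while “df” still reports a few hundred MB free. The helper below is a hypothetical sketch that just does that subtraction for each device line; it assumes the output format shown here, with “size” and “used” carrying the same unit.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs filesystem show` output on stdin and print,
# per device, how much space is not yet allocated to chunks (size minus used).
# Assumes device lines look like: devid 1 size 20.00GB used 20.00GB path /dev/sda3
# and that "size" and "used" carry the same unit suffix.
btrfs_unallocated() {
    awk '
    $1 == "devid" {
        size = ""; used = ""; path = ""
        for (i = 2; i < NF; i++) {
            if ($i == "size") size = $(i + 1)
            if ($i == "used") used = $(i + 1)
            if ($i == "path") path = $(i + 1)
        }
        unit = size
        sub(/^[0-9.]+/, "", unit)    # keep only the suffix, e.g. GB
        # In arithmetic, awk reads a value like 20.00GB as its leading number.
        printf "%s: size %s, allocated %s, unallocated %.2f%s\n", path, size, used, size - used, unit
    }'
}
```

Usage would be something like `btrfs filesystem show | btrfs_unallocated`; for the two devices above this gives 0.00GB unallocated on /dev/sda3 and 14.71GB on /dev/sda4.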
This is weird. It looks just like our large NAS for the home directories of the ZIH users. BINGO! That NAS uses snapshots, keeping old data blocks around to be able to revert changes. I remember reading the articles about btrfs in “c’t” (https://www.heise.de/artikel-archiv/ct/2011/23/174 and https://www.heise.de/artikel-archiv/ct/2011/24/190) – btrfs can do snapshots, too. Maybe my OS is trying to be smart?! Let’s take “snapper” and see what’s there:
denkbrett:~ # snapper list
Type   | #    | Pre # | Date                     | Cleanup  | Description       | Userdata
-------+------+-------+--------------------------+----------+-------------------+---------
single | 0    |       |                          |          | current           |
single | 1    |       | Thu Nov 10 14:00:01 2011 | timeline | timeline          |
single | 607  |       | Thu Dec 1 00:00:01 2011  | timeline | timeline          |
single | 1381 |       | Sun Jan 1 00:00:01 2012  | timeline | timeline          |
pre    | 1609 |       | Mon Jan 9 15:13:34 2012  | number   | zypp(zypper)      |
post   | 1610 | 1609  | Mon Jan 9 15:15:48 2012  | number   |                   |
pre    | 1611 |       | Mon Jan 9 15:16:35 2012  | number   | zypp(zypper)      |
post   | 1612 | 1611  | Mon Jan 9 15:16:49 2012  | number   |                   |
... (lots more)
Well, this is nice: my OS takes snapshots from time to time, as well as before and after each system update. That is actually really handy for reverting to an old state if something goes wrong. But now I need to get rid of you folks. Unfortunately, a “snapper delete all” is not implemented – I have to run the command for every single snapshot. Are you kidding me again? Well, let’s “fake” the “all”:
denkbrett:~ # for i in `seq 1 3656`; do snapper delete $i; done
Take this! Well, almost – after getting rid of about 20 snapshots I run into another old friend (https://bugzilla.novell.com/show_bug.cgi?id=733843):
kernel BUG at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.1.0/linux-3.1/fs/btrfs/inode.c:785!
invalid opcode: 0000 [#1] PREEMPT SMP
Ok – reboot, runlevel 1 again. On the second try I get rid of all snapshots and am back to 49 % file system usage!!
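The seq loop above fires “snapper delete” at every number from 1 to 3656, even though many of those snapshots no longer exist (the list jumps from 1 to 607 to 1381). A slightly gentler sketch, assuming the pipe-separated table layout of “snapper list” shown earlier, first collects the numbers that actually exist and skips #0, which stands for the current system rather than a stored snapshot. Both function names are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the number of every snapshot listed by
# `snapper list`, skipping #0 ("current", i.e. the live system).
# Assumes the number sits in the second pipe-separated column; header and
# separator rows carry no plain number there and are ignored.
snapshot_numbers() {
    awk -F'|' '
    {
        n = $2
        gsub(/[ \t]+/, "", n)        # strip the column padding
        if (n ~ /^[0-9]+$/ && n != "0") print n
    }'
}

# Usage sketch (run as root): collect the existing numbers first, then call
# `snapper delete` once per snapshot, just like the loop in the post.
delete_all_snapshots() {
    nums=$(snapper list | snapshot_numbers)
    for n in $nums; do
        snapper delete "$n"
    done
}
```

Collecting the numbers before deleting anything means the table is read once, and no delete is attempted for numbers that were never in it.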
Don’t trust “df” or “du” when using btrfs! Keep those snapshots at bay!
To prevent this from happening again, I tried to find out how my system decides when to take snapshots and how long to keep them. “man snapper” reveals that the snapper configs live in “/etc/snapper/configs”, and sure enough there is a config there for “root”. But what the heck are all those undocumented parameters?
denkbrett:~ # cat /etc/snapper/configs/root
# subvolume to snapshot
SUBVOLUME="/"
# filesystem type
FSTYPE="btrfs"
# run daily number cleanup
NUMBER_CLEANUP="yes"
# limit for number cleanup
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="100"
# create hourly snapshots
TIMELINE_CREATE="yes"
# cleanup hourly snapshots after some time
TIMELINE_CLEANUP="yes"
# limits for timeline cleanup
TIMELINE_MIN_AGE="1800"
TIMELINE_LIMIT_HOURLY="10"
TIMELINE_LIMIT_DAILY="10"
TIMELINE_LIMIT_MONTHLY="10"
TIMELINE_LIMIT_YEARLY="10"
# cleanup empty pre-post-pairs
EMPTY_PRE_POST_CLEANUP="yes"
# limits for empty pre-post-pair cleanup
EMPTY_PRE_POST_MIN_AGE="1800"
Some research on the web led me to this great article http://doc.opensuse.org/products/draft/SLES/SLES-admin_sd_draft/cha.snapper.html. I am now running with the following setup – let’s see how it goes:
denkbrett:~ # cat /etc/snapper/configs/root
# subvolume to snapshot
SUBVOLUME="/"
# filesystem type
FSTYPE="btrfs"
# run daily number cleanup
NUMBER_CLEANUP="yes"
# limit for number cleanup
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="20" # let's only keep 20 snapshots around
# create hourly snapshots
TIMELINE_CREATE="yes"
# cleanup hourly snapshots after some time
TIMELINE_CLEANUP="yes"
# limits for timeline cleanup
TIMELINE_MIN_AGE="1800"
TIMELINE_LIMIT_HOURLY="10"
TIMELINE_LIMIT_DAILY="2" # let's only keep daily snapshots around for two days
TIMELINE_LIMIT_MONTHLY="0" # no I don't want to have things around for 10 months
TIMELINE_LIMIT_YEARLY="0" # or even 10 years! Who came up with those defaults?!?
# cleanup empty pre-post-pairs
EMPTY_PRE_POST_CLEANUP="yes"
# limits for empty pre-post-pair cleanup
EMPTY_PRE_POST_MIN_AGE="1800"
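To get a rough feel for what the change buys, here is a back-of-the-envelope sketch that sources a config like the two above in a subshell (they are plain KEY="value" lines, so the shell can read them) and simply adds up the limits. Treating each limit as an independent count is a simplifying assumption: it ignores snapshots younger than the *_MIN_AGE values (1800 seconds, i.e. 30 minutes), whatever accumulates between cleanup runs, and any overlap between the hourly, daily, monthly and yearly categories. With the defaults it comes to 10 + 10 + 10 + 10 = 40 timeline snapshots plus up to 100 numbered ones (140 in total); with the new settings, 10 + 2 + 0 + 0 = 12 plus 20 (32 in total). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: add up the snapper cleanup limits in a config file.
# Assumes the file contains plain KEY="value" assignments (as shown above),
# so it can be sourced; the subshell keeps the variables out of the caller.
# Pass a path containing a slash (e.g. /etc/snapper/configs/root), otherwise
# "." may search $PATH instead of the current directory.
snapshot_budget() {
    (
        . "$1" || exit 1
        timeline=$(( TIMELINE_LIMIT_HOURLY + TIMELINE_LIMIT_DAILY ))
        timeline=$(( timeline + TIMELINE_LIMIT_MONTHLY + TIMELINE_LIMIT_YEARLY ))
        echo "timeline: $timeline, number: $NUMBER_LIMIT, total: $(( timeline + NUMBER_LIMIT ))"
    )
}
```

For example, `snapshot_budget /etc/snapper/configs/root` prints a single summary line; it is only a ballpark figure, not a prediction of what snapper will actually keep.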