The joys of btrfs and OpenSuSE – or “no space left on device”

At one point during the installation of OpenSuSE 12.1 on my Thinkpad I got a little adventurous and selected “btrfs” as the file system of choice. I wanted to have one large partition, but I usually like /home to be separate from the rest so that I can keep all my data while doing a reinstall, upgrade or whatever. btrfs seemed a great choice to combine the two, as it supports “subvolumes”, which can be handled almost like file systems of their own. But alas, the YaST installer kindly noted that a “home” subvolume for “/” was not yet supported – so there I go again: how do you split up a rather small, albeit very fast, 128 GB SSD? I went for 20 GB for “/” and about 100 GB for “/home”, plus swap and an ext2 “/boot” (grub supposedly can boot a kernel from btrfs, but that was not really recommended; a separate “/boot” would furthermore allow me to encrypt both “/” and “/home” later on if I wanted to). This is where trouble actually started, although I was unaware of it.

No space left on device – What GNU tools say

Together with btrfs, OpenSuSE installs a tool called “snapper” which manages another feature of btrfs: snapshots. While this is actually great, it really sucks if you don’t know that your system takes them, and it got me into very deep sh!@$&. After working with the system for about 3 months and really enjoying it, I noticed my root partition filling up. I thought it normal, since I kept adding some software, but once Gnome keeps telling you that there is less than 1 GB left in “/” you try to do something. Luckily the pop-up with the warning also offers an inspection of the file system, which showed that “/usr” was using over 8 GB of disk space. Hmm, ok – well that’s where the software lives… But why on earth is the next largest directory “/usr/share” with only 3 GB of disk usage? Where are those other 5 GB?!? First thought: well, btrfs does inodes differently, since they are actually part of the normal file system and not set aside like in other file systems. A “df -i” actually looks like this:

juckel@denkbrett:~> df -i
Filesystem  Inodes IUsed   IFree IUse% Mounted on
rootfs           0     0       0     - /
devtmpfs   1012789   515 1012274    1% /dev
tmpfs      1014799    18 1014781    1% /dev/shm
tmpfs      1014799   507 1014292    1% /run
/dev/sda3        0     0       0     - /
tmpfs      1014799    12 1014787    1% /sys/fs/cgroup
tmpfs      1014799     1 1014798    1% /media
tmpfs      1014799   507 1014292    1% /var/lock
tmpfs      1014799   507 1014292    1% /var/run
/dev/sda1    38000    43   37957    1% /boot
/dev/sda4        0     0       0     - /home

But then again, 5 GB is a LOT of blocks to be used for metadata… But the whole thing was still working, and I did not have time to worry about it, until I was actually pushed into it yesterday. The whole session froze and only a very hard reboot was possible. After that it took forever to boot, and once the X login appeared the keyboard was not working. Well, a mouse is nice and an on-screen keyboard would also do the trick, but then again, no thanks. I tried to power the whole thing off, and there was my new friend for the next hour: “No space left on device”. In this case “Could not load xklavier keyboard support: No space left on device” – well, that at least explains my inability to type. I need to fix this NOW! Let’s add a “1” to the grub command line to get into rescue mode. Shit is really piling up when your runlevel 1 refuses to boot because “Cannot insmod XYZ: No space left on device”. Luckily, the second try got me a shell and I tried to get some breathing room to fix my system.

No space left on device – What’s really happening

This is where things got even more interesting. “df -h” said that I had about 350 MB left on “/”. So far so good, but then why does everything I try result in “no space left on device”? Weird… To top it all off, as I tried to get rid of some unpacked installation sources in “/root”, I got “cannot rm XYZ: No space left on device”. Now, come on – are you kidding me? You are supposed to delete the stuff and not add to it – somebody must be watching me right now and laughing his or her ass off. At some point I gave up and turned to the “btrfs” command, which allows some modifications to a btrfs file system. It is actually an awesome tool since it lets you work on a mounted file system, but it is still incomplete. What got me to the bottom of all this was the file system info from “btrfs”.

denkbrett:~ # btrfs filesystem show
...
Label: none uuid: 40507634-cb2c-4678-8b3d-d014e1b88d78
 Total devices 1 FS bytes used 20.00GB
 devid 1 size 20.00GB used 20.00GB path /dev/sda3

Hmm – somebody is not telling the truth. “df” insists that there is some space left… But since everything I see (“no space left on device”) indicates that “btrfs” is actually right and “df” might be wrong, let’s go down this path for some time. I finally am able to remove a large tar-ball lying around – that should have cleared up some room, but why is it not showing up in “btrfs filesystem show”? And wait a second, why is my home file system also showing two different numbers for usage?

denkbrett:~ # btrfs filesystem show
Label: none uuid: b3b42cba-c08e-4401-9382-6db379176a1f
 Total devices 1 FS bytes used 90.21GB
 devid 1 size 100.00GB used 85.29GB path /dev/sda4
...

This is weird. This is almost like on our large NAS for the home-directories of the ZIH users. BINGO! That NAS uses snapshots to keep old data blocks to be able to revert changes. I remember reading the article about btrfs in “c’t” (https://www.heise.de/artikel-archiv/ct/2011/23/174 and https://www.heise.de/artikel-archiv/ct/2011/24/190) – btrfs can also do snapshots. Maybe my OS is trying to be smart?! Let’s take “snapper” and see what’s there:

denkbrett:~ # snapper list
Type   | #    | Pre # | Date                     | Cleanup  | Description       | Userdata
-------+------+-------+--------------------------+----------+-------------------+---------
single | 0    |       |                          |          | current           |
single | 1    |       | Thu Nov 10 14:00:01 2011 | timeline | timeline          |
single | 607  |       | Thu Dec  1 00:00:01 2011 | timeline | timeline          |
single | 1381 |       | Sun Jan  1 00:00:01 2012 | timeline | timeline          |
pre    | 1609 |       | Mon Jan  9 15:13:34 2012 | number   | zypp(zypper)      |
post   | 1610 | 1609  | Mon Jan  9 15:15:48 2012 | number   |                   |
pre    | 1611 |       | Mon Jan  9 15:16:35 2012 | number   | zypp(zypper)      |
post   | 1612 | 1611  | Mon Jan  9 15:16:49 2012 | number   |                   |
... (lots more)

Well this is nice, my OS does snapshots from time to time and before and after a system update. Actually really nice to revert to an old state if something goes wrong. But now I need to get rid of you folks. Unfortunately a “snapper delete all” is not implemented – I have to run the command for every single one of them. Are you kidding me again? Well, let’s “fake” the “all”:

denkbrett:~ # for i in `seq 1 3656`; do snapper delete $i; done

Take this! Well, almost – after getting rid of about 20 snapshots I run into another old friend (https://bugzilla.novell.com/show_bug.cgi?id=733843):

kernel BUG at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.1.0/linux-3.1/fs/btrfs/inode.c:785!
invalid opcode: 0000 [#1] PREEMPT SMP

Ok – reboot, runlevel 1 again. On the second try I get rid of all snapshots and am back to 49 % file system usage!!
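The brute-force loop above asks snapper to delete every number from 1 to 3656, whether or not such a snapshot still exists. A gentler sketch is to take the numbers straight from the “snapper list” output. The function names and the DRY_RUN switch below are made up for illustration, and the parsing assumes the column layout shown above (snapshot number in the second column); other snapper versions may print a different table.

```shell
#!/bin/sh
# Hypothetical helper: read `snapper list` output on stdin and print the
# number of every snapshot except 0 (the "current" entry).
snapshot_numbers() {
    awk -F'|' 'NF > 2 {
        n = $2
        gsub(/[^0-9]/, "", n)   # strip padding; the header "#" becomes empty
        if (n != "" && n != "0") print n
    }'
}

# Hypothetical helper: delete every snapshot number read from stdin.
# With DRY_RUN=1 (the default here) it only prints the commands.
delete_listed() {
    while read -r n; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "snapper delete $n"
        else
            snapper delete "$n"
        fi
    done
}

# Usage on a real system (as root), first as a dry run, then for real:
#   snapper list | snapshot_numbers | delete_listed
#   snapper list | snapshot_numbers | DRY_RUN=0 delete_listed
```

Running the dry run first shows exactly which snapshots would go, which is worth a look before letting it loose on a file system that is already full.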

Lesson learned

Don’t trust “df” or “du” when using btrfs! Keep those snapshots at bay!
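One way to act on this lesson, as a rough sketch: in “btrfs filesystem show”, the per-device “used” figure is the space btrfs has already allocated on that device, so when it equals “size” there is nothing left to allocate, whatever “df” reports. The helper name below is hypothetical and the parsing assumes the output format shown earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs filesystem show` output on stdin and print
# "<device> <size> <allocated> <state>" for every devid line.
allocation_state() {
    awk '$1 == "devid" && $3 == "size" && $5 == "used" {
        # String comparison: FULL only when both figures are printed identically.
        state = ($4 == $6) ? "FULL" : "ok"
        print $8, $4, $6, state
    }'
}

# Demo with the two devid lines from this post:
allocation_state <<'EOF'
 devid 1 size 20.00GB used 20.00GB path /dev/sda3
 devid 1 size 100.00GB used 85.29GB path /dev/sda4
EOF

# On a real system (as root): btrfs filesystem show | allocation_state
```

With the numbers from this post, /dev/sda3 is flagged as FULL and /dev/sda4 as ok.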

Aftermath

To prevent this from happening again, I tried to find out how my system decides when to take snapshots and how long to keep them. A “man snapper” reveals that the snapper configs are in “/etc/snapper/configs”, and sure enough there is a config there for “root”. But what the heck are all those undocumented parameters?

denkbrett:~ # cat /etc/snapper/configs/root
# subvolume to snapshot
SUBVOLUME="/"
# filesystem type
FSTYPE="btrfs"
# run daily number cleanup
NUMBER_CLEANUP="yes"
# limit for number cleanup
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="100"
# create hourly snapshots
TIMELINE_CREATE="yes"
# cleanup hourly snapshots after some time
TIMELINE_CLEANUP="yes"
# limits for timeline cleanup
TIMELINE_MIN_AGE="1800"
TIMELINE_LIMIT_HOURLY="10"
TIMELINE_LIMIT_DAILY="10"
TIMELINE_LIMIT_MONTHLY="10"
TIMELINE_LIMIT_YEARLY="10"
# cleanup empty pre-post-pairs
EMPTY_PRE_POST_CLEANUP="yes"
# limits for empty pre-post-pair cleanup
EMPTY_PRE_POST_MIN_AGE="1800"
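As a rough reading aid for these limits: if each limit is taken as the maximum number of snapshots its cleanup job leaves behind (an assumption on my part, not something the config file states), their sum gives an upper bound on how many snapshots survive a cleanup run. The helper name is hypothetical, and it simply sources the config as shell, which works for the plain KEY="value" layout shown here.

```shell
#!/bin/sh
# Hypothetical helper: print a rough upper bound on retained snapshots for a
# snapper config file by summing NUMBER_LIMIT and the TIMELINE_LIMIT_* values.
# Assumption: each limit caps the snapshots kept by its cleanup job.
max_snapshots() (
    # Subshell body: sourcing the config cannot leak variables to the caller.
    . "$1" || exit 1
    total=$(( ${NUMBER_LIMIT:-0} + ${TIMELINE_LIMIT_HOURLY:-0} + ${TIMELINE_LIMIT_DAILY:-0} ))
    total=$(( total + ${TIMELINE_LIMIT_MONTHLY:-0} + ${TIMELINE_LIMIT_YEARLY:-0} ))
    echo "$total"
)

# Usage: max_snapshots /etc/snapper/configs/root
```

For the default values listed above this prints 140. That is a count of snapshots, not of gigabytes: how much space they pin down depends on how much the file system changed in between.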

Some research on the web led me to this great article http://doc.opensuse.org/products/draft/SLES/SLES-admin_sd_draft/cha.snapper.html. I am now running with the following setup – let’s see how it goes:

denkbrett:~ # cat /etc/snapper/configs/root
# subvolume to snapshot
SUBVOLUME="/"
# filesystem type
FSTYPE="btrfs"
# run daily number cleanup
NUMBER_CLEANUP="yes"
# limit for number cleanup
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="20"    # let's only keep 20 snapshots around
# create hourly snapshots
TIMELINE_CREATE="yes"
# cleanup hourly snapshots after some time
TIMELINE_CLEANUP="yes"
# limits for timeline cleanup
TIMELINE_MIN_AGE="1800"
TIMELINE_LIMIT_HOURLY="10"
TIMELINE_LIMIT_DAILY="2"  # let's only keep daily snapshots around for two days
TIMELINE_LIMIT_MONTHLY="0" # no I don't want to have things around for 10 months
TIMELINE_LIMIT_YEARLY="0"  # or even 10 years! Who came up with those defaults?!?
# cleanup empty pre-post-pairs
EMPTY_PRE_POST_CLEANUP="yes"
# limits for empty pre-post-pair cleanup
EMPTY_PRE_POST_MIN_AGE="1800"
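Lowering the limits does not remove anything by itself; the old snapshots only go away when snapper's cleanup next runs. Assuming the “snapper cleanup” subcommand with the algorithm names number, timeline and empty-pre-post, as covered in the article linked above, the cleanup can also be triggered by hand. The SNAPPER variable in this sketch is a made-up override so the loop can be dry-run with echo.

```shell
#!/bin/sh
# Run snapper's three cleanup algorithms once, in order.
# SNAPPER is a hypothetical override, e.g. SNAPPER=echo for a dry run.
run_cleanups() {
    for algo in number timeline empty-pre-post; do
        "${SNAPPER:-snapper}" cleanup "$algo" || return 1
    done
}

# Dry run:                     SNAPPER=echo run_cleanups
# Real run (as root):          run_cleanups
```

The loop stops at the first failing algorithm, so an error in one cleanup is not buried under the output of the next.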

Categories: linux, management, software, wtf
  1. March 13th, 2012 at 15:15 | #1

    Without intent to start a flame war:

    That’s why I moved away from linux on the desktop :)

    Cool story, though, I guess you’re lucky to have been spared removing the hard drive and doing the repair on another machine.

  2. March 13th, 2012 at 15:20 | #2

    @jupp
    I just wish my system would tell me what it does :-) I mean, how hard can it be? The snapper config for “root” comes from YaST – so the little house elves should also be able to tell me that my disk is full because of the snapshots…

    I guess the POSIX tools have to be extended to cope with this “shadow” disk usage somehow. Let’s see how this goes. At least it seems that everybody agrees that btrfs is the future for the Linux environment.

  3. March 16th, 2012 at 01:45 | #3

    @jupp: time machine eats up disk-space as well without telling the user explicitly. So your mac does not do much better (in my opinion) . . . ;)

    • March 16th, 2012 at 22:07 | #4

      @willi: Yes, but Time Machine (as well as the Windows 7 internal backup) uses an additional partition, so it does not fill up your work file systems. While snapshotting is actually quite sweet, it has its disadvantages, too.

  4. March 20th, 2012 at 10:13 | #5

    Thanks for sharing! Do I need to restart snapper when I change the config file?

    • March 20th, 2012 at 11:23 | #6

      I did not, but then again it seems to me that snapper is invoked via cron for the time-based snapshots and via zypper for snapshots during updates – but I also had to shut down my system shortly after the change…

  5. March 20th, 2012 at 13:36 | #7

    @juckel
    I replaced my superdrive with an additional hard drive that I also use as backup drive – and TimeMachine already managed to create 600GB of BackUps for my 128GB SSD the OS runs on . . .

  6. Nodrog
    May 4th, 2012 at 10:23 | #8

    Good write up! I was experimenting with btrfs on SLED 11 SP2 setup. All was going perfect for about 10 days when it just didn’t boot. I tried all the options to repair system etc. The error about no disk space didn’t make sense, but it does now…snapshots!! I was just about to wipe the system and revert to ext3 when the Google gods and Nerdyroom saved the day!
    - Gordon (Australia)

  7. Jake
    December 17th, 2012 at 16:22 | #9

    Thanks for the article, it got me moving in the right direction to solve my ‘no space left on device’ issues. However, I would suggest a slightly modified approach. Rather than manually deleting all the snapshots, I just set the values in the snapper config file as you did and then ran the snapper cleanup commands that are covered in the document you linked above in order to delete the old snapshots.

    snapper cleanup number; snapper cleanup timeline; snapper cleanup empty-pre-post

  8. Daniel
    February 12th, 2013 at 21:38 | #10

    Thanks for this post and for Jake’s streamlined solution, this saved my OpenSUSE desktop as well.

  9. robin
    May 2nd, 2013 at 21:34 | #11

    Great article! Is there anything I can do if adding the 1 after the line linux …. in grub2 still does not get me to a shell (e.g. rescue discs or anything else)?

    br

    robin

  10. Remus
    December 20th, 2013 at 16:26 | #12

    This is exactly what happened to me yesterday and managed to solve it with this article. Thanks!

  11. Artur
    December 26th, 2013 at 00:24 | #13

    Yupp, same here (today). Fixed the problem with this article within 5 minutes.
    Thanks!
