Okay, I love Linux. But I'm always surprised that I can love Linux even more every day. The possibilities are basically endless!
I asked about raid0 last week and got a lot of helpful answers. However, the answer from d3Xter really helped me figure out how I actually want to configure my system. I decided to buy 2x 500GB HDDs and a 120GB NVMe SSD. It's cheap, very cheap, like $15 total. If anything goes wrong with them, I've lost nothing. But boy, was I surprised. I tested BG3, a game that needs an SSD to perform well, and I see almost no difference between my old SSD and my new setup. It loads about as fast (well, maybe 10-15s slower, but who cares?), and I don't have to wait for textures to load at all. Boot time is great too. I notice no real difference between this HDD setup and my last SSD setup. Which is insane!
But the installation is confusing as hell, so I wrote this pseudo-guide for anyone interested.
FIRST: DO NOT STORE ANYTHING EVEN REMOTELY IMPORTANT ON THIS SETUP!
This setup favors performance at the lowest possible cost above all else. Unless you want to buy 4 HDDs for raid10, do not store anything other than games and apps on it. And do not use raid5 in this setup; it sucks.
Second: Get yourself an Arch live USB stick of at least 8GB (4GB might work, but the smallest one I have is 8GB). You can probably do this from another distro's live environment too, but Arch is the easiest if you want to install it on the bcache drive. Make sure you booted into UEFI by typing:
ls /sys/firmware/efi/efivars/
Third: Installing bcache. Once your live system has internet, remount the live environment's cowspace so there is enough room to build from the AUR:
mount -o remount,size=8G /run/archiso/cowspace
Then, install base-devel and git
pacman -S base-devel git --ignore linux
If you have slow internet, you can switch to the second tty with ctrl + alt + f2 and log in as root there to do the next step; otherwise, wait for the process to finish. We need to set up a user, since makepkg won't run as root. So, type:
useradd -m -G wheel user && passwd user
and set a password for it.
Then we need the user to be able to act as superuser. So, run visudo
and add this line at the bottom: user ALL=(ALL:ALL) ALL
Save it, press ctrl + alt + f3, and log in as the user you just created. Clone the bcache-tools AUR repo:
git clone https://aur.archlinux.org/bcache-tools.git && cd bcache-tools
If you want, you can inspect the PKGBUILD. Then build and install it:
makepkg -sri
Once it is done, it will throw an error about vmlinuz not being found, but the package itself is installed; you just need to tell the kernel to load the module. We can do that by executing:
sudo modprobe bcache && sudo partprobe
Fourth: Configuring bcache, btrfs, and raid0
List your drives with lsblk
and identify your HDDs and your SSD. Let's say in this example we have /dev/sda and /dev/sdb as our HDDs and /dev/nvme0n1 as our SSD.
I use cfdisk to partition my drives, but use whatever you want. I made one partition on each HDD, sda1 and sdb1, and three partitions on my SSD. DO NOT FORMAT THEM YET.
/dev/sda1 : 500 GB HDD with 500GB partition
/dev/sdb1 : same as before
/dev/nvme0n1p1 : 118G SSD partition as the cache
/dev/nvme0n1p2 : 1G partition as /boot. Even if you only have one kernel, do not make the /boot partition smaller than 1G. I learned that the hard way.
/dev/nvme0n1p3 : 200M partition as /boot/efi. Same as before, better safe than sorry.
YOU NEED TO HAVE SEPARATE BOOT AND EFI PARTITION, OTHERWISE YOU WILL HAVE AN UNBOOTABLE SYSTEM!
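If you prefer something scriptable, the layout above could also be created non-interactively with sgdisk instead of cfdisk. This is just a sketch assuming the same device names as in this guide, and it is destructive, so triple-check the targets with lsblk first:

```shell
# DESTRUCTIVE sketch: recreate the layout above with sgdisk instead of cfdisk.
# Device names are the ones assumed in this guide -- verify yours with lsblk!
sgdisk --zap-all /dev/sda
sgdisk -n 1:0:0 -t 1:8300 /dev/sda            # one big data partition
sgdisk --zap-all /dev/sdb
sgdisk -n 1:0:0 -t 1:8300 /dev/sdb            # same for the second HDD
sgdisk --zap-all /dev/nvme0n1
sgdisk -n 1:0:-1300M -t 1:8300 /dev/nvme0n1   # ~118G cache partition
sgdisk -n 2:0:+1G    -t 2:8300 /dev/nvme0n1   # 1G /boot
sgdisk -n 3:0:+200M  -t 3:ef00 /dev/nvme0n1   # 200M EFI system partition
```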
Okay, now we configure our raid.
make-bcache -B /dev/sda1 /dev/sdb1
If you run lsblk afterwards, you will see that this command created bcache devices called bcache0 and bcache1 under /dev/sda1 and /dev/sdb1.
Now, we make a raid0 out of those two.
mkfs.btrfs -L ARCHACHED -d raid0 /dev/bcache0 /dev/bcache1
Then the cache.
make-bcache -C /dev/nvme0n1p1
We can register the cache for our raid setup by its cset.uuid. To find the uuid, use this command:
bcache-super-show /dev/nvme0n1p1 | grep cset.uuid
Example of the output is:
cset.uuid fc3aac3b-9663-4067-88af-c5066a6c661b
From that, we can attach it using these commands:
echo fc3aac3b-9663-4067-88af-c5066a6c661b > /sys/block/bcache0/bcache/attach
echo fc3aac3b-9663-4067-88af-c5066a6c661b > /sys/block/bcache1/bcache/attach
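If you don't want to copy-paste the uuid by hand, the two steps above can be scripted. The awk parse below runs against a hard-coded sample line so you can see what it extracts; the commented lines show how it would look on the real system:

```shell
# Parse the cset.uuid out of bcache-super-show-style output.
# The sample line is hard-coded here for illustration.
sample='cset.uuid fc3aac3b-9663-4067-88af-c5066a6c661b'
UUID=$(printf '%s\n' "$sample" | awk '/cset.uuid/ {print $2}')
echo "$UUID"    # -> fc3aac3b-9663-4067-88af-c5066a6c661b

# On the real system, something like:
#   UUID=$(bcache-super-show /dev/nvme0n1p1 | awk '/cset.uuid/ {print $2}')
#   for dev in bcache0 bcache1; do
#       echo "$UUID" | sudo tee /sys/block/$dev/bcache/attach
#   done
```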
Done! If you run lsblk now, you can see that bcache0 and bcache1 also appear under the nvme0n1p1 partition.
Now, we can configure our boot partition.
mkfs.ext4 /dev/nvme0n1p2
(Remember, this is the 1G partition)
mkfs.vfat -F 32 /dev/nvme0n1p3
(Remember, this is the 200M partition)
We can then mount all of those drives to install Arch.
mount /dev/bcache0 /mnt
btrfs subvolume create /mnt/root
btrfs subvolume create /mnt/var
(optional)
btrfs subvolume create /mnt/home
(optional)
umount /mnt
mount /dev/bcache0 -o subvol=root,compress=lzo /mnt/
mount --mkdir /dev/bcache0 -o subvol=home,compress=lzo /mnt/home
mount --mkdir /dev/bcache0 -o subvol=var,compress=lzo /mnt/var
mount --mkdir /dev/nvme0n1p2 /mnt/boot
mount --mkdir /dev/nvme0n1p3 /mnt/boot/efi
Fifth: Install Arch. Follow the Arch wiki, lol.
Last: bcache-tools, mkinitcpio, and grub. Go back to tty3 (the user one) and copy the bcache-tools package to your newly installed system under /mnt:
sudo cp bcache-tools-1.1-1-x86_64.pkg.tar.zst /mnt
Then go back to tty1 and install it inside your arch-chroot:
pacman -U bcache-tools-1.1-1-x86_64.pkg.tar.zst
Now, we configure mkinitcpio. Edit /etc/mkinitcpio.conf with your editor of choice and add bcache to MODULES so it looks like this:
MODULES=(bcache)
And in HOOKS, add it after block and before filesystems. Example:
HOOKS=(base udev block bcache filesystems)
Generate your initramfs by executing this command:
mkinitcpio -p linux
Grub is the last. Install it as usual. Just make sure you did the partition right, as mentioned before.
That's it. You just installed Arch Linux on a bcache setup. It's complicated and headache-inducing for sure, but I assure you it is totally worth the pain.
Note: If you have problems after you reboot and want to arch-chroot from the live USB, you will need to install bcache-tools there again.
If you want to reformat anything and the device shows up as busy even after partprobe, you need to tell the kernel to stop the bcache device first. In this example, sda is the drive you want to edit:
echo 1 > /sys/block/sda/sda1/bcache/stop
If you execute that command by accident, run partprobe again to bring the device back.
References:
https://wiki.archlinux.org/title/bcache
https://wiki.archlinux.org/title/btrfs
https://gist.github.com/HardenedArray/4c1492f537d9785e19406eb5cd991735
https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices.html
And in kernel 6.7 BcacheFS will arrive!
BcacheFS is a very different animal from Bcache; it does not provide block device caching. See bcachefs.org/FAQ/
I can’t quite figure out what would be the use cases where bcache would excel, except for hdds without cache or systems with very limited ram. Can you help me out with that?
Basically the idea is that if you have a lot of data, HDDs have much bigger capacities for the price, whereas large SSDs can be expensive. SSDs have gotten cheap, but you can get used enterprise drives on eBay with huge capacities for incredibly cheap. There’s 12TB HDDs for like $100. 12TB of SSDs would run you several hundreds.
You can slap bcache on a 512GB NVMe backed by an 8TB HDD, and you get 8TB worth of storage, 512GB of which will be cached on the NVMe and thus really fast. But from the user's perspective, it's just one big 8TB drive. You don't have to think about what is where, you just use it. You don't have to be like, I'm going to use this VM so I'll move it to the SSD and back to the HDD when done. The first time might be super slow but subsequent use will be very fast. It caches writes too, so you can write up to 512GB really fast in this example and it'll slowly get flushed to the HDD in the background. But from your perspective, as soon as it's written to the SSD, the data is effectively committed to disk. If the application calls fsync to ensure data is written to disk, it'll complete once it's fully written to the SSD. You get NVMe read/write speeds and the space of an HDD.
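For reference, the cache mode that gives you those fast writes is a per-device sysfs setting. A sketch, assuming a bcache0 device like in the guide above:

```shell
# Show the current cache mode; the active one is shown in [brackets],
# e.g. "writethrough [writeback] writearound none"
cat /sys/block/bcache0/bcache/cache_mode

# Switch to writeback for fast writes (with the data-loss risk described above)
echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode
```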
So one big disk for your Steam library and whatever you play might be slow on the first load, but then as you play, the game files get promoted to the NVMe cache and perform mostly at NVMe speeds, and your loading screens are much shorter.
I really love/hate how you can immediately understand the practical application of new technologies through the use of games.
What are the implications in regards to the lifespan of the disk used as cache? Any potential downsides?
I don’t know, it’s going to depend a lot on usage pattern and cache hit ratio. It will probably do a lot more writes than normal to the cache drive as it evicts older stuff and replaces it. Everything has tradeoffs in the end.
Another big tradeoff, depending on the cache mode (i.e. writeback): if the SSD dies, you can lose a fair bit of data. Not as catastrophic as RAID0 would be, but pretty bad. And you probably want writeback for the fast writes.
Thus I had 2 SSDs and 2 HDDs in RAID1, with the SSDs caching the HDDs. But it turns out my SSDs are kinda crap (they’re about as fast as the HDDs for sequential read/writes) and I didn’t see as much benefit as I hoped so now they’re independent ZFS pools.
> except for hdds without cache
The “cache” on HDDs is extremely tiny. Maybe a few seconds worth of sequential access at max. It does not exist to cache significant amounts of data for much longer than that.
At the sizes at which bcache is used, you could permanently hold almost all of your performance-critical data on flash storage while having enough space for tonnes of performance-uncritical data; all in the same storage “package”.
In my case, it basically helps me improve random read significantly. My NVMe is fast, like 3GB/s in sequential and 500MB/s in random, but it’s only 120GB. By using it as a cache in a bcache system, once a random read is performed, the data will be copied from HDD to SSD and if the data is requested again the random read will happen from SSD instead of HDD.
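If you want to measure that yourself, a quick random-read benchmark with fio looks something like this (the file path and sizes are just examples, not anything from my actual setup):

```shell
# Sketch: 4k random-read benchmark on the bcache-backed filesystem.
# Run it twice: the first pass populates the cache, the second should be
# much faster as reads are served from the SSD.
fio --name=randread --filename=/mnt/fio.test --size=1G \
    --rw=randread --bs=4k --iodepth=32 --ioengine=libaio \
    --direct=1 --runtime=30 --time_based
rm /mnt/fio.test   # clean up the test file afterwards
```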
Thus, playing modern games on it is actually doable, even games that require fast random reads, like Baldur's Gate 3 and Starfield.
As a lot of people in my og post mentioned, random is more important than sequential. Bcache by default does not cache sequential reads, so you won't fill your cache too fast when a big chunk of data is read, like when watching a movie or copying video. That's where raid0 comes to the rescue: raid0 with 2 drives basically doubles my sequential read speed, and 3 drives would triple it.
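That sequential bypass is tunable through sysfs, by the way. A sketch, assuming the bcache0 device from the guide:

```shell
# bcache skips the cache for sequential I/O beyond this cutoff (default 4.0M)
cat /sys/block/bcache0/bcache/sequential_cutoff

# Set to 0 to cache everything, or raise it to keep big streams off the SSD
echo 0 | sudo tee /sys/block/bcache0/bcache/sequential_cutoff
```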
I use ext4 on bcache with an SSD and 5TB HDD for my home drive. Can recommend. Gonna try the new bcachefs soon too.
I'm using btrfs so I can add more drives on the fly, saving myself a headache. But I think bcachefs will make it infinitely easier.
I have a cache drive in my NAS for reads, thinking about putting a second drive in there so I can have a read/write cache array. It makes a huge difference over just having spinning rust. I’d love an all-flash array, but 36TB of SSD would be very expensive right now.
Note to others reading this: If your main use case is gaming (or anything other than storing/processing buttloads of data), I'd suggest just getting a bigger pcie3 drive instead of a faster pcie4/5 drive. Going with a faster drive won't make a noticeable difference, but having 2-3x the capacity (for the same price) will help.
You’re using btrfs for raid?!
Not who you responded to, but I have a similar setup using ZFS.
6 drives in raid 6, and then an SSD cache.
What kind of SSD cache? L2ARC?
Yep. Half my ram as level one, and then a 500gb SSD as L2.
Definitely more than I need for the L2 as the hit rate is only 15% (vs 99% for ARC), but I don’t think there’s much of a downside to slightly over-sizing it these days (there used to be, but L2 is more ram-efficient now).
Thanks, that’s good to know.
I’m running ZFS on my server and tried an L2 cache at first (a 2 TB NVME on a system with a 64 GB ARC and three mirrored 18 TB HDD vdevs) but it didn’t seem like it was giving me much benefit. I looked into tweaking the settings a bit (prioritizing frequently used over MRU, increasing write rates, etc) but after seeing that most of the advice online was that it wasn’t great for my use case, I gave up and repurposed the drive. However, my use case has changed a bit (I’m using my server for more things) and I may try using the spare 256 GB drive that the 2 TB one displaced as an L2ARC drive now.
Any chance there’s a Debian repo for bcachefs? I’d want to see how it does on an extra drive in my server. Or will I have to compile it the old fashioned way?
For bcache, the tools are available in apt, I think. For bcachefs, you will need to wait for kernel 6.7 to land in Debian or compile it yourself.
Note that bcache and bcachefs are different things. The latter is extremely new and not ready for “production” yet. This post is about bcache.
Didn’t know there was a stand-alone bcache, I’ll have to look into that then.
Thanks for writing this! Getting bcache set up the first time can be confusing, so this certainly helps.
I’ll just drop one warning in here. With a setup very much like described here I’ve had severe data corruption and loss, so please make sure your data is properly and regularly backed up. To me it seemed like almost any unexpected or untimely power off would cause some data loss or significant corruption.
use ssds?