Friday, May 18, 2018

Proxmox, ZFS, and Memory.

I've run into a really interesting issue with Proxmox and ZFS. This is a scenario where Linux's goal of caching everything and its dog really bites us, because it doesn't really seem to play well with the ZFS ARC.

ZFS, ARC, and VMs

For those not in the know, ZFS is the Zettabyte File System. I highly recommend it, as it is awesome. The ARC is ZFS's read cache, the Adaptive Replacement Cache. Great, with that out of the way, what was happening?

Well, first off, I got some really bum advice from the Internet at large. That happens from time to time. We have been having performance issues with our VMs for a while, and a knowledgeable source suggested dropping the ARC max to comically low levels. Why? Because "you don't need it for virtual machines."

The idea is that your guest OS will do its own caching, and so you don't want to cache twice. Okay, I'll buy that for a moment. If a Windows guest gets a request for a file that Windows already has cached, it's not going to go to the virtual disk for it, and as such it's not going to hit the ARC even if that file is in the ARC.

Good theory! Except for the fact that it's terrible. For those who haven't read up on the ARC, the ARC is amazing. Most operating systems, RAID controllers, and whatever else uses a cache rely on some variant of least-recently-used (LRU) eviction: whatever hasn't been touched for the longest gets thrown out first. That's simple, but it's easy to fool. A single big sequential read can flush out everything you actually use all the time.

Enter the ARC. The ARC tracks both how recently and how often each block has been accessed, and it even remembers blocks it recently evicted so it can tell when it threw out the wrong kind. Based on that, it keeps adjusting how much of the cache goes to recently used blocks versus frequently used ones. Basically, the folks behind the ARC put in a good effort to predict which blocks are going to be useful to have in memory, and as it turns out, that effort was well worth it, because it beats plain LRU handily on most workloads.
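To make the recency-versus-frequency idea concrete, here's a minimal Python sketch of the textbook ARC algorithm (from Megiddo and Modha's 2003 paper) next to a plain least-recently-used (LRU) cache for comparison. It's simplified: every entry is the same size and only keys are tracked, whereas ZFS's real ARC works on variable-sized buffers and has plenty of extra machinery. The class names and the little workload are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Plain least-recently-used cache that only tracks keys (no values)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()

    def access(self, key) -> bool:
        """Record an access; return True on a cache hit."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[key] = None
        return False


class ARCCache:
    """Textbook ARC with fixed-size entries (not ZFS's implementation).

    t1: cached keys seen once recently; t2: cached keys seen at least twice.
    b1/b2: "ghost" lists remembering keys recently evicted from t1/t2.
    p: adaptive target size for t1.
    """

    def __init__(self, capacity: int) -> None:
        self.c = capacity
        self.p = 0.0
        self.t1, self.t2 = OrderedDict(), OrderedDict()
        self.b1, self.b2 = OrderedDict(), OrderedDict()

    def _replace(self, key) -> None:
        # Evict from t1 if it is over its target p, otherwise from t2;
        # the evicted key is remembered in the matching ghost list.
        if self.t1 and (len(self.t1) > self.p
                        or (key in self.b2 and len(self.t1) == self.p)):
            old, _ = self.t1.popitem(last=False)
            self.b1[old] = None
        else:
            old, _ = self.t2.popitem(last=False)
            self.b2[old] = None

    def access(self, key) -> bool:
        """Record an access; return True on a cache hit."""
        if key in self.t1 or key in self.t2:  # hit: promote to frequent list
            self.t1.pop(key, None)
            self.t2.pop(key, None)
            self.t2[key] = None
            return True
        if key in self.b1:  # ghost hit in b1: give recency more room
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:  # ghost hit in b2: give frequency more room
            self.p = max(0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = None
            return False
        # Complete miss: make room if needed, then insert into t1.
        l1 = len(self.t1) + len(self.b1)
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(key)
            else:
                self.t1.popitem(last=False)
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(key)
        self.t1[key] = None
        return False


if __name__ == "__main__":
    # Two hot blocks (A, B) are read twice, then a one-off scan of X1..X3
    # passes through, then the hot blocks are read again.
    workload = ["A", "A", "B", "B", "X1", "X2", "X3", "A", "B"]
    for cache in (ARCCache(2), LRUCache(2)):
        hits = sum(cache.access(key) for key in workload)
        print(type(cache).__name__, "hits:", hits)
    # prints "ARCCache hits: 3" then "LRUCache hits: 2"
```

With only two slots, the scan pushes both A and B out of the LRU cache, while the ARC keeps B in its "seen more than once" list and only loses A. Real workloads and real cache sizes are far messier, but that scan resistance is the core of why the ARC earns its memory.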

With that little nugget in mind, don't drop the ARC max! That's crazy talk, and you will immediately regret it. The ARC is your friend.

ZFS ARC Min/Max

Now, best I can tell, if you don't manually set the ARC max, the ARC will grow as it's needed. On ZFS on Linux, the default ARC max is half of physical RAM; that's the ceiling it grows toward when nothing else needs the memory. As pressure on memory increases, the ARC will shrink. It's very selfless like that.

What I didn't realize is that setting the ARC max to a comically low value would put ZFS at a disadvantage against Linux, which was also trying to cache data from the file system. Kinda makes me wonder if it's a bug, actually.

As VMs would pull data from disk, Linux would attempt to cache the data itself, as did the ARC. Well, Linux caching data was apparently seen as pressure on memory as far as the ARC was concerned. So the ARC shrank, and shrank, and shrank until it got down to about 50MB in size. At that point the whole system would lock up pretty hard.
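If you want to watch the ARC shrink like this, ZFS on Linux exposes its counters in /proc/spl/kstat/zfs/arcstats. In that file, size is the ARC's current size, c is its current target, and c_min and c_max are its floor and ceiling, all in bytes. Here's a minimal Python sketch that pulls out those four numbers; parse_arcstats and report are just names I made up, not part of any ZFS tooling.

```python
from pathlib import Path

ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")
FIELDS = ("size", "c", "c_min", "c_max")


def parse_arcstats(text: str) -> dict:
    """Turn the 'name type data' rows of arcstats into {name: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three columns and a numeric value;
        # this skips the kstat header line and the 'name type data' line.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def report(stats: dict) -> str:
    """Format the interesting fields in whole MiB (rounded down)."""
    mib = 1024 * 1024
    return ", ".join(f"{name}={stats[name] // mib} MiB"
                     for name in FIELDS if name in stats)


if __name__ == "__main__":
    if ARCSTATS.exists():
        print(report(parse_arcstats(ARCSTATS.read_text())))
    else:
        print(f"{ARCSTATS} not found; is the zfs module loaded?")
```

Running that every few seconds while the VMs are busy should make it pretty obvious if size and c are sliding down toward c_min.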

Lo and behold, there's an ARC min setting. I've never had to set that before. Well, I never had to set the ARC max before either.

If you need to set the ARC max, make sure you set the ARC min as well.

You can read and change these settings at runtime on Linux in /sys/module/zfs/parameters/zfs_arc_max and /sys/module/zfs/parameters/zfs_arc_min (the values are in bytes), or set them persistently in /etc/modprobe.d/zfs.conf:

options zfs zfs_arc_max=10737418240
options zfs zfs_arc_min=8589934592

That will give you a max of 10GiB and a min of 8GiB. No matter the pressure, the ARC will stubbornly refuse to shrink below 8GiB.
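Since the module wants raw byte counts, it's easy to fat-finger a digit. Here's a small Python sketch (arc_limits_conf is a hypothetical helper, not something ZFS provides) that turns GiB into the two modprobe lines and refuses a min larger than the max. That check is just my own sanity check, not a reproduction of ZFS's internal validation. One more thing: if your root file system is on ZFS, as on a default Proxmox install, the Proxmox documentation says to run update-initramfs -u after editing /etc/modprobe.d/zfs.conf so the new values are actually used at boot.

```python
GIB = 1024 ** 3  # ZFS module parameters are given in bytes


def arc_limits_conf(max_gib: float, min_gib: float) -> str:
    """Build /etc/modprobe.d/zfs.conf lines for the given ARC limits."""
    max_bytes, min_bytes = int(max_gib * GIB), int(min_gib * GIB)
    # Own sanity check: a floor above the ceiling makes no sense.
    if min_bytes > max_bytes:
        raise ValueError("zfs_arc_min must not exceed zfs_arc_max")
    return (f"options zfs zfs_arc_max={max_bytes}\n"
            f"options zfs zfs_arc_min={min_bytes}\n")


if __name__ == "__main__":
    # prints:
    # options zfs zfs_arc_max=10737418240
    # options zfs zfs_arc_min=8589934592
    print(arc_limits_conf(10, 8), end="")
```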
