Monday, April 24, 2017

Seeding ZFS Replication

Introduction

Originally this was going to be a post about how to seed ZFS replication under FreeNAS, however I've recently been getting into ZFS on Linux. ZFS on Linux should behave in exactly the same way as ZFS anywhere else, plus or minus features depending on whether you're using the official Oracle ZFS.

However, the developers backing FreeNAS have spent a lot of time and energy on the FreeNAS front end, trying to give it a good balance of being feature rich, powerful, and user friendly. To me the rule is, "When something tries to be everything to everyone, it's usually not good at anything." But I also say, "There is an exception to every rule." FreeNAS is the exception to the rule. The developers have done wonderful things with the FreeNAS front end, and while you could use the same process shown here in the back end of FreeNAS, I would hate to undermine the developers' hard work. If you've got FreeNAS, stick to the GUI until you have a solid reason not to.

With that in mind, I'm going to be using ZFS on Linux to seed replication. While the tasks are done in the command line, you should be able to follow along, understand each step, and extrapolate what should be done in the FreeNAS front end in order to seed replication.

Why Would We Seed ZFS Replication?

ZFS snapshots, when combined with replication, create a sort of magical unicorn for computer storage. Getting your data offsite every hour is the sort of thing that, personally, I would have loved to have done at various jobs throughout my career. Jobs where the amount of data was staggeringly large, and the window to back up the data was astonishingly small.

With that in mind, could you imagine taking the amount of data that you are currently responsible for and rsyncing the whole set across the internet? If you can imagine it, then I guess this post isn't for you. For the rest of us, however...

How ZFS Differs From rsync

rsync was an awesome tool back in the day. Sending only files that have changed to a remote server? Yeah, where do I sign up? Now even a small business can easily have more data than rsync can deal with in a timely manner. While rsync will only transfer files that have changed, it first needs to scan the entire volume on both sides to find out which files have changed. 

ZFS on the other hand, being a copy-on-write file system, already has a list of what has changed on the file system. Moreover, the list of what has changed does not include the entire file, but only the blocks that have changed. And even beyond that, the blocks don't need to be compared; the snapshot checksums are compared. System A says to System B, "What's the last snapshot you have for this dataset?" System B responds with a checksum, and System A says, "Oh, okay, you're 3 snapshots behind. Here they are."

The time to transfer has been cut down dramatically, which is important when you're managing 2TB of data and need to transfer only the changes. Scanning 2TB of data for changes? No thanks.
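
If you're curious just how small those incremental transfers really are, zfs send has a dry-run mode that estimates the stream size without sending anything. A quick sketch, using placeholder pool and snapshot names:

# -n does a dry run, -v prints the estimated stream size; nothing is sent
zfs send -nv -i myzpool/mydataset@hourly_snap_01 myzpool/mydataset@hourly_snap_02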

But Still, Why Seed Replication if it's so Fast?

Because the magic of snapshots and replication comes in when your dataset is similarly aligned on System A and System B. If you have 2TB of data on System A and decide to start replicating to a freshly spun up System B with no data on it, you're going to transfer the full 2TB of data. There's no getting around that. Well, I mean there is, and that's what we're doing here. What I mean is there's no getting around the fact that System B needs to be brought up to speed before the magic of snapshots and replication takes over.

Now in a perfect world, you could just have your two systems on the same network, replicate between the two over gigabit (or better if you've got it!) and call it a day. But what if you want to replicate from your office in Minneapolis to your satellite office in Vancouver, and vice versa? I mean, you could have both systems in your office, get everything sync'd, and then pay to ship the 50lb, 2U System B to Vancouver and hope it doesn't get dropped in shipping (rare, but it happens).

Or you could pay to ship an external drive that fits in your pocket. Y'know, whatever.

With Why Out of the Way, ZFS Seeding

This is actually a silly easy process. I think half of the reason I wrote this post is to highlight just how silly it is that this is so easy. Here's the 10,000ft overview of what we're going to do:

  • Create a zpool on an external drive
  • Replicate the desired dataset to the zpool on the external drive
  • Migrate the external drive to our System B
  • Replicate from the external drive to System B's zpool
  • Replicate from System A to System B
  • Set up automated replication between System A and System B

For everyone using FreeNAS: if you know your way around FreeNAS, you should be able to just stop reading right now and go make this happen. Follow along if you'd like, though. Let's step through this with ZFS on Linux.

Create the Seed Zpool and Dataset

First let's get a list of zpools and datasets using 

zpool status
zfs list


So we see I currently have only 1 zpool, which is made up of two 2-disk mirror vdevs. Seriously, mirrors > parity. Then on my zpool I've got 2 datasets, certdisks and openvpncerts. Not a whole lot of data here, but it will work for a demonstration.

Next, with our external drive plugged in, let's go ahead and create a new zpool. I will create the zpool with 

zpool create -f seeddrive /dev/sdf
zpool status


Great! We've got 2 pools. 
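
One quick aside: device names like /dev/sdf can shuffle around between reboots, so for a drive that's going to be unplugged and shipped, you may prefer the stable names under /dev/disk/by-id. A hedged sketch, since the exact id depends on your drive:

# Find the stable name for the external drive, then build the pool on it
ls -l /dev/disk/by-id/
zpool create -f seeddrive /dev/disk/by-id/<your-drive-id>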

Identifying and Sending Your Snapshot

Now let's get a list of all of the snapshots that are available. If you've only got 1 snapshot because you just started snapshotting, welcome! Also, your life just got a lot easier because you don't have to worry about incremental snapshots. Here I run "zfs list -t snapshot | more". Yes, I know more is less and less is more, but in this case less clears the screen when it exits, and I need the output to persist.


I'm going to start by sending the oldest monthly snapshot to my seed drive, 

zfs send protopool/certdisks@autosnap_2017-04-15_01:09:18_monthly | zfs receive seeddrive/certdisks

I'm not going to use a screenshot there because really, when it's done it just returns you to a command prompt. Nothing terribly exciting. But once again, for those of you who only have 1 snapshot, congratulations! You're done! Alternatively, if you've got more snapshots but just want the current data rather than the full snapshot history, send the newest snapshot instead, and congratulations, you're also done! You can skip the incrementals section.
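
By the way, if you'd like some feedback during a long send, and you happen to have the pv utility installed, you can drop it into the middle of the pipe for a throughput readout. Same command as above, just with a progress meter:

zfs send protopool/certdisks@autosnap_2017-04-15_01:09:18_monthly | pv | zfs receive seeddrive/certdisks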

The Incrementals Section

I really wanted to include this section because it took me longer than it ought to have to figure out what was going on based on the Oracle docs. So what we're going to do now is send an incremental snapshot. On the send side of the command we add the -i switch, and now we specify two snapshots. The first snapshot is the most recent snapshot that already exists on the seed drive, and the second snapshot is the new snapshot we're sending. So it will look something like this.

zfs send -i myzpool/mydataset@hourly_snap_01 myzpool/mydataset@hourly_snap_02 | zfs receive myseedpool/mydataset

Followed by the next increment

zfs send -i myzpool/mydataset@hourly_snap_02 myzpool/mydataset@hourly_snap_03 | zfs receive myseedpool/mydataset

So on, and so forth. Here's the command based on the snapshots in the example so far.

zfs send -i protopool/certdisks@autosnap_2017-04-15_01:09:18_monthly protopool/certdisks@autosnap_2017-04-16_00:01:01_daily | zfs receive seeddrive/certdisks

And again, I will forgo the screenshot as when it's complete, it just dumps us back at a command prompt.
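
One shortcut worth knowing before we move on: an uppercase -I (instead of the lowercase -i) sends every intermediate snapshot between the two you name as a single stream, so you don't have to walk the chain one increment at a time. Something like this, where the second snapshot name is a placeholder for your newest one:

# -I packages all snapshots between the monthly and the newest into one stream
zfs send -I protopool/certdisks@autosnap_2017-04-15_01:09:18_monthly protopool/certdisks@<newest_snapshot> | zfs receive seeddrive/certdisks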

Migrating the Seed Drive

This is a very short section. We're just going to export the seed zpool and remove the drive. 

zpool export seeddrive

Again with the forgoing screenshots. If it yells at you about the device being busy, apply the same troubleshooting measures you would if you weren't able to unmount an ext4 partition. Is someone using a file on the drive? Has someone navigated somewhere inside the mount point, and their session is just sitting there? That sort of thing.
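
If you want to hunt down whoever is holding the pool busy, the usual tools apply. A couple of examples, assuming the pool is mounted at its default mountpoint of /seeddrive:

# Show processes with files open, or a working directory, under the mountpoint
fuser -vm /seeddrive
lsof +D /seeddrive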

Once the zpool has been exported, remove the external drive from your server. Pack it up however it needs to be packed up for however it is that you're shipping it, and get it sent to its destination.

Here's a great quote from the year Marty went back to the future: "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." --Andrew S. Tanenbaum

That's from back in 1985, and the quote still holds true today. Except we're not using tapes here, we're using a large external drive. But still, same idea.

Import the Seed Drive Zpool

This part's pretty easy as well, but there was one piece that I wasn't expecting. With ZFS in FreeNAS you can just zpool import and it'll grab any zpools it finds. With ZFS on Linux, running zpool import by itself showed me the seed drive as I would have expected, but it did not actually import the pool. I needed to run

zpool import big-long-id-thing


And boom, we've got our seeddrive, as well as vancouverpool where the data will live.
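
Before going any further, it doesn't hurt to confirm the snapshots survived the trip:

zfs list -t snapshot -r seeddrive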

Replicating From the Seed Drive to System B's Zpool

This is going to be the same process we went through under the Identifying and Sending Your Snapshot section. Except rather than go from an array to an external drive, we're going from an external drive to an array. Let's begin.

Once again I will list my snapshots

zfs list -t snapshot

I will then select the oldest snapshot, which happens to be my monthly snapshot, and get it sent over.

zfs send seeddrive/certdisks@autosnap_2017-04-15_01:09:18_monthly | zfs receive vancouverpool/certdisks

And for my incremental snapshots I will use the following command.

zfs send -i seeddrive/certdisks@autosnap_2017-04-15_01:09:18_monthly seeddrive/certdisks@autosnap_2017-04-16_00:01:01_daily | zfs receive vancouverpool/certdisks

And we do that basically until we have all of the snapshots replicated to System B's array.
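
And just like on the sending side, the uppercase -I switch can batch all of those incrementals into a single stream; again, the second snapshot name is a placeholder for your newest one:

zfs send -I seeddrive/certdisks@autosnap_2017-04-15_01:09:18_monthly seeddrive/certdisks@<newest_snapshot> | zfs receive vancouverpool/certdisks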


Replication From System A to System B

Now normally you would set up ssh keys between your two systems so you don't have to enter a password for root, or you would use dedicated users for replication. In this case I'm going to be a fail sysadmin and just use a password.
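
For the record, the non-fail version is only a couple of commands. A minimal sketch, assuming you're okay with key-based root logins on System B (192.168.15.13, from the example below):

# On System A: generate a key pair, then copy the public key to System B
ssh-keygen -t ed25519
ssh-copy-id root@192.168.15.13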

The command here is a little more complicated because we're throwing ssh into the mix.

zfs send -i protopool/certdisks@autosnap_2017-04-23_06:01:01_hourly protopool/certdisks@autosnap_2017-04-23_07:01:01_hourly | ssh root@192.168.15.13 zfs recv vancouverpool/certdisks


Baddabing! The snapshot has been replicated. You can now do this for every snapshot that has occurred between when you populated the seed drive and when the seed drive populated System B. It's important to get these in order, though. My seed drive has some snapshots out of order, and that's my bad. Keep the snapshots in order!
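
If you'd rather not trust yourself with the ordering, you can have zfs hand you the snapshots sorted by creation time and loop over them. A rough sketch for this one-time catch-up, using the names from this post; BASE is assumed to be the newest snapshot that already exists on System B:

#!/bin/bash
# Replay every snapshot newer than BASE to System B, in creation order.
BASE="protopool/certdisks@autosnap_2017-04-23_06:01:01_hourly"

prev="$BASE"
seen=0
zfs list -H -t snapshot -o name -s creation -r protopool/certdisks | \
while read -r snap; do
    if [ "$seen" -eq 1 ]; then
        zfs send -i "$prev" "$snap" | ssh root@192.168.15.13 zfs recv vancouverpool/certdisks
        prev="$snap"
    fi
    if [ "$snap" = "$BASE" ]; then
        seen=1
    fi
done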

Automating Replication...with Scripts!

Okay, I'm not going to help you roll your own automation here. I'm going to point you to a set of useful scripts called Sanoid. These are the scripts I use to automate snapshots and replication. I could have used them here, but I wanted to go through this process manually for people who have not yet discovered the scripts, and so that I could have the experience of running through replication manually myself.

This was all hastily put together, but I hope it's useful to someone down the road.
