How to set up a ZFS pool on CentOS 7.3
Here are my quick notes on setting up a ZFS pool with 5 drives in it.
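These notes assume the ZFS on Linux packages are already installed (on CentOS 7 they come from the zfsonlinux.org repository) and that the zfs kernel module is available. A quick sanity check before starting:
# load the module if it isn't loaded yet, then confirm it shows up
modprobe zfs
lsmod | grep zfs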
1. Create 5 fake devices:
Each one will be a 2GB sparse file. Run these from /root so the later commands can find them at /root/zfs0*:
truncate -s 2G zfs01.img
truncate -s 2G zfs02.img
truncate -s 2G zfs03.img
truncate -s 2G zfs04.img
truncate -s 2G zfs05.img
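If you're curious, you can confirm the files are sparse (they claim 2GB each but use almost no disk space yet):
# the first column from -s is the allocated size, shown next to the apparent 2G size
ls -lsh /root/zfs0*.img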
2. Create a pool using all 5 devices:
zpool create my-zfs-pool /root/zfs0*
The above will create a pool called "my-zfs-pool" using the devices matching /root/zfs0*.
Let's see what df -hT has to say about this new filesystem:
df -hT
Filesystem     Type       Size  Used  Avail Use% Mounted on
/dev/sda1      xfs         80G  1.3G    79G   2% /
devtmpfs       devtmpfs   991M     0   991M   0% /dev
tmpfs          tmpfs     1001M     0  1001M   0% /dev/shm
tmpfs          tmpfs     1001M  8.3M   993M   1% /run
tmpfs          tmpfs     1001M     0  1001M   0% /sys/fs/cgroup
tmpfs          tmpfs      201M     0   201M   0% /run/user/0
my-zfs-pool    zfs        9.7G     0   9.7G   0% /my-zfs-pool
As you can see above, a 9.7GB filesystem named "my-zfs-pool" was created and mounted at /my-zfs-pool.
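You can also look at the pool itself (rather than the mounted filesystem) with `zpool list`, which shows its raw size, how much is allocated, and its health:
zpool list my-zfs-pool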
Let's get some more details about the pool using "zpool status".
zpool status
pool: my-zfs-pool
state: ONLINE
scan: none requested
config:
NAME                 STATE    READ WRITE CKSUM
my-zfs-pool          ONLINE      0     0     0
  /root/zfs01.img    ONLINE      0     0     0
  /root/zfs02.img    ONLINE      0     0     0
  /root/zfs03.img    ONLINE      0     0     0
  /root/zfs04.img    ONLINE      0     0     0
  /root/zfs05.img    ONLINE      0     0     0
errors: No known data errors
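The pool is empty at this point, so a scrub won't have much to check. If you want the corruption test below to be a bit more interesting, drop some throwaway data into it first (the file name here is just an example):
# write ~100MB of random data into the pool, then check usage
dd if=/dev/urandom of=/my-zfs-pool/testfile.bin bs=1M count=100
zfs list my-zfs-pool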
3. Let's simulate damage to our pool by "corrupting" one of the drives:
cat /dev/null > /root/zfs05.img
Run a scrub (also known as an "integrity check") on the "my-zfs-pool" pool.
zpool scrub my-zfs-pool
The command will look like it has hung, but in another window you can check the status:
zpool status -v
pool: my-zfs-pool
state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: scrub in progress since Wed May 3 04:27:44 2017
74.5K scanned out of 95.5K at 345/s, 0h1m to go
0 repaired, 78.01% done
config:
NAME                 STATE    READ WRITE CKSUM
my-zfs-pool          UNAVAIL     0     0     0  insufficient replicas
  /root/zfs01.img    ONLINE      0     0     0
  /root/zfs02.img    ONLINE      0     0     0
  /root/zfs03.img    ONLINE      0     0     0
  /root/zfs04.img    ONLINE      0     0     0
  /root/zfs05.img    UNAVAIL     0     0     0  corrupted data
errors: No known data errors
There are also some really nasty errors in `dmesg -T`:
[Wed May 3 04:27:42 2017] WARNING: Pool 'my-zfs-pool' has encountered an uncorrectable I/O failure and has been suspended.
[Wed May 3 04:30:34 2017] INFO: task txg_sync:1280 blocked for more than 120 seconds.
[Wed May 3 04:30:34 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
In the second window, try to stop the scrub using the `zpool scrub -s my-zfs-pool` command:
zpool scrub -s my-zfs-pool
cannot cancel scrubbing my-zfs-pool: pool I/O is currently suspended
Yep, it's screwed. Good thing this was just test data, right?
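Before giving up, you could try the `zpool clear my-zfs-pool` that the status output suggested, but since the backing file was truncated to zero bytes there is no ZFS label left for the pool to reopen, so don't expect it to recover anything:
zpool clear my-zfs-pool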
Reboot your system:
reboot now
After the reboot, try to import the pool using the `zpool import -d /root/` command:
zpool import -d /root/
pool: my-zfs-pool
id: 8247909996552101646
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: http://zfsonlinux.org/msg/ZFS-8000-6X
config:
my-zfs-pool          UNAVAIL  missing device
  /root/zfs01.img    ONLINE
  /root/zfs02.img    ONLINE
  /root/zfs03.img    ONLINE
  /root/zfs04.img    ONLINE
Additional devices are known to be part of this pool, though their
exact configuration cannot be determined.
4. Let's rebuild our pool using raidz for redundancy:
Delete the old "devices":
rm /root/zfs0* -f
Create the devices again:
truncate -s 2G zfs01.img
truncate -s 2G zfs02.img
truncate -s 2G zfs03.img
truncate -s 2G zfs04.img
truncate -s 2G zfs05.img
Create the pool again, but this time use the `raidz` option:
zpool create my-redundant-zfs-pool raidz /root/zfs0*
Let's check what the output of `df -hT` is:
df -hT /my-redundant-zfs-pool/
Filesystem             Type  Size  Used  Avail Use% Mounted on
my-redundant-zfs-pool  zfs   7.7G     0   7.7G   0% /my-redundant-zfs-pool
You can see that this time the pool is smaller even though we used the same number of drives. That's because the 'raidz' option adds redundancy against drive failure: with 5 x 2GB devices, raidz1 uses roughly one device's worth of space for parity, leaving about 8GB (which shows up as 7.7G after overhead) usable.
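You can see the raw-versus-usable difference by comparing df with `zpool list`, which reports the pool's total capacity including the space used for parity:
zpool list my-redundant-zfs-pool
`zpool status` also shows the new layout: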
zpool status
pool: my-redundant-zfs-pool
state: ONLINE
scan: none requested
config:
NAME                     STATE    READ WRITE CKSUM
my-redundant-zfs-pool    ONLINE      0     0     0
  raidz1-0               ONLINE      0     0     0
    /root/zfs01.img      ONLINE      0     0     0
    /root/zfs02.img      ONLINE      0     0     0
    /root/zfs03.img      ONLINE      0     0     0
    /root/zfs04.img      ONLINE      0     0     0
    /root/zfs05.img      ONLINE      0     0     0
errors: No known data errors
See how this one now has a 'raidz1-0' vdev with the five devices grouped under it?
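Before breaking anything, put a bit of data on the pool and record a checksum so we can verify later that it survives (the file names here are just examples):
dd if=/dev/urandom of=/my-redundant-zfs-pool/testfile.bin bs=1M count=100
md5sum /my-redundant-zfs-pool/testfile.bin > /root/testfile.md5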
Let's see what happens if we kill one of the drives again as we did before:
cat /dev/null > /root/zfs05.img
Run a scrub on it to check for errors:
zpool scrub my-redundant-zfs-pool
The above command has no output, but we can see what it's doing using `zpool status`:
zpool status
pool: my-redundant-zfs-pool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub repaired 0 in 0h0m with 0 errors on Wed May 3 05:03:32 2017
config:
NAME                     STATE     READ WRITE CKSUM
my-redundant-zfs-pool    DEGRADED     0     0     0
  raidz1-0               DEGRADED     0     0     0
    /root/zfs01.img      ONLINE       0     0     0
    /root/zfs02.img      ONLINE       0     0     0
    /root/zfs03.img      ONLINE       0     0     0
    /root/zfs04.img      ONLINE       0     0     0
    /root/zfs05.img      UNAVAIL      0     0     0  corrupted data
errors: No known data errors
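Even though the pool is degraded, the data on it is still fully readable thanks to the parity. If you created the test file and checksum above, you can verify that now:
md5sum -c /root/testfile.md5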
5. Replace the corrupt drive:
As you can see, that drive is marked as corrupted. Let's "replace" it.
Create a new "drive":
truncate -s 2G /root/zfs06.img
Use the aptly named "replace" command:
zpool replace my-redundant-zfs-pool /root/zfs05.img /root/zfs06.img
The above command has no output, but let's check the status again:
zpool status
pool: my-redundant-zfs-pool
state: ONLINE
scan: resilvered 21K in 0h0m with 0 errors on Wed May 3 05:05:27 2017
config:
NAME                     STATE    READ WRITE CKSUM
my-redundant-zfs-pool    ONLINE      0     0     0
  raidz1-0               ONLINE      0     0     0
    /root/zfs01.img      ONLINE      0     0     0
    /root/zfs02.img      ONLINE      0     0     0
    /root/zfs03.img      ONLINE      0     0     0
    /root/zfs04.img      ONLINE      0     0     0
    /root/zfs06.img      ONLINE      0     0     0
errors: No known data errors
That was so easy it's ridiculous.
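With real disks the resilver takes a while instead of finishing instantly like it does with these tiny image files, so you'd normally keep an eye on its progress, for example with:
watch -n 5 zpool status my-redundant-zfs-pool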
Let's run another scrub:
zpool scrub my-redundant-zfs-pool
And check the status one last time:
zpool status -v
pool: my-redundant-zfs-pool
state: ONLINE
scan: scrub repaired 0 in 0h0m with 0 errors on Wed May 3 05:06:41 2017
config:
NAME                     STATE    READ WRITE CKSUM
my-redundant-zfs-pool    ONLINE      0     0     0
  raidz1-0               ONLINE      0     0     0
    /root/zfs01.img      ONLINE      0     0     0
    /root/zfs02.img      ONLINE      0     0     0
    /root/zfs03.img      ONLINE      0     0     0
    /root/zfs04.img      ONLINE      0     0     0
    /root/zfs06.img      ONLINE      0     0     0
errors: No known data errors
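If you were just following along with throwaway image files like these, you can clean everything up once you're done (this destroys the test pool and its data, plus the checksum file from earlier if you created it):
zpool destroy my-redundant-zfs-pool
rm -f /root/zfs0*.img /root/testfile.md5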
Done!!