How to set up a ZFS pool on CentOS 7.3
Here are my quick notes on setting up a ZFS pool with 5 drives in it.
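These notes assume the ZFS on Linux packages are already installed (on CentOS 7 they come from the zfsonlinux.org repository) and that the zfs kernel module is available. A quick sanity check before starting:
# load the module if it isn't loaded yet, then confirm it shows up
modprobe zfs
lsmod | grep zfs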
1. Create 5 fake devices:
Each one will be a 2GB sparse file. Run these from /root so the later commands can find them at /root/zfs0*:
truncate -s 2G zfs01.img
truncate -s 2G zfs02.img
truncate -s 2G zfs03.img
truncate -s 2G zfs04.img
truncate -s 2G zfs05.img
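If you're curious, you can confirm the files are sparse (they claim 2GB each but use almost no disk space yet):
# the first column from -s is the allocated size, shown next to the apparent 2G size
ls -lsh /root/zfs0*.img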
2. Create a pool using all 5 devices:
zpool create my-zfs-pool /root/zfs0*
The above will create a pool called "my-zfs-pool" using the devices matching /root/zfs0*.
Let's see what df -hT has to say about this new filesystem:
df -hT
Filesystem     Type       Size  Used  Avail Use% Mounted on
/dev/sda1      xfs         80G  1.3G    79G   2% /
devtmpfs       devtmpfs   991M     0   991M   0% /dev
tmpfs          tmpfs     1001M     0  1001M   0% /dev/shm
tmpfs          tmpfs     1001M  8.3M   993M   1% /run
tmpfs          tmpfs     1001M     0  1001M   0% /sys/fs/cgroup
tmpfs          tmpfs      201M     0   201M   0% /run/user/0
my-zfs-pool    zfs        9.7G     0   9.7G   0% /my-zfs-pool
As you can see above, a 9.7GB filesystem named "my-zfs-pool" was created and mounted at /my-zfs-pool.
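You can also look at the pool itself (rather than the mounted filesystem) with `zpool list`, which shows its raw size, how much is allocated, and its health:
zpool list my-zfs-pool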
Let's get some more details about the pool using "zpool status".
zpool status
pool: my-zfs-pool
state: ONLINE
scan: none requested
config:
NAME                 STATE    READ WRITE CKSUM
my-zfs-pool          ONLINE      0     0     0
  /root/zfs01.img    ONLINE      0     0     0
  /root/zfs02.img    ONLINE      0     0     0
  /root/zfs03.img    ONLINE      0     0     0
  /root/zfs04.img    ONLINE      0     0     0
  /root/zfs05.img    ONLINE      0     0     0
errors: No known data errors
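The pool is empty at this point, so a scrub won't have much to check. If you want the corruption test below to be a bit more interesting, drop some throwaway data into it first (the file name here is just an example):
# write ~100MB of random data into the pool, then check usage
dd if=/dev/urandom of=/my-zfs-pool/testfile.bin bs=1M count=100
zfs list my-zfs-pool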
3. Let's simulate damage to our pool by "corrupting" one of the drives:
cat /dev/null > /root/zfs05.img
Run a scrub (also known as an "integrity check") on the "my-zfs-pool" pool.
zpool scrub my-zfs-pool
The command will look like it has hung, but in another window you can check the status:
zpool status -v
pool: my-zfs-pool
state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: scrub in progress since Wed May 3 04:27:44 2017
74.5K scanned out of 95.5K at 345/s, 0h1m to go
0 repaired, 78.01% done
config:
NAME                 STATE    READ WRITE CKSUM
my-zfs-pool          UNAVAIL     0     0     0  insufficient replicas
  /root/zfs01.img    ONLINE      0     0     0
  /root/zfs02.img    ONLINE      0     0     0
  /root/zfs03.img    ONLINE      0     0     0
  /root/zfs04.img    ONLINE      0     0     0
  /root/zfs05.img    UNAVAIL     0     0     0  corrupted data
errors: No known data errors
There are also some really nasty errors in `dmesg -T`:
[Wed May 3 04:27:42 2017] WARNING: Pool 'my-zfs-pool' has encountered an uncorrectable I/O failure and has been suspended.
[Wed May 3 04:30:34 2017] INFO: task txg_sync:1280 blocked for more than 120 seconds.
[Wed May 3 04:30:34 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
In the second window, try to stop the scrub using the `zpool scrub -s my-zfs-pool` command:
zpool scrub -s my-zfs-pool
cannot cancel scrubbing my-zfs-pool: pool I/O is currently suspended
Yep, it's screwed. Good thing this was just test data, right?
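Before giving up, you could try the `zpool clear my-zfs-pool` that the status output suggested, but since the backing file was truncated to zero bytes there is no ZFS label left for the pool to reopen, so don't expect it to recover anything:
zpool clear my-zfs-pool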
Reboot your system:
reboot now
After the reboot, try to import the pool using the `zpool import -d /root/` command:
zpool import -d /root/
pool: my-zfs-pool
id: 8247909996552101646
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
see: http://zfsonlinux.org/msg/ZFS-8000-6X
config:
my-zfs-pool          UNAVAIL  missing device
  /root/zfs01.img    ONLINE
  /root/zfs02.img    ONLINE
  /root/zfs03.img    ONLINE
  /root/zfs04.img    ONLINE
Additional devices are known to be part of this pool, though their
exact configuration cannot be determined.
4. Let's rebuild our pool using raidz for redundancy:
Delete the old "devices":
rm /root/zfs0* -f
Create the devices again:
truncate -s 2G zfs01.img
truncate -s 2G zfs02.img
truncate -s 2G zfs03.img
truncate -s 2G zfs04.img
truncate -s 2G zfs05.img
Create the pool again, but this time use the `raidz` option:
zpool create my-redundant-zfs-pool raidz /root/zfs0*
Let's check what the output of `df -hT` is:
df -hT /my-redundant-zfs-pool/
Filesystem             Type  Size  Used  Avail Use% Mounted on
my-redundant-zfs-pool  zfs   7.7G     0   7.7G   0% /my-redundant-zfs-pool
You can see that this time the pool is smaller even though we used the same number of drives. That's because the 'raidz' option adds redundancy against drive failure: with 5 x 2GB devices, raidz1 uses roughly one device's worth of space for parity, leaving about 8GB (which shows up as 7.7G after overhead) usable.
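You can see the raw-versus-usable difference by comparing df with `zpool list`, which reports the pool's total capacity including the space used for parity:
zpool list my-redundant-zfs-pool
`zpool status` also shows the new layout: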
zpool status
pool: my-redundant-zfs-pool
state: ONLINE
scan: none requested
config:
NAME                     STATE    READ WRITE CKSUM
my-redundant-zfs-pool    ONLINE      0     0     0
  raidz1-0               ONLINE      0     0     0
    /root/zfs01.img      ONLINE      0     0     0
    /root/zfs02.img      ONLINE      0     0     0
    /root/zfs03.img      ONLINE      0     0     0
    /root/zfs04.img      ONLINE      0     0     0
    /root/zfs05.img      ONLINE      0     0     0
errors: No known data errors
See how this one now has a 'raidz1-0' vdev with the five devices grouped under it?
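Before breaking anything, put a bit of data on the pool and record a checksum so we can verify later that it survives (the file names here are just examples):
dd if=/dev/urandom of=/my-redundant-zfs-pool/testfile.bin bs=1M count=100
md5sum /my-redundant-zfs-pool/testfile.bin > /root/testfile.md5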
Let's see what happens if we kill one of the drives again as we did before:
cat /dev/null > /root/zfs05.img
Run a scrub on it to check for errors:
zpool scrub my-redundant-zfs-pool
The above command has no output, but we can see what it's doing using `zpool status`:
zpool status
pool: my-redundant-zfs-pool
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub repaired 0 in 0h0m with 0 errors on Wed May 3 05:03:32 2017
config:
NAME                     STATE     READ WRITE CKSUM
my-redundant-zfs-pool    DEGRADED     0     0     0
  raidz1-0               DEGRADED     0     0     0
    /root/zfs01.img      ONLINE       0     0     0
    /root/zfs02.img      ONLINE       0     0     0
    /root/zfs03.img      ONLINE       0     0     0
    /root/zfs04.img      ONLINE       0     0     0
    /root/zfs05.img      UNAVAIL      0     0     0  corrupted data
errors: No known data errors
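Even though the pool is degraded, the data on it is still fully readable thanks to the parity. If you created the test file and checksum above, you can verify that now:
md5sum -c /root/testfile.md5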
5. Replace the corrupt drive:
As you can see, that drive is marked as corrupted. Let's "replace" it.
Create a new "drive":
truncate -s 2G /root/zfs06.img
Use the aptly named "replace" command:
zpool replace my-redundant-zfs-pool /root/zfs05.img /root/zfs06.img
The above command has no output, but let's check the status again:
zpool status
pool: my-redundant-zfs-pool
state: ONLINE
scan: resilvered 21K in 0h0m with 0 errors on Wed May 3 05:05:27 2017
config:
NAME                     STATE    READ WRITE CKSUM
my-redundant-zfs-pool    ONLINE      0     0     0
  raidz1-0               ONLINE      0     0     0
    /root/zfs01.img      ONLINE      0     0     0
    /root/zfs02.img      ONLINE      0     0     0
    /root/zfs03.img      ONLINE      0     0     0
    /root/zfs04.img      ONLINE      0     0     0
    /root/zfs06.img      ONLINE      0     0     0
errors: No known data errors
That was so easy it's ridiculous.
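With real disks the resilver takes a while instead of finishing instantly like it does with these tiny image files, so you'd normally keep an eye on its progress, for example with:
watch -n 5 zpool status my-redundant-zfs-pool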
Let's run another scrub:
zpool scrub my-redundant-zfs-pool
And check the status one last time:
zpool status -v
pool: my-redundant-zfs-pool
state: ONLINE
scan: scrub repaired 0 in 0h0m with 0 errors on Wed May 3 05:06:41 2017
config:
NAME                     STATE    READ WRITE CKSUM
my-redundant-zfs-pool    ONLINE      0     0     0
  raidz1-0               ONLINE      0     0     0
    /root/zfs01.img      ONLINE      0     0     0
    /root/zfs02.img      ONLINE      0     0     0
    /root/zfs03.img      ONLINE      0     0     0
    /root/zfs04.img      ONLINE      0     0     0
    /root/zfs06.img      ONLINE      0     0     0
errors: No known data errors
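If you were just following along with throwaway image files like these, you can clean everything up once you're done (this destroys the test pool and its data, plus the checksum file from earlier if you created it):
zpool destroy my-redundant-zfs-pool
rm -f /root/zfs0*.img /root/testfile.md5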
Done!!