Hello internet,

Recently I was asked to migrate a server to a different provider.

I figured this would be trivial, so I started the usual rsync copy inside a screen session. When I checked a few hours later, rsync had frozen.

Further investigation revealed some statistics about the server:

  • The server had close to 3 million directories
  • The server had close to 20 million files
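
For reference, numbers like these can be gathered with a few lines of Ruby. This is a minimal sketch using only the standard library; the method name count_tree is made up for illustration, and on a tree with millions of entries it will take a while to finish.

```ruby
require "find"

# Walk a directory tree and count regular files and directories.
# count_tree is a hypothetical helper, not part of the gist.
# The root directory itself is included in the directory count.
def count_tree(root)
  counts = { files: 0, directories: 0 }
  Find.find(root) do |path|
    begin
      stat = File.lstat(path) # lstat does not follow symlinks
    rescue SystemCallError
      next # entry vanished or is not accessible; skip it
    end
    if stat.directory?
      counts[:directories] += 1
    elsif stat.file?
      counts[:files] += 1
    end
  end
  counts
end
```

Calling count_tree("/var/www") returns a hash such as { files: ..., directories: ... } for that tree; the path is only an example.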

In this post I won't go into how I ended up migrating this server. Instead, I'll cover how the numbers above drove me to write my own script to generate thousands (or millions) of files for testing.

Here is a gist of the script: https://gist.github.com/weirdbricks/c8d15c9935e50c7c328c81c81165db21

Please note: the script runs under both Ruby (MRI) and JRuby, but only JRuby takes full advantage of all the system's cores, so it performs far better. See the performance notes at the end of this post.
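
To illustrate why the interpreter matters, here is a minimal sketch of multi-threaded file creation using only the standard library. It is not the code from the gist; the method name create_files, the workers parameter and the file naming scheme are assumptions made for this example. MRI's global VM lock lets only one thread execute Ruby code at a time, while JRuby threads run in parallel on the JVM, so the same code can spread across cores there.

```ruby
require "securerandom"

# Create `count` small files in `dir` using `workers` threads.
# Hypothetical helper for illustration; the gist may work differently.
def create_files(dir, count, workers: 4)
  jobs = Queue.new
  count.times { |i| jobs << i }
  jobs.close # Queue#close needs Ruby 2.3 or newer

  threads = Array.new(workers) do
    Thread.new do
      # pop returns nil once the closed queue is empty, ending the loop
      while (i = jobs.pop)
        name = File.join(dir, "#{i}-#{SecureRandom.hex(4)}.txt")
        File.write(name, SecureRandom.hex(16))
      end
    end
  end
  threads.each(&:join)
  count
end
```

A call such as create_files("/tmp/testdir", 1000, workers: 12) writes 1000 files into an existing directory; the path and numbers are only examples.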

Installation

On a new CentOS 6.x VPS:

1. Install RVM:

gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
curl -sSL https://get.rvm.io | bash -s stable
source /etc/profile.d/rvm.sh

2. Install Ruby and JRuby:

rvm install ruby-2.3.0
rvm install jruby

3. Make sure you are using Ruby 2.3.0:

rvm use ruby-2.3.0

4. Install some required gems:

gem install randomstring thread ruby-progressbar colorize sys-filesystem --no-ri --no-rdoc

5. Do the same for JRuby:

rvm use jruby
jruby -S gem install randomstring thread ruby-progressbar colorize sys-filesystem --no-ri --no-rdoc

6. Get the script and make it executable:

curl -O https://gist.githubusercontent.com/weirdbricks/c8d15c9935e50c7c328c81c81165db21/raw/de4badaa02a47950e738e173637acfcbd0b551e9/create_files.rb
mv create_files.rb /usr/local/bin
chmod +x /usr/local/bin/create_files.rb

Usage example 1: Set number of files and directories

Let's create 10000 (10k) files in each of 10 directories under /opt (100,000 files in total) using Ruby MRI:

rvm use ruby-2.3.0
create_files.rb /opt/ 10000 10

Output:

INFO: Found 12 cores, setting the threads to 12.
INFO: Will use the directory: /opt/.
INFO: Will create 10 subdirectories.
INFO: Will create 10000 per directory - this is a total of 100000.
QUESTION: If the above looks ok, type "yes".

Type "yes" to continue and the fun begins

Output:

OK, changing directory to /opt/..
Starting in 3 seconds..
Starting in 2 seconds..
Starting in 1 seconds..
Starting in 0 seconds..
Filesystem utilization for / at 5.57%
Inodes remaining for / = 22894403
Creating and switching to directory AccExCj3SuE..
Time: 00:00:17 |=================================================================| 100.00% (588.24 files per second) 10000 Files
----1/10----
Filesystem utilization for / at 5.64%
Inodes remaining for / = 22884402
Creating and switching to directory 74jxnmDdDaE..
Time: 00:00:17 |=================================================================| 100.00% (588.24 files per second) 10000 Files

Usage example 2: Keep creating files until you hit 95% filesystem utilization

Let's do this one using JRuby:

rvm use jruby
create_files.rb /opt/ AUTO

Output:

WARNING: You have selected "AUTO" mode. I'll keep creating files and subdirectories under /opt/ randomly until I run out of inodes or disk space.
WARNING: Type "AUTO" if you wish to proceed.

Type "AUTO" to continue and the fun begins - sample output:

INFO: Found 12 cores, setting the threads to 12.
INFO: Will use the directory: /opt/.
OK, changing directory to /opt/..
Starting in 3 seconds..
Starting in 2 seconds..
Starting in 1 seconds..
Starting in 0 seconds..
----AUTO Mode: 1----
Filesystem utilization for / at 5.75%
Inodes remaining for / = 22866508
Creating and switching to directory wRVy3R-Qp-I..
INFO: will create 44785 files
Time: 00:00:09 |================ | 97.78% (4866.11 files per second) 43795 Files
----AUTO Mode: 2----
Filesystem utilization for / at 6.08%
Inodes remaining for / = 22821722
Creating and switching to directory kKmtYVPaPx4..
INFO: will create 6982 files
Time: 00:00:01 |================= | 98.49% (6877.00 files per second) 6877 Files
----AUTO Mode: 3----
Filesystem utilization for / at 6.13%
Inodes remaining for / = 22814739
Creating and switching to directory -FmCH8-dgz0..
INFO: will create 42170 files
Time: 00:00:07 |================ | 99.37% (5987.43 files per second) 41912 Files

See the "Filesystem utilization for / at 6.13%" line? "AUTO" mode will keep going until it hits 95%. That's going to be a lot of files!

Besides the script output above, you can use df -i to see inode utilization:
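
The stopping rule can be pictured as a simple loop. The sketch below is an assumption about the general shape, not the gist's actual code: fill_until, utilization and create_batch are hypothetical names, and the utilization probe is passed in as a callable so the loop can be tried against a simulated disk instead of a real one.

```ruby
# Keep creating batches until the utilization probe reaches the threshold.
# `utilization` returns the current usage in percent; `create_batch` does
# the actual work. Both are injected callables (hypothetical names).
def fill_until(threshold_percent, utilization:, create_batch:, max_batch: 50_000)
  rounds = 0
  while utilization.call < threshold_percent
    rounds += 1
    batch_size = rand(1..max_batch) # random batch size per directory
    create_batch.call(rounds, batch_size)
  end
  rounds
end

# Simulated run: each batch bumps the fake utilization by one point.
simulated_percent = 93.0
rounds = fill_until(
  95,
  utilization: -> { simulated_percent },
  create_batch: ->(_round, _size) { simulated_percent += 1.0 }
)
puts "Stopped after #{rounds} batches at #{simulated_percent}%"
```

In a real run the probe would read the filesystem statistics, and the batch callable would create a directory and fill it with files.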

df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 22937600 283003 22654597 2% /
tmpfs 4109995 1 4109994 1% /dev/shm
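
If you want these numbers inside a script rather than on screen, the df -i text can be parsed with a few lines of Ruby. This is a minimal sketch; parse_df_inodes is a made-up name, and it assumes each filesystem sits on a single line as in the output above (with long device names df may wrap lines; the POSIX -P flag should prevent that).

```ruby
# Parse the text printed by `df -i` into one hash per filesystem.
# parse_df_inodes is a hypothetical helper for illustration.
def parse_df_inodes(text)
  rows = text.lines.drop(1).reject { |line| line.strip.empty? } # skip header
  rows.map do |line|
    # Split into at most 6 fields so a mount point with spaces stays whole.
    filesystem, inodes, used, free, percent, mount = line.split(" ", 6)
    {
      filesystem: filesystem,
      inodes: inodes.to_i,
      used: used.to_i,
      free: free.to_i,
      use_percent: percent.to_i, # "2%".to_i gives 2
      mounted_on: mount.to_s.strip
    }
  end
end
```

For live data, pass it the command output, for example parse_df_inodes(`df -i`).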

Tons of fun!

Notes about performance:

On the same system (12 cores) it took MRI Ruby just under 187 seconds to create 100k files, while it took JRuby 28 seconds. That is an 85% reduction in run time (roughly 6.7 times faster) without any modifications to the code.
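
The arithmetic behind that comparison, using the two timings quoted here, can be checked with a short sketch. The method name compare_runs is made up for illustration.

```ruby
# Compare two run times for the same number of files.
# compare_runs is a hypothetical helper; inputs are the timings quoted above.
def compare_runs(files, slow_seconds, fast_seconds)
  {
    reduction_percent: ((1 - fast_seconds.to_f / slow_seconds) * 100).round(1),
    speedup: (slow_seconds.to_f / fast_seconds).round(1),
    slow_files_per_second: (files / slow_seconds.to_f).round,
    fast_files_per_second: (files / fast_seconds.to_f).round
  }
end

result = compare_runs(100_000, 187, 28)
puts "Run time reduction: #{result[:reduction_percent]}%"
puts "Speedup: #{result[:speedup]}x"
puts "MRI: #{result[:slow_files_per_second]} files/s, " \
     "JRuby: #{result[:fast_files_per_second]} files/s"
```

That comes to roughly 535 files per second under MRI versus roughly 3571 under JRuby.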