As you know, shared file systems are a tricky problem in the cloud. One solution is a distributed file system: something each of your app nodes can read from and write to. When it comes to distributed file systems, GlusterFS is one of the leading products.
With a few simple scripts on your Mac OS X or Linux machine, you can deploy a multi-zone, high-availability GlusterFS cluster to Google Compute Engine (GCE) that provides scalable, persistent shared storage for your GCE or Google Container Engine (GKE) Kubernetes clusters.
In this post, I will demo these scripts and show you how to use them. By default, our GlusterFS cluster will use three GlusterFS servers, one server per Google Cloud zone within the chosen region.
Before continuing, please make sure you have the Google Cloud SDK installed and authenticated, and a Google Cloud project to deploy into.
The first thing you need to do is clone my GitHub repo:
$ git clone https://github.com/rimusz/glusterfs-gce
Then, open the cluster/settings file in your text editor. Find the section marked "your cluster region and zones" and set ZONES to whatever matches your preferred setup. The rest of the settings in this file are probably fine, but can be adjusted if need be.
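As an illustration, the relevant part of cluster/settings might look something like the excerpt below. Only the ZONES variable is confirmed by this post; the region variable and the example values are assumptions, so defer to the actual file:

```shell
# cluster/settings (excerpt; values are illustrative)
REGION="europe-west1"
ZONES="europe-west1-b europe-west1-c europe-west1-d"
```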
Bootstrap Your Cluster
There's nothing more you need to do. You should already be authenticated with Google from when you installed the Google Cloud SDK.
You can go right ahead and create the cluster by running:
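The bootstrap invocation would look something like the following. The script name here is an assumption based on the repo's naming, so check the cluster folder for the actual file:

```shell
# From the root of the cloned repo (script name is an assumption)
$ cd glusterfs-gce
$ ./cluster/gluster-up.sh
```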
This command will create three servers.
Each server will have:
- A static IP
- The GlusterFS server package installed
- A Google Cloud persistent disk to be used as a GlusterFS brick, that is: storage space made available to the cluster
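Under the hood, the per-server provisioning can be sketched with standard gcloud commands. The resource names, sizes, and zone below are illustrative assumptions, not the script's actual values:

```shell
# Reserve a static external IP in the region (names are illustrative)
gcloud compute addresses create gfs-cluster1-server-1-ip --region europe-west1

# Create a persistent disk to serve as the GlusterFS brick
gcloud compute disks create gfs-cluster1-server-1-disk \
    --size 500GB --zone europe-west1-b

# Boot the server with the brick disk attached
gcloud compute instances create gfs-cluster1-server-1 \
    --zone europe-west1-b \
    --address gfs-cluster1-server-1-ip \
    --disk name=gfs-cluster1-server-1-disk,device-name=brick1
```

The GlusterFS server package itself is installed by the script after boot, typically via a startup script on the instance.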
Create Your First Volume
A GlusterFS volume is a collection of bricks. A volume can store data across the bricks in three basic ways: distributed, striped, or replicated.
- A distributed volume stores each file on one brick
- A striped volume stores a single file, in chunks, across multiple bricks
- A replicated volume stores a copy of each file on every brick
My script configures a replicated volume. This provides you with fault tolerance, should one of your GlusterFS nodes go down.
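For reference, creating a three-way replicated volume with the GlusterFS CLI looks roughly like this. The hostnames and brick path assume the naming used later in this post:

```shell
# Run on one GlusterFS server, after the three nodes have been peered
gluster volume create VOLUME_NAME replica 3 \
    gfs-cluster1-server-1:/data/brick1/VOLUME_NAME \
    gfs-cluster1-server-2:/data/brick1/VOLUME_NAME \
    gfs-cluster1-server-3:/data/brick1/VOLUME_NAME
gluster volume start VOLUME_NAME
```

The create_volume.sh script below wraps these steps for you.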
Here's how to create a replicated volume on all three servers:
$ cd ..
$ ./cluster/create_volume.sh VOLUME_NAME
At this point, your GlusterFS cluster should be fully set up and operational.
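You can verify the cluster state from any of the GlusterFS servers with the standard GlusterFS CLI:

```shell
# All peers should show "State: Peer in Cluster (Connected)"
gluster peer status

# The volume should report Type: Replicate and Status: Started
gluster volume info VOLUME_NAME
```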
Let's test it.
Testing Your Volume
Spin up a new GCE virtual machine, or use one of your existing non-GlusterFS virtual machines, and grab a shell on that machine.
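Note that mounting a GlusterFS volume requires the GlusterFS client package on the machine. On a Debian/Ubuntu image, you would install it with:

```shell
$ sudo apt-get update
$ sudo apt-get install -y glusterfs-client
```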
Then mount your GlusterFS volume (replacing
VOLUME_NAME with the actual volume name you chose) to
/mnt/gfs by running this command:
$ sudo mkdir -p /mnt/gfs
$ sudo mount -t glusterfs gfs-cluster1-server-1:/VOLUME_NAME /mnt/gfs
gfs-cluster1-server-1 is the name my script gives the first server in your cluster.
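If you want the mount to survive reboots, an /etc/fstab entry along these lines should work (the _netdev option delays mounting until networking is up):

```
gfs-cluster1-server-1:/VOLUME_NAME /mnt/gfs glusterfs defaults,_netdev 0 0
```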
Then copy a bunch of files into the mount point:
$ for i in `seq -w 1 100`; do cp -rp /var/log/messages /mnt/gfs/copy-test-$i; done
You don't have to run this exact command. Any files will do.
Now, grab a shell on one of your GlusterFS virtual machines, and take a look inside the brick volume to see the files you just created.
Do that like so:
$ ls -lA /data/brick1/VOLUME_NAME
Replace VOLUME_NAME with the volume name you chose.
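Because the volume is replicated, the same set of files should appear in the brick directory on each of the three servers. A quick way to compare:

```shell
# Run on each GlusterFS server; the listings should match across servers
$ ls /data/brick1/VOLUME_NAME
```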
In this post, we saw how to use a few scripts to quickly launch a GlusterFS cluster on GCE for scalable, persistent storage for your apps.
In the cluster folder of the repo, there are two more scripts:
- upgrade_glusterfs.sh upgrades GlusterFS on all servers
- upgrade_servers.sh upgrades your distro (via APT) on all servers
To learn more about using GlusterFS with Kubernetes, check out the GlusterFS examples in the official Kubernetes repository.