Backing Up a Flat-File Blog
We've all run into backup problems in our personal lives: lost photos due to shares expiring, corrupt Word docs on old pen drives, and of course the occasional data loss with our favourite services. The worst I've encountered was losing this blog's database a couple of years back (when it was on WordPress) during an installation for a separate site. Apparently one of my provider's UI WordPress install wizards didn't respect the parameters passed for database name and config, and used the defaults (pointing to this site) instead. This had the disastrous side effect of wiping my DB with all of the posts and reinstalling a new version. I won't go into details on my provider, but I will say they are very cheap. I was supposed to have a database backup feature with my plan, but it crapped out.
Now I was stuck. How do I get my literal years of blog pages back? I ended up being lucky enough to painstakingly scrape them off of Google's Web Cache and the Wayback Machine into Google Drive files. Now I had the content, and it was somewhat durable. Phew. But there's no way I'm depending on my provider alone for backup anymore. So, I wondered... What if I had a blog I could commit to GitHub or Bitbucket?
Enter the Flat-File CMS
After doing a bit of research, I found there were lots of "Flat-File" blogging frameworks, like Grav (hey, that's us!) and Flextype. By "Flat-File" these frameworks mean that they abstain from using a DB or other data source for storing content. Essentially, they use standardized filesystem structures to define site layout, and plain text files for everything else. In Grav's case, configs are YAML, page content is Markdown with a YAML header, and templates are Twig files. This gives us a few benefits:
- No need for a database
- Can be simply tracked in git
- Can use MD for writing posts (this makes writing coding blog posts so much easier)
So, in recovering this site, I ended up deciding to skip WordPress entirely and migrate directly to Grav. It was a fun weekend learning the framework and playing around with the themes. I feel like I still have a lot to learn about options in Grav, but so far it's been pretty fast to pick up and use for this site and my resume site. What does writing a blog post for this look like? Take a look!
Pretty neat, eh? By default you can write in MD in their online editor. As you can see, this gives me access to codeblock sections, easy-to-place links, and all the fun stuff that comes with MD. I highly recommend looking into a Flat-File CMS for your next blogging endeavour. In any case, using a CMS like this makes tracking with git very simple.
Tracking your Blog with Git
So, now that I had my site set up, I needed a way to track my content directories, and to automatically commit and push all changes every now and then. I achieved this through the use of a .gitignore file and some cron jobs.
The .gitignore
The Grav documentation page on the default folder structure is a good starting point for deciding what to commit and what not. There's lots of discussion on this online as well, so it really is up to what you feel is best. If you had a private repository, like on Bitbucket, you could just back up the whole thing, but I wouldn't necessarily recommend it. Really, the most important directory to track would be the user dir, so I'd say start there and work your way out. You probably don't want to track logs, cache, or tmp. Otherwise, your mileage may vary on any option here, so I recommend you play around a bit. Here's a minimal one to start:
cache/**
logs/**
assets/**
backup/**
tmp/**
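If you want to sanity-check an ignore file like the one above before trusting it, git's check-ignore subcommand can tell you whether a given path is excluded. Here's a small sketch of how I'd wrap it. The helper name report_ignored and the sample paths are made up for illustration, and it assumes you run it from the root of the site's repo:

```shell
# Print, for each path given, whether the repo's ignore rules exclude it.
# Must be run from inside the git repository that holds the .gitignore.
report_ignored() {
    local path
    for path in "$@"; do
        # check-ignore -q exits 0 when the path matches an ignore rule.
        if git check-ignore -q "$path"; then
            echo "ignored: $path"
        else
            echo "not ignored: $path"
        fi
    done
}
```

Calling it with something like report_ignored cache/page.html user/pages/post.md should report the cache file as ignored and the page as not ignored, assuming the patterns listed above.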
The Cron Job
Once you have your Git repo initialized and configured with a remote (I use Bitbucket), you can proceed to set up a script to automatically push your work. I was able to configure this cron job to run a few times a day through cPanel:
{ cd /home/justi180/jflow.io; git add -u; git add -A; git commit -m "Auto commit"; git push origin master; } >/dev/null 2>&1
Yes, it's a fine bit of arcane Bash work, by which I mean I agree it's a hasty one-liner. However, this hasty one-liner has worked for years to keep my blogs auto committing back to private repos, so I'd say it's at least successful in some dimensions. You can use a similar approach, or take this as a base and build up a more robust script based on your needs.
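For anyone who does want to build on it, here's a rough sketch of what a sturdier version might look like. This is not what I actually run. The function name auto_commit and the timestamped commit message are my own inventions for illustration. Compared to the one-liner, it stops if the repo directory is missing (so git never runs in the wrong place), uses a single git add -A (which already covers what git add -u does), and skips the commit and push when nothing has changed:

```shell
# auto_commit REPO_DIR [REMOTE] [BRANCH]
# Stage, commit, and push any changes in REPO_DIR.
# REMOTE defaults to "origin" and BRANCH to "master", as in the one-liner.
auto_commit() {
    local repo_dir="$1"
    local remote="${2:-origin}"
    local branch="${3:-master}"

    # Stage new, modified, and deleted files. git -C fails if the
    # directory does not exist, so we bail out before doing anything else.
    git -C "$repo_dir" add -A || return 1

    # An empty index diff means there is nothing to commit or push.
    if git -C "$repo_dir" diff --cached --quiet; then
        echo "No changes to commit"
        return 0
    fi

    # Hypothetical message format: "Auto commit" plus a timestamp.
    git -C "$repo_dir" commit -q -m "Auto commit $(date '+%Y-%m-%d %H:%M')" || return 1
    git -C "$repo_dir" push -q "$remote" "$branch"
}
```

Saved in a script file that ends with a call like auto_commit /home/justi180/jflow.io, this could be pointed at by the same cron job. I'd also consider sending the output to a log file instead of /dev/null, so a failed push doesn't go unnoticed.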
Final Words
This backup method has kept my blogs well tracked and provides an interesting automatic history for all of my posts as well. This means I can go through the Bitbucket repo and track my individual posts in MD like code review diffs, which gives me a certain satisfaction. I agree it's not as robust as other solutions, but I think I can come back to this with another blog post in the future to address some concerns I have with it... In any case, I hope this was an enlightening look at an alternative way of handling DR for Flat-File CMS infrastructures.