System Rsync Backup

From KdjWiki


General Concepts

Due to the low cost of hard disks, when I talk about a backup I am simply talking about creating a copy of your data in another location. This has the advantage of simple restoration: you only need to copy the files back to their original location and you are good to go.


In practice, many files are not static, and for those I want to be able to revert to a specific version. I manage these with Subversion and back up the repository (see Versioning with Subversion). For files that change less frequently, or that I only release when I am happy with their correctness (and so only care about the final version), I simply ensure I have a backup for disaster recovery. You should develop your own strategy for combining backup and versioning to satisfy your needs.


That said, I should start by clarifying some of the terminology I am likely to use, so as to reduce the risk of confusion.

source or local refers to the original data and/or location (such as machine and directory). This is what is being backed up.

target or remote refers to the duplicate or backup data and/or location. This is what results from the backup process.

Rsync

Rsync is a great tool for creating a duplicate set of files/directories. It is efficient in its identification and transfer of only the files that need to be updated. It can be run to simply update the target with files that are missing there (or differ from the source), or made to ensure the target is an exact mirror of the source. It can be run over different transports, including ssh, which is a great option due to both its ubiquity and security.

General Usage

The general usage for rsync is:

rsync [options] src dest

Common options are:

-r    recursive
-l    copy symlinks as symlinks
-p    preserve permissions
-o    preserve owner
-g    preserve group
-t    preserve times
-u    skip files that are newer on the target
-v    verbose
-z    compress during transfer

Other options are:

--exclude=PATTERN    exclude files matching pattern
--include=PATTERN    don't exclude files matching pattern
--delete             delete target files if they don't exist on source

There are also more complex filtering rules that can be applied with the -f or --filter=RULE options.

To use a different remote shell command, or to pass parameters to it (such as a non-standard ssh port), use the -e or --rsh=COMMAND option (as you will see below).
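As a rough illustration of how these options combine, here is a minimal sketch of a preview helper. The function name preview_sync and the file patterns (keep.tmp, *.tmp, cache/) are made up for this example. It also uses -n (--dry-run), a real rsync option that is not in the list above, so that nothing is changed while you experiment with filters.

```shell
#!/bin/bash
# Sketch only: preview_sync and the file patterns are hypothetical examples.
preview_sync() {
    local src="${1%/}"     # strip one trailing slash so we can add our own
    local dest="${2%/}"

    # -n (--dry-run) lists what would happen without changing anything.
    # Filter rules are checked in order and the first match wins, so the
    # --include for keep.tmp must come before the broader --exclude.
    rsync -rltvn --delete \
        --include='keep.tmp' \
        --exclude='*.tmp' \
        --exclude='cache/' \
        "${src}/" "${dest}/"
}
```

Calling preview_sync /data /backup should list what rsync would transfer or delete. Once the list looks right, the same options without the n perform the real sync.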

Syncing Directories

A simple command line for syncing two directories would be:

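  $ rsync -rlpogtv --delete /data/ /backup/

This would recursively sync the contents of /data to /backup, providing verbose feedback, preserving all attributes (symlinks, permissions, owner, group, times) and deleting files from /backup that don't exist in /data. Note the trailing slash on /data/: without it, rsync would copy the directory itself, creating /backup/data, and --delete would only apply within that subdirectory.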
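Whether the source path ends in a slash changes where the files land, which is easy to get wrong. The following sketch shows the difference in a throwaway temporary directory. The function name demo_trailing_slash and the directory names are invented for the demonstration.

```shell
#!/bin/bash
# Demonstration in throwaway directories; nothing outside the temp dir is touched.
demo_trailing_slash() {
    local work
    work="$(mktemp -d)"
    mkdir -p "${work}/data" "${work}/with_slash" "${work}/without_slash"
    echo "hello" > "${work}/data/file.txt"

    # Trailing slash on the source: copy the CONTENTS of data/.
    rsync -rlt "${work}/data/" "${work}/with_slash/"

    # No trailing slash: copy the directory itself, creating without_slash/data/.
    rsync -rlt "${work}/data" "${work}/without_slash/"

    ls "${work}/with_slash"        # file.txt
    ls "${work}/without_slash"     # data
    rm -rf "${work}"
}
```

A trailing slash on the target makes no such difference; only the source path is affected.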

Syncing Remote Machines

NOTE: You will generally be prompted for a password when syncing with a remote machine. This is the password of the user you are connecting as (the name before the @). The prompt can be avoided as described in the section "Enabling password-less SSH".


To sync a local directory to a remote machine, the command line would be something like:

  $ rsync -rltv --delete /data/ user@hostname.com:/backup/

This would recursively sync the contents of /data to /backup on hostname.com (logging in as user), providing verbose feedback, preserving symlinks and file times, and deleting files from /backup that don't exist in /data.

By default this will connect using ssh on port 22 (the default ssh port). If you are running ssh on a non-standard port (such as 2200), you will need to use a command line such as:

  $ rsync -rltv --delete -e 'ssh -p 2200' /data/ user@hostname.com:/backup/

Conversely, to sync a remote directory to a local directory, the command line would be something like:

  $ rsync -rltv --delete user@hostname.com:/data/ /backup/

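Rsync cannot sync directly between two remote machines (it refuses to run when both the source and the target are remote). To sync a remote directory to a different remote directory, run rsync on one of the remote machines instead, for example via ssh:

  $ ssh -t user@hostname.com 'rsync -rltv --delete /data/ user2@hostname2.com:/backup/'

The -t gives the remote rsync a terminal, so it can prompt for the password of user2 if one is needed.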

Sync/Backup Script

This script will only back up the database dumps unless a command line parameter (any parameter) is passed, which requests a full sync. If you always want to run a full sync, remove the lines from if [ -z "${1}" ]; then to fi.


site_backup.sh

#!/bin/bash

app="/usr/bin/rsync"

# for exact sync (i.e. removing deleted files)
opts="-rltv --delete"

source="{root source directory}"
target="{user name}@{remote host name}:/{remote path}"

echo "Syncing database backups..."
${app} ${opts} -e '/usr/bin/ssh -p 2200' "${source}/backup/" "${target}/database/"

if [ -z "${1}" ]; then
	# Full sync not requested
	exit 0
fi

#
# for each top level folder (or subversion repository) to sync:
#

echo "Syncing bin..."
${app} ${opts} -e '/usr/bin/ssh -p 2200' "${source}/bin/" "${target}/bin/"

#
# etc ...
#

exit 0

NOTE: Due to the way bash parses command lines, I was unable to variablise (yeah, I made it a verb!) the ssh port. As such, I had to hard-code it in each rsync execution line. Remember that if you are running ssh on the default port (22), you can omit -e '/usr/bin/ssh -p 2200' from the line.
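One possible way to keep the port in a variable, offered as a sketch rather than a tested replacement for the script above, is to wrap the rsync call in a function and build the -e value inside double quotes. Quotes stored inside a variable are not re-interpreted when the variable is expanded, which is the usual reason an -e '...' option kept in a string like ${opts} falls apart. The names ssh_port, site_rsync and sync_folders are invented for this example.

```shell
#!/bin/bash
# Sketch only: ssh_port, site_rsync and sync_folders are illustrative names.
app="/usr/bin/rsync"
opts="-rltv --delete"
ssh_port="2200"

# Wrap the common part of every call. The double quotes around the -e value keep
# "ssh -p 2200" together as ONE argument while still expanding ${ssh_port}.
site_rsync() {
    # ${opts} is deliberately unquoted so it splits into separate options.
    "${app}" ${opts} -e "/usr/bin/ssh -p ${ssh_port}" "$@"
}

# Sync each named top-level folder under a source root to the same name
# under a target root. Returns non-zero if any folder failed.
sync_folders() {
    local src_root="$1"
    local dest_root="$2"
    shift 2
    local failed=0
    local dir
    for dir in "$@"; do
        echo "Syncing ${dir}..."
        site_rsync "${src_root}/${dir}/" "${dest_root}/${dir}/" || failed=1
    done
    return "${failed}"
}
```

With this in place, the repeated lines in the full sync part of the script could become a single call such as sync_folders "${source}" "${target}" bin followed by any other folder names.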

Enabling password-less SSH

In order to rsync (and ssh) to the remote machine without requiring a password to be entered (i.e. so this can be automated without user interaction), the public key of the local user needs to be added to the authorised keys of the user on the remote machine.


On local machine

Generate a public/private key pair:

  $ ssh-keygen

(accept default file location and no passphrase)

Copy public key to remote machine:

  $ scp .ssh/id_rsa.pub {user name}@{remote host}:/home/{user name}/


On remote machine

Add public key as authorised key:

  $ mkdir -p .ssh && chmod 700 .ssh
  $ cat id_rsa.pub >> .ssh/authorized_keys
  $ chmod 600 .ssh/authorized_keys


On local machine

Test password-less ssh:

  $ ssh {user name}@{remote host}

If you are connected without being asked for a password, you have successfully enabled password-less ssh login. You should verify that it is only available from the local machine by ssh'ing to the remote machine from a third machine (or from the local machine whilst logged in as a different user).
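The copy and test steps above can also be scripted. The sketch below assumes ssh-copy-id is installed (it ships with OpenSSH on most systems) and uses the ssh option BatchMode=yes, which makes ssh fail rather than prompt for a password. The function names install_key and check_passwordless are made up, and the public key file name depends on the key type your ssh-keygen generated.

```shell
#!/bin/bash
# Sketch only: install_key and check_passwordless are hypothetical helper names.

# Append a public key to the remote account's authorized_keys in one step.
install_key() {
    local pubkey="$1"       # e.g. ~/.ssh/id_rsa.pub
    local remote="$2"       # e.g. user@hostname.com
    local port="${3:-22}"   # optional ssh port, defaults to 22
    ssh-copy-id -i "${pubkey}" -p "${port}" "${remote}"
}

# Return 0 only if key-based login works with no prompt at all.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless() {
    local remote="$1"
    local port="${2:-22}"
    ssh -o BatchMode=yes -o ConnectTimeout=10 -p "${port}" "${remote}" true
}
```

A backup script could call check_passwordless first and stop with a clear message if it fails, instead of hanging at a password prompt when run unattended.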
