OpenSSH/Cookbook/Automated Backup


Using OpenSSH with keys can facilitate secure automated backups. rsync(1)[1], tar(1), and dump(8) are the foundation for most backup methods. It is a myth that remote root access must be allowed: if root-level access is needed, sudo(8) works just fine, or, in the case of zfs(8), the OpenZFS Delegation System does. Remember that until the backup data has been tested and shown to restore reliably, it does not count as a backup copy.

Backup with rsync(1)

rsync(1) is often used to back up both locally and remotely. It is fast and flexible, and it copies incrementally so that only the changes are transferred, avoiding wasting time re-copying what is already at the destination. It does that through its now famous delta-transfer algorithm. When working remotely, it needs a little help with encryption, and the usual practice is to tunnel it over SSH.

The rsync(1) utility has defaulted to SSH as its transport since 2004[2]. Thus the following connects over SSH without having to add anything extra:

$ rsync -a fred@server.example.org:./archive/ \
    /home/fred/archive/.

But use of SSH can still be specified explicitly if additional options must be passed to the SSH client:

$ rsync -a -e 'ssh -v' \
    fred@server.example.org:./archive/ \
    /home/fred/archive/.

For some types of data, transfer can sometimes be sped up considerably by using rsync(1) with compression, -z, if the CPUs on both ends can handle the extra work. However, compression can also slow things down, so it must be tested in place to find out whether it helps or hinders.
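
One way to find out is to time the same transfer with and without -z using time(1). For a fair comparison, each run should start from the same destination state, since rsync(1) otherwise transfers only what remains after the earlier run:

$ time rsync -a fred@server.example.org:./archive/ /home/fred/archive/.
$ time rsync -az fred@server.example.org:./archive/ /home/fred/archive/.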

Rsync with Keys

Since rsync(1) uses SSH by default, it can also authenticate using SSH keys. The -e option passes additional options to the SSH client, and in that way it is possible to point to a specific SSH key file for the SSH client to use when establishing the connection.

$ rsync --exclude '*~' -avv \
    -e 'ssh -i ~/.ssh/key_bkup_rsa' \
    fred@server.example.org:./archive/ \
    /home/fred/archive/.

Other configuration options can also be sent to the SSH client in the same way if needed, or via the SSH client's configuration file. Furthermore, if the key is first added to an agent, then the key's passphrase only needs to be entered once. This is easy to do in an interactive session within a modern desktop environment. In an automated script, the agent will have to be set up with explicit socket names passed along to the script and accessed via the SSH_AUTH_SOCK variable.
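
A minimal sketch of that arrangement, using a hypothetical socket path /tmp/bkup-agent.sock chosen here only for illustration. The first two commands are run interactively to start the agent and load the key, entering the passphrase once; the last is what the unattended script would then run:

$ ssh-agent -a /tmp/bkup-agent.sock
$ SSH_AUTH_SOCK=/tmp/bkup-agent.sock ssh-add ~/.ssh/key_bkup_rsa
$ SSH_AUTH_SOCK=/tmp/bkup-agent.sock rsync -a \
    -e 'ssh -i ~/.ssh/key_bkup_rsa' \
    fred@server.example.org:./archive/ \
    /home/fred/archive/.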

Root Level Access for rsync(1) with sudo(8)

Sometimes the backup process needs access to an account other than the one which can log in. That other account is often root, which for reasons of least privilege is usually denied direct access via SSH. rsync(1) can invoke sudo(8) on the remote machine, if needed.

Say you're backing up from the server to the client. rsync(1) on the client uses ssh(1) to make the connection to rsync(1) on the server. rsync(1) is invoked from the client with -v passed to the SSH client in order to see exactly which parameters are being passed to the server. Those details will be needed in order to incorporate them into the server's configuration for sudo(8). Here the SSH client is run with a single level of increased verbosity in order to show which options must be used:

$ rsync \
    -e 'ssh -v -i ~/.ssh/key_bkup_rsa -t -l bkupacct' \
    --rsync-path='sudo rsync' \
    --delete \
    --archive \
    --compress \
    --verbose \
    bkupacct@server:/var/www/ \
    /media/backups/server/backup/

There the argument --rsync-path tells the server what to run in place of rsync(1), in this case sudo rsync. The argument -e says which remote shell tool to use, in this case ssh(1). For the SSH client called by rsync(1), -i says specifically which key to use. That is independent of whether or not an authentication agent is used for SSH keys. Having more than one key is a real possibility, since different keys can be kept for different tasks.

You can find the exact setting(s) to use in /etc/sudoers by running the SSH client in verbose mode (-v) on the client. Be careful when working with patterns not to match more than is safe.

Adjusting these settings will most likely be an iterative process. Keep making changes to /etc/sudoers on the server while watching the verbose output until it works as it should. Ultimately /etc/sudoers will end up with a line allowing rsync(1) to run with a minimum of options.

Steps for rsync(1) with Remote Use of sudo(8) Over SSH

These examples are based on fetching data from a remote system. That is to say, the data gets copied from /source/directory/ on the remote system to /destination/directory/ locally. However, the steps are the same for the reverse direction, except that a few options are placed differently and --sender is omitted. Either way, a straight copy and paste from the examples below won't work; host names, account names, and paths must be adapted.

Preparation: Create a single purpose account to use only during the backups, create a pair of keys to use only for that account, then make sure you can log in to that account with ssh(1) with and without those keys.

$ ssh -i ~/.ssh/key_bkup_ed25519 bkupacct@www.example.org

The account on the server is named 'bkupacct' and the private Ed25519 key is ~/.ssh/key_bkup_ed25519 on the client. On the server, the account 'bkupacct' is a member of the group 'backups'. See the section on Public Key Authentication if necessary.

The public key, ~/.ssh/key_bkup_ed25519.pub, must be copied to the account 'bkupacct' on the remote system and added to ~/.ssh/authorized_keys there. Then it is necessary that ~ and ~/.ssh/ on the server be owned by root, belong to the backup account's group, and be group-readable but not group-writable, and definitely not world-readable. The same goes for the file ~/.ssh/authorized_keys. (This also assumes ACLs are not in use.) However, this is only one of many ways to set permissions on the remote system:

$ sudo chown root:bkupacct ~
$ sudo chown root:bkupacct ~/.ssh/
$ sudo chown root:bkupacct ~/.ssh/authorized_keys
$ sudo chmod u=rwx,g=rx,o= ~
$ sudo chmod u=rwx,g=rx,o= ~/.ssh/
$ sudo chmod u=rw,g=r,o=   ~/.ssh/authorized_keys

Now the configuration can begin.

Step 1: Configure sudoers(5) so that rsync(1) can work with sudo(8) on the remote host, where the data to be backed up resides. The group 'backups' will temporarily need full access in order to find the specific options which will be used later in locking this down.

%backups ALL=(root:root) NOPASSWD: /usr/bin/rsync

That is a transitory step, and it is important that the line not be left in place as-is for any length of time.

However, while it is in place, ensure that rsync(1) works with sudo(8) by testing it with the --rsync-path option.

$ rsync --rsync-path='sudo rsync' \
-aHv bkupacct@www.example.org:/source/directory/ /destination/directory/

The transfer should run without errors, warnings, or extra password entry.
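
One quick way to double-check the sudo(8) side on its own is to list the allowed commands non-interactively. If the NOPASSWD rule is in effect, this prints the rsync(1) entry without prompting for a password:

$ ssh -i ~/.ssh/key_bkup_ed25519 bkupacct@www.example.org 'sudo -n -l'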

Step 2: Next, do the same transfer again but using the key for authentication to make sure that the two can be used together.

$ rsync -e 'ssh -i ~/.ssh/key_bkup_ed25519' --rsync-path='sudo rsync' \
-aHv bkupacct@www.example.org:/source/directory/ /destination/directory/

Again, see the section on Public Key Authentication if necessary.

Step 3: Now collect the connection details. They are needed to tune sudoers(5) appropriately.

$ rsync -e 'ssh -E ssh.log -v -i ~/.ssh/key_bkup_ed25519' \
--rsync-path='sudo rsync' \
-aHv bkupacct@www.example.org:/source/directory/ /destination/directory/

$ grep -i 'sending command' ssh.log

The second command, the one with grep(1), ought to produce something like the following:

debug1: Sending command: rsync --server --sender -vlHogDtpre.iLsfxCIvu . /source/directory/

The long string of letters and the directory are important to note because those will be used to tune sudoers(5) a little. Remember that in these examples, the data gets copied from /source/directory/ on the remote machine to /destination/directory/ locally.

Here is the line which matches the output above, assuming the account is in the group 'backups':

%backups ALL=(root:root) NOPASSWD: /usr/bin/rsync --server --sender -vlHogDtpre.iLsfxCIvu . /source/directory/

That line adjusts sudoers(5) so that the backup account has enough access to run rsync(1) as root, but only with the options and directory it is supposed to use and without free rein on the system.

More refinements may come later, but those are the basics for locking sudoers(5) down. At this point you are almost done, although the process can be automated much further. Be sure that the backed up data is not accessible to others once stored locally.

Step 4: Test rsync(1) with sudo(8) over ssh(1) to verify that the settings made in sudoers(5) are correct.

$ rsync -e 'ssh -i ~/.ssh/key_bkup_ed25519' --rsync-path='sudo rsync' \
-aHv bkupacct@www.example.org:/source/directory/ /destination/directory/

The backup should run correctly at this point.

Step 5: Finally it is possible to lock that key into just the one task by prepending restrictions using the command="..." option in the authorized_keys file. The explanation for that is found in sshd(8).

command="/usr/bin/rsync --server --sender -vlHogDtpre.iLsfxCIvu . ./source/directory" ssh-ed25519 AAAAC3Nz...aWi

Thereafter that one key functions only for the backup. It's an extra layer upon the settings already made in the sudoers(5) file.

Thus you are able to do automated remote backups using rsync(1) with root-level access while avoiding remote root login. Nevertheless, keep close tabs on the private key, since it can still be used to fetch the remote backup, which may contain sensitive information.

From start to finish, the process requires a lot of attention to detail, but it is quite doable if taken one step at a time. Setting up backups going the reverse direction is quite similar: when going from local to remote, the --sender option is omitted and the directories are different.
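
Once everything checks out, the client side can be automated with cron(8). A minimal sketch of a crontab(5) entry for a nightly run, assuming the key and directories used above and that the client account's home directory is /home/fred; note the absolute path to the key, since ~ is not expanded inside the quotes:

0 2 * * * rsync -e 'ssh -i /home/fred/.ssh/key_bkup_ed25519' --rsync-path='sudo rsync' -aH bkupacct@www.example.org:/source/directory/ /destination/directory/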

Other Implementations of the Rsync Protocol

openrsync(1) is a clean-room reimplementation[3] of version 27 of the Rsync protocol as supported by the samba.org implementation of rsync(1). It has been in OpenBSD's base system since OpenBSD version 6.5. It is installed under a different name, so if it is on the remote system and samba.org's rsync(1) is on the local system, the --rsync-path option must point to it by name:

$ rsync -a -v -e 'ssh -i key_rsa' \
	--rsync-path=/usr/bin/openrsync \
	fred@server.example.org:/var/www/ \
	/home/fred/www/

Going the other direction, starting with openrsync(1) and connecting to rsync(1) on the remote system, needs no such tweaking.

Backup Using tar(1)

A frequent choice for creating archives is tar(1). But since it copies whole files and directories, rsync(1) is usually much more efficient for updates or incremental backups.

The following will make a tarball of the directory /var/www/ and send it via stdout on the local machine into stdin on the remote machine through a pipe into ssh(1), where it is then directed into a file called backup.tar. Here tar(1) runs on the local machine and stores the tarball remotely:

$ tar cf - /var/www/ | ssh -l fred server.example.org 'cat > backup.tar'

There are almost limitless variations on that recipe:

$ tar zcf - /var/www/ /home/*/www/ \
	|  ssh -l fred server.example.org 'cat > $(date +"%Y-%m-%d").tar.gz'

That example does the same, but also gets the users' WWW directories, compresses the tarball using gzip(1), and labels the resulting file according to the current date. It can be done with keys, too:

$ tar zcf - /var/www/ /home/*/www/ \
	|  ssh -i key_rsa -l fred server.example.org 'cat > $(date +"%Y-%m-%d").tgz'

Going the other direction, it is just as easy for tar(1) to archive what is on a remote machine and store the tarball locally:

$ ssh fred@server.example.org 'tar zcf - /var/www/' >  backup.tgz

Or here is a fancier example of running tar(1) on the remote machine but storing the tarball locally.

$ ssh -i key_rsa -l fred server.example.org 'tar jcf - /var/www/ /home/*/www/' \
	> $(date +"%Y-%m-%d").tar.bz2

So in summary, the secret to using tar(1) for backup is the use of stdout and stdin to effect the transfer through pipes and redirects.

Backup of Files With tar(1) But Without Making A Tarball

Sometimes it is necessary to transfer just the files and directories without making a tarball at the destination. In addition to writing to stdout on the source machine, tar(1) can read from stdin on the destination machine, transferring whole directory hierarchies at once.

$ tar zcf - /var/www/ | ssh -l fred server.example.org "cd /some/path/; tar zxf -"

Or going the opposite direction, it would be the following.

$ ssh fred@server.example.org 'tar zcf - /var/www/' | (cd /some/path/; tar zxf -)

However, these still copy everything each time they are run. So rsync(1), described in the previous section, might be a better choice in many situations, since on subsequent runs it copies only the changes. Also, depending on the type of data, network conditions, and the CPUs available, compression might be a good idea, either in tar(1) or in ssh(1) itself.
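
For example, compression can be delegated to the SSH client with -C, keeping tar(1) itself uncompressed at both ends:

$ tar cf - /var/www/ | ssh -C -l fred server.example.org 'cd /some/path/; tar xf -'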

Backup Using dump(8)

Using dump(8) remotely is like using tar(1). One can copy from the remote system to the local system:

$ ssh -t source.example.org 'sudo dump -0an -f - /var/www/ | gzip -c9' > backup.dump.gz

Note that the password prompt for sudo(8) might not be visible and it must be typed blindly.

Or one can go the other direction, copying from the local system to the remote:

$ sudo dump -0an -f - /var/www/ | gzip -c9 | ssh target.example.org 'cat > backup.dump.gz'

Note again that the password prompt might get hidden in the initial output from dump(8). However, it's still there, even if not visible.
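
Restoring later follows the same pattern, feeding the decompressed dump into restore(8). A minimal sketch, run from inside the directory where the tree is to be recreated:

$ gzip -dc backup.dump.gz | sudo restore -rf -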

Backup Using zfs(8) Snapshots

OpenZFS can easily make either full or incremental snapshots as a beneficial side effect of copy-on-write. These snapshots can be sent over SSH to or from another system, and the method works equally well for backing up or restoring data. However, bandwidth is a consideration: the snapshots must be small enough to be feasible over the actual network connection in question, since smaller snapshots use less bandwidth and transfer more quickly than larger ones. OpenZFS supports compressed replication, so that blocks which are compressed on disk remain compressed during transfer, avoiding the need to recompress them with another process. The transfers can be to or from either a regular file or another OpenZFS file system.

A full snapshot is required first because incremental snapshots only contain a partial set of data and require that the foundation upon which they were formed exists. The following uses zfs(8) to make a snapshot named 20210326 of a dataset named site01 in a pool named web.

$ zfs snapshot -r web/site01@20210326

The program itself will most likely be in the /sbin/ directory, so either the PATH environment variable needs to include that or the absolute path should be used instead. Incremental snapshots can subsequently be built upon the initial full snapshot using the -i option. However, the ins and outs of OpenZFS management are far outside the scope of this book, so only the two methods for transfer between systems are examined here: one uses an intermediate file, the other pipes the data directly. Both use zfs send and zfs receive, and the accounts involved must have the correct privileges in the OpenZFS Delegation System: send and snapshot on the relevant OpenZFS pool for sending, and create, mount, and receive for receiving.
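
Those privileges can be granted with zfs allow. A minimal sketch, assuming the account 'bkupacct' and the pool 'web' used elsewhere in this section, with the first line run on the sending system and the second on the receiving system:

$ sudo zfs allow -u bkupacct send,snapshot web
$ sudo zfs allow -u bkupacct create,mount,receive web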

OpenZFS To And From A Remote File System Via A File

A snapshot can be transferred to a file on a local or remote system over SSH. This method does not need privileged access on either system, but the account running zfs must have the correct internal OpenZFS permissions as granted by zfs allow. Here a very small snapshot is downloaded from the remote system to a local file:

$ ssh fred@server.example.org '/sbin/zfs send -v web/site01@20210326' > site01.openzfs 
full send of web/site01@20210326 estimated size is 1.72M
total estimated size is 1.72M

If incremental snapshots are copied, the full snapshot on which they are based needs to be copied as well. So care should be taken to ensure that this is a full snapshot and not just an incremental one.
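
An incremental snapshot would be fetched the same way using -i to name the earlier snapshot it builds on. A sketch, assuming a hypothetical later snapshot named 20210402:

$ ssh fred@server.example.org '/sbin/zfs send -v -i web/site01@20210326 web/site01@20210402' \
	> site01-incr.openzfs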

Later, restoring the snapshot is a matter of going the reverse direction. In this case the data is retrieved from the file and sent to zfs(8) over SSH.

$ cat site01.openzfs | ssh fred@server.example.org '/sbin/zfs receive -v -F web/site01@20210326' 
receiving full stream of web/site01@20210326 into web/site01@20210326
received 1.85M stream in 6 seconds (316K/sec)

This is possible because the channel is 8-bit clean when started without a PTY, as happens when invoking programs directly at run time. Note that the targeted OpenZFS data set must be unmounted using zfs(8) first, then mounted again after the transfer.
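
A sketch of those two steps, again assuming the data set from the example above. Before the transfer:

$ ssh fred@server.example.org '/sbin/zfs unmount web/site01'

And after it:

$ ssh fred@server.example.org '/sbin/zfs mount web/site01'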

The Other Direction

Transferring from the local system to the remote is a matter of changing around the order of the components.

$ /sbin/zfs send -v web/site01@20210326 | ssh fred@server.example.org 'cat > site01.openzfs'
full send of web/site01@20210326 estimated size is 1.72M
total estimated size is 1.72M

Then similar changes are needed to restore from the remote to the local.

$ ssh fred@server.example.org 'cat site01.openzfs' | /sbin/zfs receive -v -F web/site01@20210326
receiving full stream of web/site01@20210326 into web/site01@20210326
received 1.85M stream in 6 seconds (316K/sec)

As usual, to avoid using the root account for these activities, the account running zfs(8) must have the right levels of access within the OpenZFS Delegation System.

OpenZFS Directly To And From A Remote File System

Alternatively that snapshot can be transferred over SSH to a file system on the remote computer. This method needs privileged access and will irrevocably replace any changes made on the remote system since the snapshot.

$ zfs send -v pool/www@20210322 | ssh fred@server.example.org 'zfs receive -F pool/www@20210322'

So if removable hard drives are used on the remote system, this can update them.

Again, the remote account must already have been permitted the necessary internal ZFS permissions.

The Other Direction

Again, to go the other direction, from a remote system to a local one, it is a matter of changing around the order of the components.

$ ssh fred@server.example.org 'zfs send -v pool/www@20210322' | zfs receive -F pool/www@20210322

Again, working with the OpenZFS Delegation System can avoid the need for root access on either end of the transfer.

Buffering OpenZFS Transfers

Sometimes the CPU and the network will alternate as the bottleneck during a transfer. The mbuffer(1) utility can keep a steady flow of data[4] even when one side temporarily outpaces the other. The point is to keep a big enough buffer that there is always some data moving over the network, even while the CPU catches up.

$ cat site01.openzfs | mbuffer -s 128k -m 1G \
	| ssh fred@server.example.org 'mbuffer -s 128k -m 1G | /sbin/zfs receive -v -F web/site01'

summary: 1896 kiByte in  0.2sec - average of 7959 kiB/s
receiving full stream of web/site01@20210326 into web/site01@20210326
in @ 2556 kiB/s, out @ 1460 kiB/s, 1024 kiB total, buffer   0% full
summary: 1896 kiByte in  0.8sec - average of 2514 kiB/s
received 1.85M stream in 2 seconds (948K/sec)

Further details of working with OpenZFS and managing its snapshots are outside the scope of this book. Indeed, there are whole guides, tutorials, and even books written about OpenZFS.


References

  1. "How Rsync Works". Samba.
  2. "NEWS for rsync 2.6.0 (1 Jan 2004)". Samba. 2004-01-01. Retrieved 2020-05-02.
  3. "openrsync imported into the tree". Undeadly. 2019-02-11. Retrieved 2020-05-10.
  4. Dan Langille (2014-05-03). "zfs send on FreeBSD over ssh using mbuffer". Retrieved 2020-05-22.