rsync Include and Exclude Files
If configuring vinum was the most time-consuming part of this odyssey, then tinkering with rsync's include and exclude functionality will be the trickiest. In this implementation, I use separate include and exclude files, although rsync is capable of drawing all of its include/exclude information from one file.
When copying, rsync sends every file that it has not been specifically told to exclude. It processes include and exclude rules a bit like a packet-filtering firewall: it searches the rules from the top down and stops at the first match. So, if rsync copies everything that isn't specifically excluded, why bother with an include list? Because an include rule that comes first wins. A file called "Temporary notes.doc" matches the Temporary* exclude rule below, but the *.doc include rule is consulted first, so the file is kept. In --archive mode, --recursive is implied, and rsync checks each directory against the include/exclude list before it looks at that directory's contents. If a directory matches an exclude rule, rsync does not descend into it at all, so nothing underneath it can be rescued by an include rule that matches only the files. If there is a chance that an exclude rule might match a directory containing files you want to keep, then you'd better make sure that directory is matched by an include rule first!
Beware the case-sensitive match! rsync does The Right Thing and considers case when comparing and copying files. Windows filesystems preserve case, but are not case sensitive. (This means that if I have a file called "FILENAME.TXT" and I ask Windows "Do you have a file called 'filename.txt'?", it will answer "yes". At the same time, if I create the file as "FiLeNaMe.TxT", then Windows will remember the original case that I specified, it just won't honor it!) If you can't be sure whether the filenames you want to match will be in uppercase or lowercase, then you need to specify both. Obviously, specifying all possible permutations and combinations could get pretty crazy pretty fast, so you need to approach this one with a level head.
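There is a way to keep the permutations under control. rsync applies shell filename-matching rules to any pattern containing *, ? or [, so a single line such as *.[dD][oO][cC] matches every capitalization of .doc, not just the two spelled out in the lists below. The helper here is a minimal sketch that rewrites a pattern that way with awk. The name ci_pattern is hypothetical, and it assumes the input pattern contains no [...] classes of its own.

```shell
#!/bin/sh
# ci_pattern PATTERN
# Hypothetical helper: rewrite each letter of an rsync/shell glob as a
# bracket expression holding its lower- and upper-case forms, so that
# '*.doc' becomes '*.[dD][oO][cC]'. Non-letters (*, ?, ., digits, spaces)
# are copied unchanged. Assumes the input has no [...] classes already.
ci_pattern() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            lc = tolower(c)
            uc = toupper(c)
            if (lc != uc)
                out = out "[" lc uc "]"
            else
                out = out c
        }
        print out
    }'
}
```

For example, ci_pattern 'hiberfil.sys' prints [hH][iI][bB][eE][rR][fF][iI][lL].[sS][yY][sS], which would also catch HIBERFIL.SYS or Hiberfil.sys with one exclude line.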
In the following example, I specify that I want to keep word processor and spreadsheet files from OpenOffice and Microsoft along with PDFs. Generally, I don't want to archive executables and libraries, because they can be re-installed from original source disks. Many of my customers, however, use an email client that will quite happily soldier on if it is transplanted in its entirety with all of the files in its install directory, so I archive everything in that directory including executables.
Two of the biggest files on any Windows system will be the swap file and the hibernation file. Since they're pretty much completely useless anywhere other than on a running system, there is no point in archiving them. Many pre-installed Windows systems keep a complete copy of the OS install set in the \I386 directory. I can get that on CD too, so I won't be archiving that either. Here is a sample include file:
*.sxw
*.SXW
*.stc
*.STC
*.sxc
*.SXC
*.doc
*.DOC
*.xls
*.XLS
*.pdf
*.PDF
*Eudora*
*Eudora Pro*
And, here is a sample exclude file:
Temporary*
System Volume Information
i386
I386
*.dll
*.DLL
*.exe
*.EXE
PAGEFILE.SYS
hiberfil.sys
A check of the rsync line in my backup batch file will show that I refer to two include files and one exclude file. In this implementation, I use a common include and exclude file for all customers, then a second include file that is unique to each customer, usually empty, in case particular customizations are required.
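The first-match-wins behaviour, and the way an excluded directory hides everything beneath it, can be checked without running a backup. The sketch below is a rough, hypothetical model, not rsync itself. It handles only simple patterns without a slash (which rsync matches against the last component of a path). It reads the rule files in the order given, mirroring the order of --include-from and --exclude-from on the rsync line, and it treats anything unmatched as included. The function names and the include:/exclude: argument prefixes are invented for this sketch.

```shell
#!/bin/sh
# Rough, hypothetical model of rsync's include/exclude evaluation for
# simple patterns (no '/', no '**'). Not a substitute for rsync itself.

# rule_for NAME SPEC...
# Each SPEC is "include:FILE" or "exclude:FILE". Rule files are read in
# the order given and the first pattern matching NAME decides; if nothing
# matches, the name is included (rsync's default).
rule_for() {
    name=$1
    shift
    for spec in "$@"; do
        kind=${spec%%:*}
        file=${spec#*:}
        while IFS= read -r pat || [ -n "$pat" ]; do
            [ -n "$pat" ] || continue
            case $name in
                $pat) echo "$kind"; return 0 ;;
            esac
        done < "$file"
    done
    echo include
}

# decide PATH SPEC...
# Walk PATH one component at a time, as rsync does while recursing: each
# directory is checked before its contents, and an excluded directory
# hides everything beneath it.
decide() {
    rest=$1
    shift
    while [ -n "$rest" ]; do
        comp=${rest%%/*}
        if [ "$(rule_for "$comp" "$@")" = exclude ]; then
            echo "exclude ($comp)"
            return 0
        fi
        case $rest in
            */*) rest=${rest#*/} ;;
            *) rest= ;;
        esac
    done
    echo include
}
```

With the three rule files in the current directory, decide 'Program Files/Qualcomm/Eudora/helper.exe' include:rsync-local-include.txt include:rsync-include.txt exclude:rsync-exclude.txt (a made-up path) prints exclude (helper.exe) under this model. *Eudora* matches the directory name but not the file name, so *.exe still applies to the file. If the goal is to keep the executables inside the Eudora directory, that is worth confirming against the real file lists, perhaps with a pattern that uses rsync's ** wildcard to match the full path.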
All three files are stored on the backup server and copied at the beginning of each backup. Thus, I can modify them in the comfort and privacy of my own server and have the clients refer to the latest versions for each new backup run. The backup batch file also creates a text file listing all filenames on the client system, and that file is conveniently delivered to the backup server on every run. While fine-tuning the include and exclude rules, I can compare these file lists to the files that arrive on the server and tweak the rules as required.
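Comparing the two lists is easy to automate on the server. The sketch below assumes that the client list is the all-files.txt written by the dir \ /a-d /s /b line in the backup batch file (one C:\... path per line, with DOS line endings). It also assumes that the customer's tree sits in a directory such as /backups/douglasb/c. The function name and paths are illustrative only. The sketch prints every file the client reported that did not arrive on the server, which is mostly the set the exclude rules filtered out, deliberately or not. Names containing non-ASCII characters may not line up, because dir and the server can encode them differently.

```shell
#!/bin/sh
# missing_files CLIENT_LIST BACKUP_ROOT
# Hypothetical helper: print paths listed in a Windows "dir /s /b" listing
# that have no counterpart under BACKUP_ROOT on the server.
missing_files() {
    client_list=$1
    backup_root=$2
    work=$(mktemp -d) || return 1

    # "C:\Docs\a.doc<CR>" -> "Docs/a.doc": drop CRs and the drive prefix,
    # then turn backslashes into slashes.
    tr -d '\r' < "$client_list" |
        sed -e 's|^[A-Za-z]:\\||' -e 's|\\|/|g' |
        LC_ALL=C sort > "$work/client"

    # Files that actually arrived, relative to the backup root.
    (cd "$backup_root" && find . -type f) |
        sed 's|^\./||' |
        LC_ALL=C sort > "$work/server"

    # comm -23: lines that appear only in the client list.
    LC_ALL=C comm -23 "$work/client" "$work/server"
    rm -rf "$work"
}
```

Assuming %WINDIR% is C:\WINDOWS, the list arrives with the rest of the C: drive, so something like missing_files /backups/douglasb/c/WINDOWS/all-files.txt /backups/douglasb/c | less shows what the current rules left behind.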
Installing Win32 Client Software
In keeping with the very Unix-like flavor of this solution, Cygwin (http://www.cygwin.com/) binaries are used on the Win32 clients to make up the client end of the bargain. There are two ways to achieve this. If you have other uses for a Unix-like environment on your Win32 machines, then you might as well install the whole Cygwin environment. If this backup solution is your only requirement, however, then you may choose to simply install the small subset of the Cygwin distribution that is required to achieve this goal.
Specifically, we require the rsync, ssh, scp, ssh-keygen, and mount commands. If you don't require a full Cygwin installation, then you can make a temporary installation on one machine and pick out the executables and libraries you need. To run a backup from a Win32 client to the FreeBSD server, the following binaries are required on the Win32 machine:
rsync.exe
scp.exe
ssh.exe
cygcrypto-0.9.7.dll
cygminires.dll
cygpopt-0.dll
cygwin1.dll
cygz.dll
Additionally, to use ssh-keygen and mount (required only during installation), simply copy their binaries as well. The cat, mkdir, mv, nice, and rm commands and the Cygwin shell (sh.exe) are added so that they can be used in the install and backup scripts. If you're concerned about the extra space, the ones needed only by the install script can be removed once it has run (nice is still used by the backup batch file). Here are the additional Cygwin binaries for installation and scripting:
cat.exe
mkdir.exe
mount.exe
mv.exe
nice.exe
rm.exe
sh.exe
ssh-keygen.exe
cygiconv-2.dll
cygintl-1.dll
cygintl-2.dll
The files must be placed somewhere in the Windows path. Either place them in their own directory and modify the PATH environment variable or drop them in an existing location, perhaps C:\WINDOWS\ or C:\WINDOWS\SYSTEM32\.
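Before running the install script, it is worth confirming that every file from the two lists above is actually reachable. The following is a minimal sketch, meant to be run from the Cygwin sh.exe on the client, which presents the Windows PATH in /cygdrive/... form. find_in_path and check_install are hypothetical names, and the file list simply repeats the one given above.

```shell
#!/bin/sh
# find_in_path FILE
# Print the first $PATH directory containing FILE; return 1 if none does.
find_in_path() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.          # an empty PATH entry means "."
        if [ -f "$dir/$1" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# check_install: report any required Cygwin file that is not on the PATH.
check_install() {
    status=0
    for f in rsync.exe scp.exe ssh.exe cygcrypto-0.9.7.dll cygminires.dll \
             cygpopt-0.dll cygwin1.dll cygz.dll cat.exe mkdir.exe mount.exe \
             mv.exe nice.exe rm.exe sh.exe ssh-keygen.exe cygiconv-2.dll \
             cygintl-1.dll cygintl-2.dll; do
        if ! find_in_path "$f" > /dev/null; then
            echo "not found on PATH: $f" >&2
            status=1
        fi
    done
    return $status
}
```

Running check_install prints nothing and returns success when everything is in place. Otherwise it lists each missing file on stderr.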
Setting up the Windows Clients
Before a backup can be initiated, a number of prerequisites must be satisfied:
The Cygwin rsync and scp executables expect to find ssh in the /usr/bin directory, and ssh expects to record the public keys of known hosts in the customer's home directory in /home/<username>/.ssh/known_hosts. Mount points must be created to connect the Unix-style paths to their Win32 equivalents.
The customer requires a public/private key pair to authenticate with the backup server and, of course, the customer's public key must be installed on the server along with an actual user account on the server. The install.sh script delivers a series of commands ready to be pasted onto a server command line.
A script is required to carry out the backup process.
Execute the install script from a Windows command line with the form sh install.sh douglasb "Douglas the Cat". Windows 98 has different ideas about some of the paths used in this script, so it will require a bit of tweaking to run there.
#!/usr/bin/sh
# Simple install script to configure rsync/ssh backups
# on Windows NT hosts...
#
# Usage: install.sh <customer login name> <full customer name>
#
# Set Cygwin mount points
mount -f -s -t "C:\Documents and Settings" /home
mount -f -s -t $SYSTEMROOT /usr/bin
# Create customer's public/private key pair.
cd /usr/bin
mkdir .ssh
ssh-keygen.exe -N "" -q -b 1024 -C "$2" -t rsa -f .ssh/id_rsa
# Change back to the system directory, and insert the
# customer's FreeBSD username into the backup batch file.
mv backup-c.bat backup-c.bat.src
echo set USERNAME=$1 > backup-c.bat
cat backup-c.bat.src >> backup-c.bat
rm backup-c.bat.src
# A Windows shortcut to the backup batch file on the Start menu
# may be a nice touch. Creating a Windows link file from
# DOS/shell is possible but complex. It's far easier to
# pre-create a shortcut to "%WINDIR%\backup-c.bat" and simply
# move it into place.
mv "Backup C Drive.lnk" "$ALLUSERSPROFILE/Start Menu/Backup C Drive.lnk"
# Create command strings for execution on the FreeBSD backup
# server to create the customer account and populate .ssh/authorized_keys2
echo "/usr/sbin/pw useradd -n $1 -d /backups/$1 -L backupclients -g backupclients -c \"$2\" -m -s /backups/bin/rsync-wrapper.sh" > tempfile.txt
echo mkdir "/backups/$1/.ssh" >> tempfile.txt
echo "echo command=\\\"/backups/bin/rsync-wrapper.sh\\\" `cat .ssh/id_rsa.pub` >/backups/$1/.ssh/authorized_keys2" >> tempfile.txt
echo "/bin/ln -s /backups/rsync-include.txt /backups/$1/rsync-include.txt" >> tempfile.txt
echo "/bin/ln -s /backups/rsync-exclude.txt /backups/$1/rsync-exclude.txt" >> tempfile.txt
echo "/usr/bin/touch /backups/$1/rsync-local-include.txt" >> tempfile.txt
# Present the command strings in a text editor for cut/paste
# to the host (this shell can actually execute windows binaries!)
/usr/bin/System32/notepad.exe tempfile.txt
# remove the temporary file.
rm tempfile.txt
The install.sh script prepends command="/backups/bin/rsync-wrapper.sh" to the customer's public key before offering it up for insertion in the authorized_keys2 file. If the customer authenticates by public/private key (and in this implementation, that is the only way a customer can gain access), OpenSSH ignores any command line sent by the client and instead executes this forced command, passing the client's original request along in the SSH_ORIGINAL_COMMAND environment variable. The rsync-wrapper.sh script records the client's command to syslog and then confirms that it is either scp or rsync before allowing it to be executed. If the client sends any other command, it is logged to syslog's security facility and rejected.
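rsync-wrapper.sh itself is not listed, but the behaviour just described can be sketched. This is a hypothetical version, not the original script. It reads the client's request from SSH_ORIGINAL_COMMAND and logs through logger, using FreeBSD's security facility for rejections. It accepts only commands that start with rsync --server, scp -f or scp -t, which are the forms rsync and the classic scp protocol are expected to send, so check them against your own logs. It also refuses any command containing shell metacharacters, because checking only the prefix would let a client append "; rm -rf ~" to an otherwise acceptable rsync command.

```shell
#!/bin/sh
# Hypothetical sketch of /backups/bin/rsync-wrapper.sh (not the original).

# is_allowed COMMAND: succeed only for plain rsync-server or scp commands.
is_allowed() {
    nl='
'
    # Refuse anything that could chain or redirect a second command.
    case $1 in
        *';'*|*'&'*|*'|'*|*'`'*|*'$'*|*'<'*|*'>'*|*'('*|*')'*|*"$nl"*)
            return 1 ;;
    esac
    case $1 in
        'rsync --server '*|'scp -f '*|'scp -t '*)
            return 0 ;;
    esac
    return 1
}

main() {
    logger -p user.info -t rsync-wrapper "$(id -un): $SSH_ORIGINAL_COMMAND"
    if is_allowed "$SSH_ORIGINAL_COMMAND"; then
        # /bin/sh -c keeps any quoting the client applied to paths.
        exec /bin/sh -c "$SSH_ORIGINAL_COMMAND"
    fi
    logger -p security.warning -t rsync-wrapper \
        "rejected for $(id -un): $SSH_ORIGINAL_COMMAND"
    echo "Command not permitted." >&2
    exit 1
}

# Act only when sshd has supplied the client's command (forced-command
# mode); otherwise run nothing at all.
if [ -n "${SSH_ORIGINAL_COMMAND+set}" ]; then
    main
fi
```

Recent OpenSSH releases make scp use the SFTP protocol by default, which this check would reject. On such clients, scp -O selects the classic protocol.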
Finally, here is the code for the backup-c.bat script that is executed by the Start menu shortcut inserted by the install.sh script:
C:
cd %WINDIR%
dir \ /a-d /s /b >all-files.txt
scp -i .ssh/id_rsa %USERNAME%@<my.backup.server>:rsync-include.txt rsync-include.txt
scp -i .ssh/id_rsa %USERNAME%@<my.backup.server>:rsync-exclude.txt rsync-exclude.txt
scp -i .ssh/id_rsa %USERNAME%@<my.backup.server>:rsync-local-include.txt rsync-local-include.txt
nice -n 19 rsync.exe --archive --stats --progress --modify-window=5 --include-from=rsync-local-include.txt --include-from=rsync-include.txt --exclude-from=rsync-exclude.txt --rsh="ssh -i .ssh/id_rsa" /cygdrive/c/* %USERNAME%@<my.backup.server>:c/
pause
Network Time
To decide whether to copy a particular file, rsync compares the size and modification time of the copy on the client with the copy on the server. If your clients and your server have differing opinions on what the current time is, then you'll find a lot of unnecessary file transfers going on when your customers execute their backup scripts.
Many people are not aware that Windows 2000 ships with a perfectly serviceable NTP client -- it only made it into the GUI in Windows XP. The network time client is installed as a service named "Windows Time", but it does not start automatically by default in Windows 2000. Use the Services control panel (Start -> Run -> services.msc -> OK) to set it to start automatically.
Your backup server will also need a reliable time source. You could simply configure a cron job to run ntpdate every hour or so. If you have five minutes to spare instead of just one, configure xntpd and keep your server properly synchronized to a number of other servers. Be sure to add the line xntpd_enable="YES" to /etc/rc.conf if you do. With xntpd running on the server, your clients can synchronize to it, and they need never disagree on the time. Here's how to configure and start the Windows Time client:
C:\> net time /SETSNTP:ntp.mytimeserver.com
C:\> net start W32Time
Here's a sample ntp.conf for FreeBSD:
driftfile /var/db/ntp.drift
server ntp.atimeserver.com
server ntp.ticktock.com
server ntp.cuckoo.com
server ntp.hourglass.com
Windows 95/98/Me clients, which don't ship with a network time client of their own, might use the excellent open source NetTime. NetTime is available from:
http://nettime.sourceforge.net/
and is included on TheOpenCD from:
http://www.theopencd.org/
Even with a nicely synchronized clock, Windows' FAT filesystems cannot be relied upon to record timestamps with less than two seconds of granularity, so it is necessary to run rsync with the --modify-window option set to at least a second or two to avoid repeat copying of files.
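The comparison that --modify-window relaxes is simple enough to write down. By default, rsync's quick check treats a file as unchanged when the sizes match and the modification times agree. --modify-window=N widens "agree" to "differ by no more than N seconds". The function below is a simplified illustration of that rule only. needs_transfer is a hypothetical name, and options such as --checksum change how real rsync decides.

```shell
#!/bin/sh
# needs_transfer SRC_SIZE SRC_MTIME DST_SIZE DST_MTIME WINDOW
# Simplified model of rsync's quick check: succeed (transfer needed) if the
# sizes differ or the mtimes differ by more than WINDOW seconds.
needs_transfer() {
    [ "$1" -ne "$3" ] && return 0
    diff=$(( $2 - $4 ))
    [ "$diff" -lt 0 ] && diff=$(( -diff ))
    [ "$diff" -gt "$5" ]
}
```

For two copies of a 4096-byte file whose timestamps are one second apart, needs_transfer 4096 1000000001 4096 1000000000 0 succeeds (copy it again). The same call with a window of 2 fails (skip it), which is why a window of a second or two stops the repeat copies.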
Finally, always remember that tradition dictates that before you help yourself to someone else's network time service you should send them a quick email requesting permission. It's the polite thing to do and won't take much of your time!
Offline Backups
Once you have this system in place, making more permanent archives of the data from the comfort of your FreeBSD filesystems will be relatively easy. I've chosen to fulfill my "customer self-restore" goal by using mkisofs and cdrecord to write customers' data to CDs and DVDs. I use gzip to compress each individual file, so the customer is still working with a familiar filesystem, and many Windows-based zip packages happily speak gzip.
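The per-file compression step might look something like this. It is a sketch, not the author's script. gzip_tree is a hypothetical name, and it mirrors a customer's tree into a staging directory, compressing every file individually so the layout on the disc matches the original. It assumes filenames contain no newlines.

```shell
#!/bin/sh
# gzip_tree SRC DST
# Copy every regular file under SRC to the same relative path under DST,
# compressed individually as FILE.gz. Assumes no newlines in filenames.
gzip_tree() {
    src=$1
    dst=$2
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        mkdir -p "$dst/$(dirname "$f")" &&
            gzip -9 -c "$src/$f" > "$dst/$f.gz"
    done
}
```

The staging directory can then be turned into an image with something like mkisofs -R -J -o douglasb.iso /tmp/douglasb-cd (Rock Ridge and Joliet extensions, so both Unix and Windows see long names) and written with cdrecord.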
Commodity media might be an unattainable luxury in a larger implementation, so a more conventional backup to tape might be more appropriate. Amanda and Bacula are your friends here. Both support a wide array of tape drives and auto-changers.
If you have disk space to burn, rsnapshot might be of interest. Rsnapshot uses hard links to give the impression of multiple full backups, all neat snapshots at regular intervals in time. You'll need enough disk space to hold one full backup, plus changes, but the potential for offering self-restore capabilities to your customers, possibly over Samba shares, is an attractive prospect.
Traps for Young Players
The standard traps for young players apply to this backup system as much as to any other, and two in particular are important. Before you put this system into production, you should satisfy yourself that you have good answers to these two questions:
1. Does my RAID-5 setup work? In other words, can I replace a failed disk and have the array rebuild itself successfully?
Experiment with this one. Consider making a trial run, perhaps with a smaller array: set the partitions to 100 MB instead of 200+ GB to save yourself some time. Build your RAID-5 array, init and newfs it, mount it, and fill it up with data. Once that is done, forcibly fail one of its disks, perhaps with the atacontrol detach command. (Be careful to take down only one disk; RAID-5 won't help you if you lose more than one.) Or, if you're feeling a little crazy, power down one of your drives.
The vinum list command should report that your volume is up and the plex is in a degraded state. You will be able to continue to read from and write to the array, albeit at a slower rate than usual. Replacing a failed disk in a vinum RAID-5 array requires that you prepare another disk with a partition the same size as the original, give it the same name, and bring it back into the array using the vinum start <diskname> command. Vinum will recalculate the data that should be on the disk from parity and bring it back into the array. While FreeBSD does support hot swapping of ATA disks using the atacontrol command, vinum is happier with disks that have been present since boot time.
2. Can I recover my offline and online backups?
This may sound like a silly question, but many people forget. The most elaborate and carefully crafted backup system in the world is useless if you can't recover the data. So test this, too. Back up a client machine, then attempt to restore the backups. Recovering the online backup should be as simple as rsyncing the data back in the opposite direction. Offline backups are often trickier. Did you really keep the data you need? Is the media you chose 5 years ago still readable by current equipment? Has the media degraded to the point where it can no longer be read?
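For both questions, "does it work?" ultimately means "do the files read back unchanged?" One simple check is to record a checksum of every file before you fail a disk, then compare after vinum has rebuilt the plex. The same comparison works after a trial restore of a client's files. checksum_tree and verify_tree are hypothetical helper names built on the POSIX cksum utility.

```shell
#!/bin/sh
# checksum_tree DIR: print "CRC SIZE ./path" for every file under DIR,
# sorted so that two runs can be compared line by line.
checksum_tree() {
    (cd "$1" && find . -type f -exec cksum {} +) | LC_ALL=C sort
}

# verify_tree SAVED DIR: succeed if DIR still matches a saved listing;
# otherwise diff prints the lines that changed.
verify_tree() {
    checksum_tree "$2" | diff "$1" -
}
```

For instance, run checksum_tree /mnt/raid > /root/before.txt once the trial array is full, keeping the listing off the array itself, and then run verify_tree /root/before.txt /mnt/raid after the rebuild. Here /mnt/raid is an assumed mount point. Any line that diff reports points at a file that did not survive.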
You need to convince yourself that you can comfortably manage your backup system, particularly in the arguably inevitable event of failure. Play devil's advocate and think worst case -- imagine the horror scenarios and have a tested and working plan for getting yourself out of them unscathed. Power failures, hard drive failures, theft, and fires -- plan for them all. There are not many things in life more difficult than explaining to your boss that the backup system you built didn't work because of some minor technical oversight five years ago.
Resources
FreeBSD
Techniques described in this article were implemented on FreeBSD version 4.10-RELEASE, available from the main site and local mirrors everywhere:
http://www.freebsd.org/
rsync
rsync is available in source form from:
http://rsync.samba.org/
It is included in the FreeBSD ports collection (cd to /usr/ports/net/rsync, then make install clean) and in the packages directory on the 4.10-RELEASE CD. I use the rsync.exe binary from the Cygwin distribution for Win32 systems.
Cygwin
The Cygwin distribution can be found at:
http://www.cygwin.com/
Download the Cygwin setup.exe from that site, run it, then follow the bouncing ball. The setup program walks you through choosing a mirror to install from, and choosing which components you need. rsync and OpenSSH aren't defaults; you need to select them, but most of the other tools I have used are part of the base Cygwin install. If you require commercial support, Red Hat will happily sell it to you at:
http://www.redhat.com/software/cygwin/
OpenSSH
OpenSSH is included in the standard FreeBSD distribution and works perfectly well out of the box even without the few tweaks I've mentioned here. Confirm that your /etc/rc.conf file contains the line sshd_enable="YES" to ensure that sshd is started at boot time. OpenSSH source and documentation are available from the OpenBSD folks at:
http://www.openssh.com/
Amanda, Bacula, cdrtools, and rsnapshot
All are excellent tools that you might use to make permanent offline backups of your data. Amanda, Bacula, and cdrtools (the last via HTTP or FTP) are available from the following sites:
http://www.amanda.org/
http://www.bacula.org/
http://ftp.berlios.de/pub/cdrecord/
ftp://ftp.berlios.de/pub/cdrecord/
Many are in the FreeBSD ports collection, so save yourself some time and check there first.
Samba
Samba allows Unix and Unix-like systems to offer SMB/CIFS file services to Windows (and other Unix) clients. Samba is available from:
http://www.samba.org/
Note also that FreeBSD has support for SMBFS, though you'll need to configure support for it into your kernel.