Archiving Avamar Backups to AWS Glacier – Part 1

Last night I set myself an engineering challenge. Is it possible to archive Avamar backups to AWS Glacier?

Now, before you jump off your chair in excitement, first some disclaimers. What I am about to demonstrate is not supported by me or EMC. Please don’t call EMC if you end up:

  • losing your backups
  • receiving a large bill from AWS

This is being shared in the spirit of experimentation. First, let me describe what is required.

We need a host to act as our Avamar to Glacier gateway. This is analogous to a Cloud Gateway. For this I am going to spin up a Linux virtual machine running CentOS.

Next we need a way to extract backups from Avamar into a flat file. It turns out the Avamar Extended Retention feature introduced a method to export and import Avamar backups to/from PAX streams. In this case we are going to turn the backup stream into a flat file and use that as the basis for archiving to Glacier.

We also need a way of shipping these flat file backup archives to Glacier. There are a few options available. I am going to use mtglacier. This tool allows a local file system to be kept in sync with one or more Glacier vaults. In our case we do not want to keep the files around, so once they have been uploaded they will be deleted.

The architecture looks like this:

AvamarGlacierArch

We have our Avamar server (in this case Avamar Virtual Edition running 7.0SP1) with an Avamar Data Store (file system). We have our Cloud Gateway Server with Avamar client and mtglacier installed and we have Glacier supporting the vault.

I am not going to discuss the process of installing Avamar client or mtglacier. These are already well documented.

Exporting and Uploading Avamar Archived Backups to Glacier

To archive an Avamar backup into Glacier we need to go through 3 steps:

  1. Identify the Avamar backup that needs to be exported
  2. Export the Avamar backup to a flat file on the Cloud gateway
  3. Upload the flat file into a Glacier vault

The workflow looks like this:

Upload

For this experiment I created a small backup that is 176588 KB in size from a Windows client with the name winco.mlab.local.  This client is registered under the Avamar domain /mlab. To export this backup I need to determine its unique identifier (sequence number).

Below is the list of backups available for this client. Run this command from the Cloud gateway using the Avamar administrator (MCUser) account. If you want to lock down access, substitute this user with an Avamar domain account.

# avtar --id=MCUser --password=<password> --path=/mlab/winco.mlab.local --quiet --backups
 Date Time Seq Label Size Plugin Working directory Targets
 ---------- -------- ----- ----------------- ---------- -------- --------------------- -------------------
 2014-03-27 23:06:11 62 MOD-1395921964905 176588K Windows C:\Program Files\avs\var C:/Users/administrator/Downloads/jdk-7u51-windows-x64.exe,C:/Use
 2014-03-27 22:38:47 61 MOD-1395920320594 972293K Windows C:\Program Files\avs\var C:/Users/administrator/Downloads/AvamarVMwareCombined-linux-x86-
 2014-03-27 20:48:37 59 MOD-1395900367200 2473651K Windows C:\Program Files\avs\var C:\Users
 2014-03-27 17:57:48 40 Default Schedule-Windows-1395802800168 47635164K Windows C:\Program Files\avs\var
 2014-03-26 14:06:54 33 Default Schedule-Windows-1395802800168 47635164K Windows C:\Program Files\avs\var
 2014-03-25 14:12:30 32 Default Schedule-Windows-1395716400363 47575481K Windows C:\Program Files\avs\var
 2014-03-24 14:04:05 31 Default Schedule-Windows-1395630000186 47561819K Windows C:\Program Files\avs\var
 2014-03-23 14:04:04 30 Default Schedule-Windows-1395543600164 47560966K Windows C:\Program Files\avs\var
 2014-03-22 14:04:01 29 Default Schedule-Windows-1395457200109 47560054K Windows C:\Program Files\avs\var
 2014-03-21 14:04:21 28 Default Schedule-Windows-1395370800153 47560220K Windows C:\Program Files\avs\var
 2014-03-20 14:04:04 27 Default Schedule-Windows-1395284400105 47559427K Windows C:\Program Files\avs\var
 2014-03-19 14:04:32 26 Default Schedule-Windows-1395198000192 47558551K Windows C:\Program Files\avs\var
 2014-03-18 14:04:28 25 Default Schedule-Windows-1395111600179 47546065K Windows C:\Program Files\avs\var
 2014-03-17 14:04:08 24 Default Schedule-Windows-1395025200242 47546535K Windows C:\Program Files\avs\var
 2014-03-16 14:04:09 23 Default Schedule-Windows-1394938800339 47527684K Windows C:\Program Files\avs\var
 2014-03-15 14:04:45 22 Default Schedule-Windows-1394852400289 47598941K Windows C:\Program Files\avs\var
 2014-03-14 14:05:04 21 Default Schedule-Windows-1394766000197 47533802K Windows C:\Program Files\avs\var
 2014-02-23 19:39:28 1 MOD-1393144755033 557K Windows C:\Program Files\avs\var C:\WinDump.exe

The backup we are interested in is the first entry above, with sequence #62 and backup label MOD-1395921964905.
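If we know the backup label, the sequence number can be picked out of the listing with a small helper. This is only a sketch: seq_for_label is a name I made up, and it assumes the label contains no spaces (true for the MOD-... labels, not for the "Default Schedule-..." ones), so that the sequence number is the third column and the label the fourth.

```shell
# Print the sequence number of the backup whose label matches $1.
# Reads an `avtar --backups` listing on stdin.
# Assumption: the label has no spaces, so the columns are
# date, time, seq, label, ...
seq_for_label() {
    awk -v label="$1" '$4 == label { print $3; exit }'
}

# Usage (pipe the listing command shown above into the helper):
#   avtar ... --quiet --backups | seq_for_label MOD-1395921964905
```

For labels that contain spaces you would need to match on the date and time columns instead.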

To archive we need to export a copy of the backup to the Cloud gateway. Before we do that create an archive directory tree structure on the Cloud gateway that mirrors the Avamar domain structure. This will provide a mapping between archived backups in Glacier and the Avamar domain and client it originated from. If we were archiving backups from multiple Avamar servers then we may choose to prefix the structure with the Avamar server name. This would avoid conflicts.

For each backup archive create the following directory structure:

/archives/<Avamar domain path>/<Avamar client>/<Backup Sequence #>

In this example we create it as follows:

# mkdir -p /archives/mlab/winco.mlab.local/62
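The naming convention is easy to wrap in a function so that every export lands in a predictable place. The sketch below is my own illustration: archive_dir and ARCHIVE_ROOT are hypothetical names, and the optional fourth argument implements the server name prefix mentioned above.

```shell
# Root of the archive tree on the Cloud gateway.
ARCHIVE_ROOT="${ARCHIVE_ROOT:-/archives}"

# Build the archive directory for one backup.
# Usage: archive_dir <domain path> <client> <sequence #> [<Avamar server>]
archive_dir() {
    domain="${1#/}"     # strip a leading slash: /mlab -> mlab
    client="$2"
    seq="$3"
    server="${4:-}"     # optional prefix for multi-server setups
    if [ -n "$server" ]; then
        printf '%s/%s/%s/%s/%s\n' "$ARCHIVE_ROOT" "$server" "$domain" "$client" "$seq"
    else
        printf '%s/%s/%s/%s\n' "$ARCHIVE_ROOT" "$domain" "$client" "$seq"
    fi
}

# Usage:
#   mkdir -p "$(archive_dir /mlab winco.mlab.local 62)"
```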

Now we can begin the export process. To do this we instruct Avamar’s avtar command to extract a copy of the backup using the PAX archive format. PAX is short for Portable Archive Interchange and has similarities to tar and cpio. The export is written to a file named data.avpax under the directory structure we previously created.

# avtar --id=MCUser --password=<password> --path=/mlab/winco.mlab.local -x --dto-exportstream --streamformat=avpax --labelnumber=62 --stream=/archives/mlab/winco.mlab.local/62/data.avpax

From here we can see the exported backup has been created and is now represented by a file on the Cloud gateway.

# ls -la /archives/mlab/winco.mlab.local/62/data.avpax
-rw-r--r-- 1 root root 181325312 Mar 27 23:09 /archives/mlab/winco.mlab.local/62/data.avpax

If we look at the exported file it contains some header and XML content followed by the backup data itself. The XML content is used to describe the backup if we ever wanted to bring it back into Avamar.

# head -20 /archives/mlab/winco.mlab.local/62/data.avpax
$global$paxrecs0000644000000000000000000000013412315012405010743 gustar0029 AVAMAR.sort_directories=0
27 AVAMAR.globalflags=3841
9 size=0
27 AVAMAR.enable_extents=0
backupexport_metainfo$paxrecs0000700000000000000000000000017312315012405014045 xustar0043 AVAMAR.objectname=backupexport_metainfo
41 AVAMAR.metadata=backupexport_metainfo
14 size=73033
25 AVAMAR.headflags=2241
backupexport_metainfo0000700000000000000000000021651112315012405012310 0ustar00<exportstream_metainfo>
 <archive_info>
 <flag order="1" type="textbox" value="7.0.101-56" desc="version of client" name="appversion" id="appversion" />
 <flag order="2" type="checkbox" value="true" desc="does the celerra/vnx support i18n" name="celerrai18n" id="celerrai18n" />
 <flag order="3" type="textbox" value="avtar --sysdir=&quot;C:\Program Files\avs\etc&quot; --bindir=&quot;C:\Program Files\avs\bin&quot; --vardir=&quot;C:\Program Files\avs\var&quot; --ctlcallport=49157 --ctlinterface=3001-MOD-1395921964905 --logfile=&quot;C:\Program Files\avs\var\clientlogs\MOD-1395921964905-3001-Windows.log&quot; --encrypt=tls --encrypt-strength=high --expires=1401109565 --retention-type=none --server=ave7-01.mlab.local --hfsport=27000 --id=backuponly --password=**************** --account=/mlab/winco.mlab.local --backup_mounted_vhds=true --backupsystem=false --checkcache=false --ddr=false --ddr-index=0 --debug=false --detect-acl-changes=false --filecachemax=-8 --force=false --freezecachesize=-50 --freezemethod=best --freezetimeout=300 --freezewait=4 --hashcachemax=-16 --informationals=2 --one-file-system=false --protect-profile=disabled --repaircache=false --run-after-freeze-exit=true --run-at-end-exit=true --run-at-start-exit=true --statistics=false --verbose=0 --windows-optimized-backup=false" desc="command line" name="command_line" id="command_line" />
 <flag order="4" type="textbox" value="0000h:00m:04s" desc="length of backup" name="elaptime" id="elaptime" />
 <flag order="5" type="integer" desc="number of errors in session" name="errors" id="errors" />
 <flag order="6" type="stringlist" value="C:/Users/administrator/Downloads/jdk-7u51-windows-x64.exe,C:/Users/administrator/Downloads/VMware-ClientIntegrationPlugin-5.5.0.exe" desc="list of top-level dirs/files in backup" name="files" id="files" />
 <flag order="7" type="textbox" desc="Specify a file that contains a list of options." name="flagfile" id="flagfile" />
 <flag order="8" type="checkbox" desc="Print this message." name="help" id="help" />
 <flag order="9" type="checkbox" desc="Print help including extended flags." name="helpx" id="helpx" />
 <flag order="10" type="checkbox" desc="Print this message in XML." name="helpxml" id="helpxml" />

Before we archive this backup we also want to extract file lists and backup job metadata in order to service additional use cases such as a search archive service.

For example, a full text search engine could be introduced to index these files to support the process of identifying long term archives for retrieval. There are many free search engines available. One to consider is Elasticsearch. I may get to this in a subsequent blog post.

To extract the backup job information use the following command:

# avtar --id=MCUser --password=<password> --path=/mlab/winco.mlab.local -x --labelnumber=62 --quiet --internal --target=/archives/mlab/winco.mlab.local/62 .system_info

This information is internal to Avamar, hence the --internal flag.

As this is a Windows file system backup we also want to extract the file list into an ASCII file.

# avtar --id=MCUser --password=<password> --path=/mlab/winco.mlab.local -t -v --labelnumber=62 --quiet > /archives/mlab/winco.mlab.local/62/file.lst

Now we have our metadata captured as follows:

# ls -la /archives/mlab/winco.mlab.local/62
total 177212
drwxr-xr-x 2 root root 4096 Mar 27 23:15 .
drwxr-xr-x 3 root root 4096 Mar 27 23:07 ..
-rw-r--r-- 1 root root 1818 Mar 27 22:38 archive_info
-rw-r--r-- 1 root root 4760 Mar 27 22:38 archive_info.xml
-rw-r--r-- 1 root root 181325312 Mar 27 23:09 data.avpax
-rw-r--r-- 1 root root 302 Mar 27 22:38 encodings.xml
-rw-r--r-- 1 root root 99 Mar 27 22:38 errors
-rw-r--r-- 1 root root 638 Mar 27 23:15 file.lst
-rw-r--r-- 1 root root 98 Mar 27 22:38 filestats
-rw-r--r-- 1 root root 109 Mar 27 22:38 groups
-rw-r--r-- 1 root root 274 Mar 27 22:38 locale.xml
-rw-r--r-- 1 root root 1492 Mar 27 22:38 machine.xml
-rw-r--r-- 1 root root 262144 Mar 27 22:38 mbr-d0.bin
-rw-r--r-- 1 root root 0 Mar 27 22:38 mounts
-rw-r--r-- 1 root root 1061 Mar 27 22:38 partitiontables.xml
-rw-r--r-- 1 root root 17520 Mar 27 22:38 sessionlog
-rw-r--r-- 1 root root 2787 Mar 27 22:38 statsfile
-rw-r--r-- 1 root root 461 Mar 27 22:38 userinfo.xml
-rw-r--r-- 1 root root 88 Mar 27 22:38 users
-rw-r--r-- 1 root root 8192 Mar 27 22:38 vbr-d0-p0.bin
-rw-r--r-- 1 root root 8192 Mar 27 22:38 vbr-d0-p1.bin
-rw-r--r-- 1 root root 8192 Mar 27 22:38 vbr-d0-p2.bin
-rw-r--r-- 1 root root 8192 Mar 27 22:38 vbr-d0-p3.bin
-rw-r--r-- 1 root root 1751 Mar 27 22:38 volumes.xml
-rw-r--r-- 1 root root 5420 Mar 27 22:38 workorder

If we like we can also generate and store a hash of the exported backup so that we can confirm its integrity if we recall it from Glacier. In this case we will use md5sum.

# md5sum /archives/mlab/winco.mlab.local/62/data.avpax | tee /archives/mlab/winco.mlab.local/62/data.avpax.md5sum
2b861454d43b317506cbe1ca3ad0751d /archives/mlab/winco.mlab.local/62/data.avpax

Creating the vault in Glacier is a one-time step. We can have multiple vaults (I believe up to 1000 per account). In this experiment we will create one vault called avarchive.

# mtglacier create-vault avarchive --config glacier.cfg
MT-AWS-Glacier, Copyright 2012-2014 Victor Efimov http://mt-aws.com/ Version 1.114

PID 35082 Started worker
PID 35082 Created vault avarchive
OK DONE

Now we are ready to upload the archive to Glacier. Let’s instruct mtglacier to perform a dry run sync process to confirm what it will upload to Glacier. In this case we want to filter the criteria to files called data.avpax.

# mtglacier sync --config glacier.cfg --vault avarchive --dir /archives --filter '+data.avpax -' --dry-run --journal /archives/avarchive.log
MT-AWS-Glacier, Copyright 2012-2014 Victor Efimov http://mt-aws.com/ Version 1.114

Will UPLOAD /archives/mlab/winco.mlab.local/62/data.avpax
OK DONE

As expected mtglacier has identified it needs to upload data.avpax for the archive backup with sequence #62. Now for the real thing.

# mtglacier sync --config glacier.cfg --vault avarchive --dir /archives --filter '+data.avpax -' --journal /archives/avarchive.log
MT-AWS-Glacier, Copyright 2012-2014 Victor Efimov http://mt-aws.com/ Version 1.114

PID 37255 Started worker
PID 37256 Started worker
PID 37257 Started worker
PID 37258 Started worker
PID 37255 Created an upload_id JdlfolL1PaPpsAcj1a_7w1UUO92PGwsFmqo51jfRPLUexyRwMvHpqpUfBi-QV6Jzb-VoUytGRF4YKx2Z5atj7FRhsVrF
PID 37256 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [0]
PID 37258 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [33554432]
PID 37257 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [16777216]
PID 37255 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [50331648]
PID 37256 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [67108864]
PID 37258 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [83886080]
PID 37255 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [117440512]
PID 37257 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [100663296]
PID 37256 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [134217728]
PID 37255 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [167772160]
PID 37258 Uploaded part for mlab/winco.mlab.local/62/data.avpax at offset [150994944]
PID 37257 Finished mlab/winco.mlab.local/62/data.avpax hash [5c6210976a6885cdf2da2af8b8ffd590eeb9b57eaa6d3a0d0f5acfcc983831bf] archive_id [XGZh128TUmUmYTC1M_KTZayQDfwtycldFCLNftu0TWjx3Kzeu_JbMftMtZLguDBT_iI6LpOd7kzH9Mts62TG74uS0NgRXy07bRhpWGgUnHO4VmMl30GriVaZwfvtf0XNuZQSV7Gfrg]
OK DONE

What we see here is mtglacier performed a multipart upload using 4 workers and 16MB chunks. This is necessary to drive parallelism and saturate bandwidth. About 9 minutes later the 181MB file upload completed. Here is what it looked like from the internet gateway.

aws_bandwidth_graph
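The offsets in the upload log follow directly from the file size and the part size. As a quick sanity check, the sketch below (part_offsets is a hypothetical helper, not part of mtglacier) lists where each part starts for a 181325312 byte file split into 16777216 byte (16MB) parts.

```shell
# Print the starting byte offset of each part of a multipart upload.
# Usage: part_offsets <file size in bytes> <part size in bytes>
part_offsets() {
    size="$1"
    part="$2"
    offset=0
    while [ "$offset" -lt "$size" ]; do
        echo "$offset"
        offset=$((offset + part))
    done
}

# Usage:
#   part_offsets 181325312 16777216
```

This yields 11 parts, the last one starting at offset 167772160, which matches the offsets reported by the workers above.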

Before we delete the archived backup from Avamar we should take a backup of the metadata we created on the Cloud gateway. This is necessary to ensure we can always maintain the relationship between Avamar archived backups and Glacier archives.

# avtar --id=MCUser --password=<password> --path=/mlab/fileserver --label=avarchive --exclude=data.avpax -c /archives

We can safely delete the exported backups and the backups they represent in Avamar.

First delete the backup in Avamar.

# avtar --id=MCUser --password=<password> --path=/mlab/winco.mlab.local --label=MOD-1395921964905 --labelnumber=62 --delete --force

Then delete the exported backup in the archive folder. This can be done automatically by searching the mtglacier journal log and deleting files that have a CREATED record (CREATED means uploaded to Glacier). In this case we want to limit the search to files called data.avpax. The journal stores paths relative to the archive root, so run this from /archives.

# grep CREATED avarchive.log | awk '{ print $8 }' | grep data.avpax | xargs -i rm -f {}
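Because this step deletes data, it is worth reviewing the list before removing anything. Here is a slightly more defensive sketch of the same idea. uploaded_exports is a name I made up, and it assumes, as the awk command above implies, that the record type is the third whitespace-separated field, the relative path is the eighth, and paths contain no whitespace.

```shell
# Print the relative paths of exported backups that the journal records
# as uploaded (CREATED). Only files named data.avpax are listed.
# Usage: uploaded_exports <journal file>
uploaded_exports() {
    awk '$3 == "CREATED" && $8 ~ /(^|\/)data\.avpax$/ { print $8 }' "$1"
}

# Review the list first:
#   uploaded_exports /archives/avarchive.log
# Then delete relative to the archive root (xargs -r is a GNU extension):
#   uploaded_exports /archives/avarchive.log | (cd /archives && xargs -r rm -f --)
```

Matching on the record type field rather than grepping the whole line avoids false hits if the string CREATED ever shows up inside a path.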

At this point we have successfully demonstrated how to archive an Avamar backup to Glacier.

Retrieving and Importing Avamar Archived Backups from Glacier

To import an archived backup into Avamar we need to go through 4 steps:

  1. Identify the Glacier archive that needs to be retrieved
  2. Request Glacier to retrieve the archive
  3. Download the archive when it is ready to be retrieved
  4. Import the archived backup into Avamar

The workflow looks like this:

Retrieve

[UPDATE: 20140801 – mtglacier now supports restoring individual files using the new --include and --exclude options. Using grep to extract the file from the journal and creating a new journal log is no longer required]

At the time of writing, mtglacier did not support restoring individual files. Rather, it restored any files referenced in the journal log that did not exist in the local archive file system.

To work around this limitation we extract the record entry we want to restore from the journal log and create a new one. We then use the new log to initiate the restore. In this case we want to retrieve the backup that was recently archived for winco.mlab.local with sequence #62.

# grep mlab/winco.mlab.local/62/data.avpax avarchive.log | tee retrieve.log
B 1395923307 CREATED XGZh128TUmUmYTC1M_KTZayQDfwtycldFCLNftu0TWjx3Kzeu_JbMftMtZLguDBT_iI6LpOd7kzH9Mts62TG74uS0NgRXy07bRhpWGgUnHO4VmMl30GriVaZwfvtf0XNuZQSV7Gfrg 181325312 1395922186 5c6210976a6885cdf2da2af8b8ffd590eeb9b57eaa6d3a0d0f5acfcc983831bf mlab/winco.mlab.local/62/data.avpax

Now initiate the restore request from Glacier using the retrieve.log journal we just created.

# mtglacier restore --config glacier.cfg --vault avarchive --dir /archives --max-number-of-files=1 --journal retrieve.log
MT-AWS-Glacier, Copyright 2012-2014 Victor Efimov http://mt-aws.com/ Version 1.114

PID 37884 Started worker
PID 37885 Started worker
PID 37886 Started worker
PID 37887 Started worker
PID 37884 Retrieved Archive XGZh128TUmUmYTC1M_KTZayQDfwtycldFCLNftu0TWjx3Kzeu_JbMftMtZLguDBT_iI6LpOd7kzH9Mts62TG74uS0NgRXy07bRhpWGgUnHO4VmMl30GriVaZwfvtf0XNuZQSV7Gfrg
OK DONE

After the retrieve request is issued Glacier takes several hours before the archive becomes available for download. If you try and restore an archive that is not available this is what mtglacier returns:

# mtglacier restore-completed --config glacier.cfg --vault avarchive --dir /archives --journal retrieve.log
MT-AWS-Glacier, Copyright 2012-2014 Victor Efimov http://mt-aws.com/ Version 1.114

PID 37983 Started worker
PID 37984 Started worker
PID 37985 Started worker
PID 37986 Started worker
PID 37984 Retrieved Job List
OK DONE

I waited a few hours and it still wasn’t available. I tried the next morning and the restore completed.

# mtglacier restore-completed --config glacier.cfg --vault avarchive --dir /archives --journal retrieve.log
MT-AWS-Glacier, Copyright 2012-2014 Victor Efimov http://mt-aws.com/ Version 1.114

PID 48289 Started worker
PID 48290 Started worker
PID 48291 Started worker
PID 48292 Started worker
PID 48289 Retrieved Job List
PID 48292 Downloaded archive /archives/mlab/winco.mlab.local/62/data.avpax
OK DONE
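Rather than re-running restore-completed by hand, the check could be repeated on a timer until the file shows up. The helper below is a generic sketch, not an mtglacier feature: wait_for_file is a hypothetical name, and it assumes the file only appears at its final path once the download has finished, so the checksum should still be verified afterwards.

```shell
# Run a command repeatedly until a file appears.
# Usage: wait_for_file <path> <sleep seconds> <max attempts> <command...>
# Returns 0 as soon as <path> exists, 1 if it never appears.
wait_for_file() {
    path="$1"
    interval="$2"
    attempts="$3"
    shift 3
    i=0
    while [ "$i" -lt "$attempts" ]; do
        "$@" || true                 # ignore failures while the job is pending
        [ -e "$path" ] && return 0
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}

# Example: check every 30 minutes for up to 24 hours.
#   wait_for_file /archives/mlab/winco.mlab.local/62/data.avpax 1800 48 \
#       mtglacier restore-completed --config glacier.cfg --vault avarchive \
#       --dir /archives --journal retrieve.log
```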

Here is what the internet gateway reported.

aws_bandwidth_restore

This graph looks significantly higher and narrower than the previous one. Can anyone guess why?

My broadband connection is asymmetric. That is, the download line rate is significantly higher than the upload. This is very common for home broadband connections. However, for this use case, it is not ideal. We would generate significantly more upload traffic than download. To make this feasible requires a symmetric link.

In this experiment the download was quick and ran at about 3 MB/s. My link is capable of 10 MB/s.

Let’s check the download against the MD5 hash we created.

# md5sum -c /archives/mlab/winco.mlab.local/62/data.avpax.md5sum
/archives/mlab/winco.mlab.local/62/data.avpax: OK

Now that we have downloaded the archive we need to import it back into Avamar. In this case we want to import it back into the original Avamar client. We specify the domain path as defined when it was exported.

The command is as follows:

# avtar --id=MCUser --password=<password> --path=/mlab/winco.mlab.local --dto-exportstream --streamformat=avpax --stream=/archives/mlab/winco.mlab.local/62/data.avpax -c

Now let’s list the backups to make sure it was imported correctly.

# avtar --id=MCUser --password=<password> --path=/mlab/winco.mlab.local --quiet --backups
 Date Time Seq Label Size Plugin Working directory Targets
 ---------- -------- ----- ----------------- ---------- -------- --------------------- -------------------
 2014-03-28 00:31:55 63 MOD-1395921964905 176588K Windows C:\Program Files\avs\var C:/Users/administrator/Downloads/jdk-7u51-windows-x64.exe,C:/Use
 2014-03-27 22:38:47 61 MOD-1395920320594 972293K Windows C:\Program Files\avs\var C:/Users/administrator/Downloads/AvamarVMwareCombined-linux-x86-
 2014-03-27 20:48:37 59 MOD-1395900367200 2473651K Windows C:\Program Files\avs\var C:\Users
 2014-03-27 17:57:48 40 Default Schedule-Windows-1395802800168 47635164K Windows C:\Program Files\avs\var
 2014-03-26 14:06:54 33 Default Schedule-Windows-1395802800168 47635164K Windows C:\Program Files\avs\var
 2014-03-25 14:12:30 32 Default Schedule-Windows-1395716400363 47575481K Windows C:\Program Files\avs\var
 2014-03-24 14:04:05 31 Default Schedule-Windows-1395630000186 47561819K Windows C:\Program Files\avs\var
 2014-03-23 14:04:04 30 Default Schedule-Windows-1395543600164 47560966K Windows C:\Program Files\avs\var
 2014-03-22 14:04:01 29 Default Schedule-Windows-1395457200109 47560054K Windows C:\Program Files\avs\var
 2014-03-21 14:04:21 28 Default Schedule-Windows-1395370800153 47560220K Windows C:\Program Files\avs\var
 2014-03-20 14:04:04 27 Default Schedule-Windows-1395284400105 47559427K Windows C:\Program Files\avs\var
 2014-03-19 14:04:32 26 Default Schedule-Windows-1395198000192 47558551K Windows C:\Program Files\avs\var
 2014-03-18 14:04:28 25 Default Schedule-Windows-1395111600179 47546065K Windows C:\Program Files\avs\var
 2014-03-17 14:04:08 24 Default Schedule-Windows-1395025200242 47546535K Windows C:\Program Files\avs\var
 2014-03-16 14:04:09 23 Default Schedule-Windows-1394938800339 47527684K Windows C:\Program Files\avs\var
 2014-03-15 14:04:45 22 Default Schedule-Windows-1394852400289 47598941K Windows C:\Program Files\avs\var
 2014-03-14 14:05:04 21 Default Schedule-Windows-1394766000197 47533802K Windows C:\Program Files\avs\var
 2014-02-23 19:39:28 1 MOD-1393144755033 557K Windows C:\Program Files\avs\var C:\WinDump.exe

We can see a new backup #63 was imported. Avamar did not use the original sequence number because it increments the sequence whenever a backup is created. The imported backup shares the same label as the original backup we exported, which is OK.

We can now browse this backup using Avamar Administrator and restore it.

browse

We should point out the imported backup has no expiration date. If we want to set this we would use the --expires argument to avtar during the import.

What about compression and encryption?

If you would prefer to compress and subsequently encrypt the backups before they are sent to Glacier, we can replace --stream with --to-stdout for exports and with --from-stdin for imports.

The process would look something like this for exports:

# avtar ... --to-stdout | compression_command_pipe | encryption_command_pipe | dd of=archived_backup_file

And this for imports:

# dd if=archived_backup_file | decryption_command_pipe | decompression_command_pipe | avtar ... --from-stdin

You could use gzip or bzip2 for compression and ccrypt for encryption. Make sure to compress before encrypting. For the encryption key we could use a combination of the Avamar backup label and sequence number.
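To make the ordering concrete, here is a rough sketch of the two pipeline stages using gzip and ccrypt. The function names and the key file path are my own placeholders. It relies on ccrypt acting as a filter when no file is given and reading the key from the first line of the file passed with -k, and I have not tested it against a real avtar stream.

```shell
# Key file location is hypothetical; ccrypt -k reads the key from its first line.
KEYFILE="${KEYFILE:-/root/.avarchive.key}"
ENCRYPT_CMD="${ENCRYPT_CMD:-ccrypt -e -k $KEYFILE}"
DECRYPT_CMD="${DECRYPT_CMD:-ccrypt -d -k $KEYFILE}"

# stdin -> compress, then encrypt -> stdout
pack() {
    # $ENCRYPT_CMD is intentionally unquoted so the command and its
    # options split into separate words (the key path must not contain spaces).
    gzip -c | $ENCRYPT_CMD
}

# stdin -> decrypt, then decompress -> stdout
unpack() {
    $DECRYPT_CMD | gzip -dc
}

# Export:  avtar ... --to-stdout | pack > archived_backup_file
# Import:  unpack < archived_backup_file | avtar ... --from-stdin
```

Compression comes first because encrypted output looks random and will not compress. The import side simply reverses the order.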

What about alternative archive targets?

Although Glacier was used as the target for this experiment, the options are endless. The same approach can be used to archive backups to many popular cloud and object stores including S3, Swift, Atmos, EVault, Azure, Google and Ceph, either through similar tools like mtglacier or alternatives such as FUSE modules.

Alternatively, if you want to keep your archives on premises then traditional block or file storage systems could be consumed by the Cloud gateway. Ideally, these would implement erasure coding schemes to keep costs down.

So it can be done… But is it practical?

We have proven it is possible to archive Avamar backups but does that mean it is practical?

Let’s put things into perspective. If we wanted to archive monthly backups for long term retention to Glacier what would we need?

Avamar comes in many flavours; virtual, physical, with and without Data Domain. The sizes range from 500GB (before dedup) to 124TB (Avamar 16 node grid). With Data Domain we can store 570TB in the active tier and have several attached to one Avamar server.

Now, let’s assume we stopped storing backups in Avamar greater than 1 month old and instead use Glacier. To work out the size of our monthly backup we need to understand the ratio of front-end protected storage to backend consumed for a 30 day retention profile.

There are many factors that impact this ratio (data type, change rate, growth rate, etc) however for the purpose of this experiment we will use 1:1.

For a 500GB Avamar instance we would need to archive 500GB a month to Glacier. We have 30 days to complete the archive before the next cycle starts. Realistically we don’t want to consume the entire 30 days; we need to give ourselves some tolerance. Therefore, let’s say we want each monthly cycle to occupy no more than 50% of the time before the next cycle starts, or about 15 days. How much upload bandwidth would we need for 500GB?

We would need a 3.2 Mbps upload link. What about larger volumes of data?

Below is a table of volumes relative to time occupied between cycles. In the 100% case the archiving process is running 24×7.

ubchart

What we can infer from this chart is the upload bandwidth requirements are very high.
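The 3.2 Mbps figure for 500GB can be reproduced with a few lines of awk. This is my reconstruction of the arithmetic rather than the exact method behind the chart: required_mbps is a hypothetical helper, and it assumes binary units (1 GB = 1024 MB) and a 30 day cycle.

```shell
# Upload rate (Mbps) needed to move <GB> within <days> when the link is
# busy for <fraction> of that time.
# Assumption: binary units, 1 GB = 1024 MB.
# Usage: required_mbps <GB> <days> <busy fraction, 0-1>
required_mbps() {
    awk -v gb="$1" -v days="$2" -v busy="$3" 'BEGIN {
        megabits = gb * 1024 * 8            # volume to move, in megabits
        seconds  = days * 86400 * busy      # time the link is actually uploading
        printf "%.1f\n", megabits / seconds
    }'
}

# Usage:
#   required_mbps 500 30 0.5     # prints 3.2
#   required_mbps 2048 30 0.75   # prints 8.6
```

The second call shows why a 2TB monthly archive at 75% busy needs a link of roughly 10 Mbps.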

For example, my broadband can only accommodate 2 Mbps. Even then home broadband plans are not appropriate as most of them have upload GB caps and throttle bandwidth to impractical levels when the cap is reached. My cap is 200 GB for upload and download combined and costs $80 AUD/month.

What we need is a symmetric link which is often reserved for businesses.

For example, a leading telco offers 10 Mbps business plans. That would support a 2TB monthly archive use case to Glacier at 75% busy. However, this type of connectivity is very costly at $7931/month. Compare this to Glacier’s cost of $0.01/GB/month or in this case $20/month (first month) scaling to $240/month (12 months) to store monthly archived backups for 1 year.

In this example, the cost of networking is 33x more than storage. This makes any cloud storage look expensive even at $0.01/GB/month.

The blended cost is $0.34/GB/month after year 1 (excluding AWS get/put request and restore costs).
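The ratio and blended cost above can be checked with the same kind of arithmetic. In this sketch archive_costs is a hypothetical helper, and it assumes 2TB is counted as 2000 GB, a flat link price, and no request or retrieval charges.

```shell
# Cost picture in month N for a fixed-price link plus Glacier storage.
# Usage: archive_costs <link $/month> <GB archived per month> <storage $/GB/month> <months>
archive_costs() {
    awk -v link="$1" -v gb="$2" -v rate="$3" -v months="$4" 'BEGIN {
        stored  = gb * months       # GB held in Glacier in month N
        storage = stored * rate     # storage bill for month N
        printf "storage=%.0f ratio=%.0f blended=%.2f\n", storage, link / storage, (link + storage) / stored
    }'
}

# Usage:
#   archive_costs 7931 2000 0.01 12
#   prints: storage=240 ratio=33 blended=0.34
```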

Summary

Glacier is a very cost effective cold storage service. However, the cost of networking in this country makes it impractical to consume Glacier over the Internet for long term backup archives. To address this issue AWS offers alternative connectivity options that bypass traditional Internet connections and provide direct connectivity.

The product is called AWS Direct Connect and is designed to be more cost effective for large scale requirements. However, in addition to AWS Direct Connect usage costs there are line costs associated with AWS Direct Connect network partners. These prices are not in the public domain (that I could find) which makes it difficult to evaluate.

In part 2 we will explore if it is possible to minimise the networking requirements between Avamar and Glacier.

