EMC NetWorker parallel saveset cloning with nsrclone and GNU parallel

It should be no surprise that driving backup or cloning throughput requires high levels of parallelism. As of this writing, NetWorker 8.0.2 supports cloning of individual savesets, with cloning commencing at the completion of a savegroup. The individual savesets that require cloning are passed to the nsrclone command, which runs through them in sequence.

To optimise for cloning, savesets need to be spread across multiple savegroups in order to create clone session parallelism. Splitting up savesets this way is not always possible or desirable.

I wrote pnsrclone to speed up the cloning process without requiring savegroup redesign.

This script is a multi-process wrapper written around the standard NetWorker nsrclone command. It does everything nsrclone can do, with the added benefit of managing a predefined number of parallel nsrclone sessions to process a queue of savesets.
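
The core idea, a fixed number of workers draining a queue of savesets, can be sketched with GNU parallel's --jobs option. This is a minimal illustration and not the pnsrclone source: the function name run_clone_queue and the CLONE_CMD variable are made up for the example, and CLONE_CMD defaults to echo so that nothing is actually cloned.

```shell
#!/bin/sh
# Sketch only, not the pnsrclone source.
# run_clone_queue and CLONE_CMD are hypothetical names used for illustration.

# Usage: run_clone_queue <max_sessions> <ssid/cloneid>...
# Runs one command per saveset, keeping at most <max_sessions> running at once.
run_clone_queue() {
    max_sessions=$1
    shift
    # One saveset ID per line on stdin; GNU parallel substitutes each for {}.
    # CLONE_CMD defaults to "echo nsrclone" so the sketch is safe to try.
    printf '%s\n' "$@" |
        parallel --jobs "$max_sessions" ${CLONE_CMD:-echo nsrclone} -S {}
}
```

Calling run_clone_queue 3 with a handful of ssid/cloneid pairs prints the nsrclone command line that would be run for each one. The order of the lines can vary because jobs finish independently.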

The script supports the command line switches available to nsrclone with the exception of -P and -W which are not relevant for cloning.

Three environment variables can be set to control the script's behaviour:

PARALLELISM=<n>

Sets the number of parallel nsrclone sessions to spawn and maintain while the saveset queue is processed. The default is 10.

JOBLOG=<filename>

Accepts a filename that is used to record statistics from each parallel nsrclone session. As each nsrclone session completes, GNU parallel records various statistics about it. These statistics can be used to determine the state of completed nsrclone sessions.

The following statistics are logged:

  1. Seq: The sequence number, or order, in which this command was run
  2. Host: N/A
  3. Starttime: Start time as seconds since epoch
  4. Runtime: Command runtime in seconds
  5. Send: N/A
  6. Receive: N/A
  7. Exitval: Exit value of nsrclone command
  8. Signal: Value of signal if received
  9. Command: nsrclone command run with arguments

JOBLOG has no default and must be set.
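
Because the exit value and signal sit in fixed columns, a short awk filter is enough to pull failed sessions out of the log. The sketch below assumes the tab-separated layout, with one header line, that GNU parallel's --joblog option writes. failed_sessions is a hypothetical helper name, and the sample rows use made-up saveset IDs and timings.

```shell
#!/bin/sh
# Sketch only; failed_sessions is a hypothetical helper name.

# Usage: failed_sessions <joblog>
# Prints the command (column 9) of every session whose exit value (column 7)
# or signal (column 8) is non-zero. The first line of the file is a header.
failed_sessions() {
    awk -F '\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $9 }' "$1"
}

# Demonstration against a made-up joblog with one good and one failed session.
sample=$(mktemp)
printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n' >"$sample"
printf '1\t:\t1384384142.000\t120.000\t0\t0\t0\t0\tnsrclone -S 111/222\n' >>"$sample"
printf '2\t:\t1384384263.000\t1.000\t0\t0\t1\t0\tnsrclone -S 333/444\n' >>"$sample"
failed_sessions "$sample"   # prints: nsrclone -S 333/444
rm -f "$sample"
```

An empty result means every logged session exited cleanly, which makes the function easy to use as a check at the end of a cron job.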

DRYRUN=1

When set to 1, instructs pnsrclone to perform a dry run without executing the nsrclone sessions. This can be used to determine the number of savesets that would be cloned. The default is not to perform a dry run.

All output from pnsrclone and the underlying nsrclone sessions is written to standard out and standard error. GNU parallel does its best to keep the output of each nsrclone session together. When run from cron, it is best to redirect standard out and standard error to a scratch file.
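
For cron, one option is a small wrapper that sends both streams to a single scratch file. The helper name run_logged and the scratch path in the commented example are placeholders for illustration; only the redirection itself is the point.

```shell
#!/bin/sh
# Sketch of a cron wrapper; run_logged is a hypothetical helper name.

# Usage: run_logged <scratch_file> <command> [args...]
# Runs the command with standard out and standard error both written to the
# scratch file, and returns the command's exit status.
run_logged() {
    scratch=$1
    shift
    "$@" >"$scratch" 2>&1
}

# Example call from a cron-driven script (commented out so the sketch has no
# side effects; /tmp/pnsrclone.out is an assumed scratch path):
# run_logged /tmp/pnsrclone.out \
#     env PARALLELISM=8 JOBLOG=/tmp/pnsrclone.log \
#     pnsrclone -F -g Exchange -b "Target Clone Pool" -C 1 -t "last week" -S
```

Note that the scratch file is overwritten on each run. Appending with >> instead would keep a history at the cost of an ever-growing file.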

The script allows one instance of pnsrclone to run per unique filename defined by the JOBLOG environment variable. This is to prevent multiple invocations of pnsrclone from cloning the same savesets at the same time.

To run multiple instances at the same time, specify a different JOBLOG for each one. To stop them cloning the same savesets, avoid giving the same group (-g) or volid (-V) to pnsrclone instances whose run times will overlap.
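
How pnsrclone enforces the one-instance-per-JOBLOG rule is not shown here. One common way to get that behaviour in a shell script is an atomic mkdir on a path derived from the JOBLOG name, sketched below. The function names and the .lock suffix are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only; acquire_lock, release_lock and the ".lock" suffix are
# hypothetical and not taken from pnsrclone.

# Usage: acquire_lock <joblog>
# Succeeds only for the first caller: mkdir either creates the directory or
# fails because it already exists, with no window in between.
acquire_lock() {
    mkdir "$1.lock" 2>/dev/null
}

# Usage: release_lock <joblog>
release_lock() {
    rmdir "$1.lock"
}

# Typical use inside a script (commented out so the sketch has no side effects):
# if acquire_lock "$JOBLOG"; then
#     trap 'release_lock "$JOBLOG"' EXIT
#     ... run the clone sessions ...
# else
#     echo "another instance is already using $JOBLOG" >&2
#     exit 1
# fi
```

A lock directory left behind by a killed process has to be removed by hand. That is the usual trade-off for the portability of mkdir over tools such as flock.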

The script requires GNU parallel to be installed which is available from here.

Testing was conducted on a Linux distribution with NetWorker 8.0.2. It should work on other UNIX variants. If you have success feel free to drop a comment here. There is no Windows version yet. If I get enough requests I may try and port it.

Run it from either the NetWorker server or storage node and schedule it via cron or some other method.

When pnsrclone runs you should see in NetWorker Administrator multiple clone sessions as below:

[Screenshot: multiple clone sessions in NetWorker Administrator]

The number of clone sessions will be relative to the PARALLELISM setting.

Example script runs are included below:

  • Clone all backups that were created from the Exchange group up to a week ago to the clone pool named Target Clone Pool. Only clone backups that have less than one copy in the target clone pool. Use up to 8 parallel nsrclone sessions.

env PARALLELISM=8 JOBLOG=/tmp/pnsrclone.log pnsrclone -F -g Exchange -b "Target Clone Pool" -C 1 -t "last week" -S

  • As above, but set the cloned savesets' browse and data retention policies to 6 months from the backup copy.

env PARALLELISM=8 JOBLOG=/tmp/pnsrclone.log pnsrclone -F -g Exchange -b "Target Clone Pool" -y "6 month" -w "6 month" -C 1 -t "last week" -S

Disclaimer: the script is provided as-is and is not supported or maintained by EMC.

Download version 1.2 from here.

Comments

  1. Markus Trimmel says:

    Hi,

    thanks for posting the script. I tried the script under RHEL 6.4, but run into this issue.

    GNU Parallel is installed from

    yum install http://download.opensuse.org/repositories/home:/tange/RedHat_RHEL-6/noarch/parallel-20131022-1.1.noarch.rpm

    But we receive this error

    [root@XXXXXXX ~]# env PARALLELISM=3 JOBLOG=/tmp/pnsrclone.log /usr/local/bin/pnsrclone -B Default -C 2 -b clone -t '60 days ago' -S
    INF: [20131114000902] JOBLOG set to /tmp/pnsrclone.log
    INF: [20131114000902] PARALLELISM set to 3
    INF: [20131114000902] Executing nsrclone -n -B Default -C 1 -b clone -t 60 days ago -S
    INF: [20131114000903] Found 10 savesets to clone
    INF: [20131114000903] Savesets are 4166507462/1381489606 3498308655/1384379439 3515085859/1384379427 3565417498/1384379418 3615749100/1384379372 3481531441/1384379441 3598971886/1384379374 3582194712/1384379416 3548640285/1384379421 3531863072/1384379424
    INF: [20131114000903] Starting up to 3 parallel nsrclone sessions
    /bin/sh: 16: command not found
    /bin/sh: 16: command not found
    /bin/sh: 16: command not found
    /bin/sh: 16: command not found
    /bin/sh: 16: command not found
    /bin/sh: 16: command not found
    /bin/sh: 16: command not found
    /bin/sh: 16: command not found
    /bin/sh: 16: command not found
    /bin/sh: 16: command not found
    INF: [20131114000903] Finished parallel nsrclone sessions
    ERR: [20131114000903] 0 of 10 cloning sessions completed successfully in 0 minute(s)

    May you find some time, regards
    Markus

  2. Nice job. Have you tried the option of using the nsrcloneconfig file under /nsr/debug?

    Refer the below link for more information
    https://community.emc.com/docs/DOC-40563

  3. Excellent, but I need a Windows Version.

  4. Geza Balazs says:

    Hi,

    is this script pnsrclone intended only for Data Domain DDBOOST environments?

    Or is it also applicable to traditional tape-based scenarios? (A customer of ours wants to clone backups from DD VTL to physical tape. The issue is that this is typically done sequentially, with no parallel cloning by NetWorker's own nsrclone, if the source save sets are not multiplexed.)

    Thx.

    Geza Balazs
