Syncing to Archival Storage

From CAC Documentation wiki

This is a user-level guide for syncing a directory to CAC Archival Storage using Globus.

Prerequisites

  • You know how to log into Globus.
  • You are a user of a CAC project with archival storage service enabled. In this document, <CACUser> denotes your CAC user name and <CACProject> denotes your CAC project name.
  • On the Linux host where you want to run sync commands (either once or on a regular schedule), install the Globus CLI client. The syncing script is a bash script, so only Linux is supported.
  • Tip: If the pip3 install globus-cli command works for you, you can skip the Globus CLI client installation documentation altogether.
  • If the source directory is not located on an existing Globus Connect Server endpoint, install Globus Connect Personal for Linux, macOS, or Windows on the host where the source directory is located.
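The CLI prerequisite can be checked from the shell before going further. The following is a minimal sketch, assuming only that the Globus CLI installs an executable named globus on your PATH; the function name check_globus_cli is hypothetical and used here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: confirm the Globus CLI is available before running any sync commands.
# check_globus_cli is a hypothetical helper name, not part of the Globus CLI.

# Returns 0 if a `globus` executable is found on PATH, 1 otherwise.
check_globus_cli() {
    if command -v globus >/dev/null 2>&1; then
        echo "globus CLI found: $(command -v globus)"
        return 0
    fi
    echo "globus CLI not found; try: pip3 install globus-cli" >&2
    return 1
}

# Report the result without aborting the shell if the CLI is missing.
check_globus_cli || true
```

If the check fails, install the CLI as described above and run the check again.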

Log into Globus using CLI

On the Linux host where you want to run sync commands:

  • Log into Globus using Globus CLI:
 $ globus login
 Please authenticate with Globus here:
 ------------------------------------
 https://auth.globus.org/v2/oauth2/authorize?...........
 ------------------------------------
 
 Enter the resulting Authorization Code here: 
Copy and paste the URL https://auth.globus.org/v2/oauth2/authorize?........... into a web browser and log into Globus as instructed there. After logging in, copy the authorization code, paste it into the session where you ran the globus login command, and press Enter. On success, the command prints:
You have successfully logged in to the Globus CLI!

You can check your primary identity with
  globus whoami

For information on which of your identities are in session use
  globus session show

Logout of the Globus CLI with
  globus logout
  • Verify that you are logged into Globus by running the globus whoami command; the output should show your Globus ID:
$ globus whoami
shl1@cornell.edu
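
For unattended runs (for example from cron), the same verification can be scripted. This is a minimal sketch, assuming that globus whoami exits with a non-zero status when no login is available; the function name require_globus_login is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: stop early when the Globus CLI session is not logged in.
# Assumption: `globus whoami` exits non-zero when there is no valid login.
# require_globus_login is a hypothetical helper name.

require_globus_login() {
    local identity
    # Capture the identity printed by `globus whoami`; discard error text.
    if identity=$(globus whoami 2>/dev/null) && [ -n "$identity" ]; then
        echo "Logged in to Globus as: $identity"
        return 0
    fi
    echo "Not logged in to Globus; run 'globus login' first." >&2
    return 1
}
```

A sync wrapper could call require_globus_login || exit 1 before starting any transfer.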

Make a Shared Endpoint on CAC Archive

  • In a web browser, log into Globus. Under File Manager, go to the cac#archive02 endpoint and navigate to /export/<CACProject>. Optionally, create a new directory to receive the data copied from the source directory.

Configure the Source Endpoint

  • If your source directory is located on an existing Globus Connect Server endpoint, you will need to make it a shared endpoint just as you did for the destination directory on CAC Archive.
  • If the source directory is not located on an existing Globus Connect Server endpoint, install Globus Connect Personal for Linux, macOS, or Windows on the host where the source directory is located, then start the Globus Connect Personal endpoint on that host.

Locate Source and Destination Endpoints

  • Back in the Globus CLI client, locate the IDs of the source and destination endpoints using the globus endpoint search --filter-scope my-endpoints command:
$ globus endpoint search --filter-scope my-endpoints
ID                                   | Owner            | Display Name          
------------------------------------ | ---------------- | ----------------------
4c8b5dda-389e-11ea-9710-021304b0cca7 | shl1@cornell.edu | my_source_endpoint
606579ae-5b03-11e9-bf32-0edbf3a4e7ee | shl1@cornell.edu | cac_archive_endpoint   
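
To avoid copying IDs by hand, the table can be filtered by display name. This is a minimal sketch, assuming the output keeps the three-column, pipe-separated layout shown above (two header lines, then one endpoint per line); the function name endpoint_id_by_name is hypothetical, and the sample table below is the one from this page.

```shell
#!/usr/bin/env bash
# Sketch: print the ID of the endpoint whose Display Name matches exactly.
# Assumption: input is the text table printed by
#   globus endpoint search --filter-scope my-endpoints
# endpoint_id_by_name is a hypothetical helper name.

endpoint_id_by_name() {
    awk -F'|' -v name="$1" '
        NR > 2 {                                   # skip header and separator lines
            id = $1; dn = $3
            gsub(/^[ \t]+|[ \t]+$/, "", id)        # trim padding around the ID
            gsub(/^[ \t]+|[ \t]+$/, "", dn)        # trim padding around the name
            if (dn == name) print id
        }'
}

# Demo on the sample output shown above.
endpoint_id_by_name cac_archive_endpoint <<'EOF'
ID                                   | Owner            | Display Name
------------------------------------ | ---------------- | ----------------------
4c8b5dda-389e-11ea-9710-021304b0cca7 | shl1@cornell.edu | my_source_endpoint
606579ae-5b03-11e9-bf32-0edbf3a4e7ee | shl1@cornell.edu | cac_archive_endpoint
EOF
# prints: 606579ae-5b03-11e9-bf32-0edbf3a4e7ee
```

With a live session, the same function could be fed directly: globus endpoint search --filter-scope my-endpoints | endpoint_id_by_name my_source_endpoint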

Install the cli-sync.sh script

  • Download the cli-sync.sh script onto your Linux host.
  • Open the cli-sync.sh file and set the following variables to appropriate values:
      • SOURCE_ENDPOINT: ID of your source endpoint
      • DESTINATION_ENDPOINT: ID of your destination endpoint
      • SOURCE_PATH: should probably be "/"
      • DESTINATION_PATH: should probably be "/"
      • SYNCTYPE: Read the comments in the script and choose carefully. checksum is the safest but slowest option because it makes the destination host (CAC archive) read the copied files back from disk to verify their checksums.
  • You can now run the cli-sync.sh script directly from the shell or as a cron job for scheduled archival.
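
The cli-sync.sh script itself is not reproduced here. As a rough illustration of how the variables above could map onto a Globus CLI call, the sketch below builds a globus transfer command line and prints it without running it. It assumes SYNCTYPE holds a bare sync level (exists, size, mtime, or checksum); the real script may expect a different format, so follow its comments. The endpoint IDs are the example values from the search output on this page, and build_transfer_cmd is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: this is NOT cli-sync.sh. It shows one plausible mapping from the
# variables described above to a `globus transfer` command line.

SOURCE_ENDPOINT="4c8b5dda-389e-11ea-9710-021304b0cca7"       # example ID (my_source_endpoint)
DESTINATION_ENDPOINT="606579ae-5b03-11e9-bf32-0edbf3a4e7ee"  # example ID (cac_archive_endpoint)
SOURCE_PATH="/"
DESTINATION_PATH="/"
SYNCTYPE="checksum"  # assumption: one of exists, size, mtime, checksum

# Print the command line instead of executing it (dry run).
build_transfer_cmd() {
    printf '%s\n' "globus transfer ${SOURCE_ENDPOINT}:${SOURCE_PATH} ${DESTINATION_ENDPOINT}:${DESTINATION_PATH} --recursive --sync-level ${SYNCTYPE}"
}

build_transfer_cmd
```

For scheduled archival, a crontab entry such as 0 2 * * * /home/<CACUser>/cli-sync.sh would run the script nightly at 02:00; both the path and the schedule are hypothetical.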