Syncing to Archival Storage
Revision as of 11:29, 21 September 2015
Purpose of this Howto
This is a user-level guide for syncing a Linux or a Windows machine with Globus, particularly to the CAC Archival Storage.
Assumptions and definitions
- You have a Globus account, which we'll indicate by <globusAccountName>
- You have a local endpoint you want to sync, indicated by <localEndpointName>
- There is a destination endpoint <destinationEndpointName>; for syncing to the CAC archive resource it is cac#archive01/export/archive01/<CACProjectName>/<path>
  - The CAC archive endpoint is active
- Any subsidiary paths will be written as <path>.
Limitations
This requires that both endpoints be active. The sync command can be issued from any machine (not necessarily one with an endpoint on it), but Globus Connect Personal runs under a user identity and dies when that user logs out. This is not an issue on Linux, but it limits the usefulness of syncing as a scheduled task on Windows. It may be possible to run Globus Connect Personal as a Windows service, but that has not yet been tested.
Linux
Setup
To back up a directory from a Linux file server to CAC's archive, you must first start GlobusConnect on the file server. Designate or create an account to run the syncing process, which we will call <sync-user>. Create these scripts so that the <sync-user> account can execute them:
gc_start.sh:
#!/bin/bash
sh /opt/globusconnectpersonal-2.0.3/globusconnect -start -restrict-paths rw/<path to back up> &
gc_status.sh:
#!/bin/bash
sh /opt/globusconnectpersonal-2.0.3/globusconnect -status
gc_stop.sh:
#!/bin/bash
sh /opt/globusconnectpersonal-2.0.3/globusconnect -stop
Execute gc_start.sh:
<pathToScript>/gc_start.sh
You also need SSH keys set up with the Globus system. Store your private key locally (typically in the .ssh subdirectory of your home directory) under a name, which we'll represent as <mykey>, and register the public key with Globus:
- Go to the Globus website, click on your account name at top-right, and select "manage identities"
  - Select "Add linked identity" and pick "Add SSH public key"
  - Paste the public key into the box for it and give the key a name
  - Click "Submit"
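The steps above assume a key pair already exists. If it does not, a minimal sketch like the following could create one with OpenSSH's ssh-keygen. The function name make_sync_key and its arguments are hypothetical, and the empty passphrase is an assumption made so that cron can use the key unattended; adjust it to your own security policy.

```shell
#!/bin/bash
# Hypothetical helper: create a passphrase-less 2048-bit RSA key pair.
#   $1 = directory to write the keys into (e.g. ~/.ssh)
#   $2 = key file name (the <mykey> placeholder used in this guide)
make_sync_key() {
    local dir="$1" name="$2"
    mkdir -p "$dir" && chmod 700 "$dir"
    # -N "" sets an empty passphrase; -q suppresses the progress output.
    ssh-keygen -q -t rsa -b 2048 -N "" -f "$dir/$name"
    # Print the path of the public key, which is the part pasted into Globus.
    echo "$dir/$name.pub"
}
```

For example, `cat "$(make_sync_key "$HOME/.ssh" mykey)"` would create the pair and print the public key for pasting into the web form. Note that ssh-keygen asks before overwriting an existing key file, so pick a name that is not already in use.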
How to perform a backup
Once GlobusConnect is started, issue a command to the CLI via ssh. For example:
ssh -t <globusAccountName>@cli.globusonline.org transfer -s 2 --preserve-mtime --verify-checksum -- <globusAccountName>#<localEndpointName>/<path> cac#archive01/export/archive01/<CACProjectName>/<path> -r
With /home/fs01 as the source <path>, this command backs up the /home/fs01 directory to the CAC archive, preserving the last-modified timestamp, verifying checksums, and copying only new files and files whose timestamps are newer than those already in the archive. Nothing will be deleted.
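As a minimal sketch, the command above could be assembled from the placeholders defined at the top of this guide. The function name build_transfer_cmd and its argument order are hypothetical; the flags are exactly those used in the example. The function only prints the command, so it can be reviewed before anything is run.

```shell
#!/bin/bash
# Hypothetical helper: print the Globus CLI transfer command.
#   $1 = <globusAccountName>
#   $2 = <localEndpointName>
#   $3 = source path on the local endpoint (e.g. /home/fs01)
#   $4 = full destination (e.g. cac#archive01/export/archive01/<CACProjectName>/<path>)
build_transfer_cmd() {
    local account="$1" endpoint="$2" src_path="$3" dest="$4"
    # Nothing is executed here; the command line is only written to stdout.
    printf 'ssh -t %s@cli.globusonline.org transfer -s 2 --preserve-mtime --verify-checksum -- %s#%s%s %s -r\n' \
        "$account" "$account" "$endpoint" "$src_path" "$dest"
}
```

Calling `build_transfer_cmd jdoe myendpoint /home/fs01 cac#archive01/export/archive01/myproject/fs01` (all four values are made up) prints the full ssh command line, which can then be pasted into a terminal or a sync script.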
To monitor the status of your backup, go to the cacsystems GlobusOnline transfer activity page. If you don't have the password, ask other CAC staff for it. Once your backup has completed, an automated summary will be mailed to cac-systems. Next, stop the GlobusConnect client on the file server by running:
<pathToScript>/gc_stop.sh
You can check the status of GlobusConnect by running:
<pathToScript>/gc_status.sh
Scheduled Backups
You can use cron jobs to perform scheduled backups. You need a user in whose context these services will run; we will call this user <sync-user>. Our example services are:
- Daily backup of /home/fs01/ running at 11:00PM
- Weekly backup of /home/shared running on Sunday at 11:30PM
Say you want to run these daily and weekly sync cron jobs in the context of the CTC_ITH\<sync-user> user. In this user's home folder, create a daily-sync.sh and a weekly-sync.sh file. Each file should be scheduled accordingly via crontab:
0 23 * * * /home/<sync-user>/daily-sync.sh
30 23 * * 0 /home/<sync-user>/weekly-sync.sh
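The crontab field order (minute, hour, day of month, month, day of week) is easy to get wrong. As a minimal sketch, a hypothetical helper like cron_line below prints an entry from an hour, a minute, and a weekday; the two calls correspond to the daily 11:00PM and Sunday 11:30PM examples listed earlier.

```shell
#!/bin/bash
# Hypothetical helper: print one crontab entry.
#   $1 = hour (0-23)
#   $2 = minute (0-59)
#   $3 = day of week (0-6, where 0 is Sunday, or '*' for every day)
#   $4 = command to run
cron_line() {
    local hour="$1" minute="$2" dow="$3" cmd="$4"
    # crontab field order: minute hour day-of-month month day-of-week command
    printf '%s %s * * %s %s\n' "$minute" "$hour" "$dow" "$cmd"
}

# Daily at 11:00PM and Sundays at 11:30PM; '*' is quoted to stop shell globbing.
cron_line 23 0 '*' "/home/<sync-user>/daily-sync.sh"
cron_line 23 30 0 "/home/<sync-user>/weekly-sync.sh"
```

The printed lines can be pasted into the editor opened by `crontab -e` while logged in as <sync-user>.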
Because of the limited program control available in these scripts, each file currently does the following:
- attempts to start the GlobusConnect client.
- auto-activates the GlobusConnect endpoint on hd-hni-fs.cac.cornell.edu
- initiates a transfer command
Example content of weekly-sync.sh:
#!/bin/bash
<pathTo>/gc_start.sh
ssh -i .ssh/<mykey> -t <globusAccount>@cli.globusonline.org endpoint-activate <globusAccount>#<my-endpoint>
ssh -i .ssh/<mykey> -t <globusAccount>@cli.globusonline.org transfer -s 2 --verify-checksum -- <globusAccount>#<my-endpoint>/home/shared/ cac#archive01/export/archive01/<CACProjectName>/<path>
Ideally each script would terminate the GlobusConnect client when the transfer completes, but this is not yet implemented and may never be, depending on the time and effort required to make it work.
Windows
Assumption
The client endpoint -- the one containing the resources to be transferred to the CAC endpoint -- is active.
Explanation
We're going to use the command-line interface (CLI) to Globus, which basically means logging into their dedicated server over SSH and sending commands. The CLI is detailed here: https://support.globus.org/forums/22861518-Command-Line-Interface
Setup
- Download the latest version of PuTTY (some older versions won't work), including PuTTYGen and plink (the Windows installer contains them all: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html )
- Launch PuTTYGen
  - Make sure the "SSH2-RSA" radio button is selected and that the key length in the box below it is at least 2048
  - Click the "Generate" button. You'll need to keep moving the cursor over the blank grey area to generate randomness
  - Do not set a passphrase; the scheduled task must be able to use the key unattended
  - Save the keys:
    - The private key should be called something like <privateKeyName>_id_rsa.ppk and stored somewhere safe but accessible to the scheduled task
    - The public key can be saved, or you can just copy it to the clipboard from the box where it is shown in clear text
- With the public key in the clipboard, go to the Globus website, click on your account name at top-right, and select "manage identities"
  - Select "Add linked identity" and pick "Add SSH public key"
  - Paste the public key into the box for it and give the key a name
  - Click "Submit"
- Create a connection:
  - Start PuTTY
  - In the "Session" tab:
    - Put <globusAccountName>@cli.globusonline.org in the Host Name box
    - In the "Saved Sessions" textbox give the session a name; I use "globusSync" and we'll refer to this as <sessionName>
  - On the Connection > SSH > Auth tab, for "Private Key for authentication" click "browse" and select the file in which you wrote your private key
  - Back on the "Session" tab, click "save"; your session name should now appear on the list of saved sessions
  - You can now test that it works: double-click on the saved session name; after accepting the server key, you should find yourself in an ssh session
You'll use plink to actually send the sync command (or any other Globus CLI commands you want to use). Depending on whether plink is on your PATH, you may need to use the full path to the executable (for example, C:\Program Files (x86)\PuTTY\plink.exe) when you set this up as a scheduled task. The basic command, to run on the command line, is this:
"C:\Program Files (x86)\PuTTY\plink.exe" <sessionName> transfer -s 2 --preserve-mtime --verify-checksum -- <globusAccountName>#<localEndpointName>/<path> cac#archive01/export/archive01/<CACProjectName>/<path> -r
You should test that it works by calling up cmd.exe and executing it.
Creating the scheduled task
- Start up the task scheduler
- Select "Create task" from the "Actions" tab
- Give the task a name, select the user identity under which it should run (ensuring that identity has access to the ssh session and key information you set up), and select the "run whether user is logged on or not" radio button (note that this doesn't fix the issue of the endpoint going down if the owning user isn't logged on). If only local resources will be required, you can choose not to store password details
  - On the "Triggers" tab
    - Select "New"
    - Select "run on a schedule" from the drop-down, and select when you want it to run and on what cadence
    - Select the checkbox for "Enabled" (important!)
  - On the "Actions" tab
    - Select "New"
    - Select "Start a program" from the drop-down
    - For "Program name" put the full path to plink.exe, enclosed in double quotes if it contains a space, e.g.:
"C:\Program Files (x86)\PuTTY\plink.exe"
- For "arguments" enter:
<sessionName> transfer -s 2 --preserve-mtime --verify-checksum -- <globusAccountName>#<localEndpointName>/<path> cac#archive01/export/archive01/<CACProjectName>/<path> -r
- Accept the other defaults, and click "OK". You'll have to enter the Windows credentials for the account under which the process will run if you didn't select the option not to store the password.