Difference between revisions of "Archival Storage"
Revision as of 23:02, 4 July 2017
What is CAC Archival Storage?
- CAC Archival Storage is a low-cost, high-performance option for storing research data available only to users within Cornell University.
- CAC Archival Storage cannot be mounted by running jobs; instead, users must transfer their data from CAC Archival Storage to an accessible server using Globus.
- Globus users can easily add, delete, and share their data using any Globus endpoint.
- Some of the Globus endpoints available include:
  - storage01.cac.cornell.edu (cac#home), where all CAC user home directories are found
  - XSEDE sites:
    - Stampede (xsede#stampede)
    - Lonestar (xsede#lonestar4)
    - TACC Archival Storage (xsede#ranch)
First step - Enable (or create) a CAC project for Archival Storage and add users where appropriate
- To use the CAC Archival Storage service, you must be a user of a CAC project where Archival Storage is enabled.
- The project PI can add users and verify that Archival Storage is enabled at the Manage CAC project page.
- Don't have a project? See How to start a CAC project.
Second step - create your Globus account
CAC Archival Storage is accessible only through Globus. If you have never used Globus, first sign up for a free Globus account.
When signing up with Globus, Cornell users should select Cornell University under Use your existing organizational login and then click the Continue button. You will be forwarded to the CUWebLogin page. Log in using your Cornell NetID and password.
Using Globus
Globus can be accessed using the following methods:
- Globus Web GUI
- Globus CLI client on your computer:
  - Install Globus CLI client on your computer
  - Using Globus CLI
  - Globus CLI Reference
- Globus SDK for Python: for workflow automation and integration with your science gateways or third party software.
Note: The legacy ssh-based hosted CLI will be deprecated in the future. Please do not use it for new development. If you currently use the ssh-based CLI in production, you will need to migrate to the new Globus CLI client or the Globus SDK for Python soon.
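As a rough illustration of the Globus CLI route, the sketch below builds (but does not run) a "globus transfer" command line in Python. The endpoint IDs and paths are placeholders rather than values from this page, and the helper name build_transfer_command is hypothetical. It assumes the Globus CLI client is installed and that you have already logged in with it.

```python
import shlex


def build_transfer_command(src_endpoint, src_path, dst_endpoint, dst_path,
                           label=None, recursive=True):
    """Return the argv list for a `globus transfer` call (nothing is executed)."""
    argv = [
        "globus", "transfer",
        f"{src_endpoint}:{src_path}",   # source given as ENDPOINT:PATH
        f"{dst_endpoint}:{dst_path}",   # destination given as ENDPOINT:PATH
    ]
    if recursive:
        argv.append("--recursive")      # needed when transferring a directory
    if label:
        argv += ["--label", label]      # free-text label shown in the task list
    return argv


if __name__ == "__main__":
    # Placeholder endpoint IDs and paths: substitute your own values.
    cmd = build_transfer_command(
        "SRC_ENDPOINT_ID", "/home/me/data/",
        "DST_ENDPOINT_ID", "/export/myproject/data/",
        label="archive upload",
    )
    print(shlex.join(cmd))
```

Returning an argv list keeps quoting out of the picture; the list can be handed to subprocess.run once the placeholders are replaced with real values.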
Globus Documentation
- How Globus works
- Globus Quickstart: a guide to signing up for a free Globus account and starting to transfer files.
- Share or Publish Your Data using Globus
CAC specifics
Technical Information
CAC's endpoint is cac#archive01.
- When you activate the cac#archive01 endpoint in the Globus web GUI, a dialog box will display the following message:
The administrator of this endpoint, cac#archive01, requires that you authenticate using their MyProxy OAuth server to activate the endpoint. When you click 'Continue' you will be redirected to their website.
- You will be redirected to the https://archive01.cac.cornell.edu/oath/authorize... page.
- Enter your CAC credentials.
- When login is successful, you will be redirected back to the Globus web GUI with the endpoint activated.
Administrative Information
- cac#archive01's default path is /export.
- Each project with access to CAC Archival Storage has a shared directory (named after the project) in which all project members have full read/write access.
- Users can rename and move files and directories within their project directory on the endpoint. Globus added this feature recently.
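The layout described above can be sketched as a small path helper. This is only an illustration: it assumes each project directory sits directly under the default path /export, which this page does not state explicitly, and the function name project_archive_path and the project name in the example are hypothetical.

```python
from pathlib import PurePosixPath

# Default path on cac#archive01, as stated on this page.
ARCHIVE_ROOT = PurePosixPath("/export")


def project_archive_path(project, *parts):
    """Build a path inside a project's shared directory on the archive endpoint.

    Assumes the project directory is /export/<project> (an assumption).
    """
    if not project or "/" in project:
        raise ValueError("project must be a single directory name")
    path = ARCHIVE_ROOT.joinpath(project, *parts)
    expected_prefix = ARCHIVE_ROOT.parts + (project,)
    # Reject absolute components or ".." that would leave the project directory.
    if path.parts[:len(expected_prefix)] != expected_prefix or ".." in path.parts:
        raise ValueError("path must stay inside the project directory")
    return str(path)


if __name__ == "__main__":
    # "abc_0001" is a made-up project name used only for illustration.
    print(project_archive_path("abc_0001", "2017", "run1"))
```

Because all project members have full read/write access to the shared directory, a check like this is a convenience for scripts, not an access control.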
Advanced Topic - Automating transfers to the Archival Storage
- Install Globus Connect Personal on the Linux/MacOS/Windows host you wish to archive by clicking on the "Get Globus Connect Personal" link on the Transfer Files screen on Globus.
- On the host you wish to archive, download and untar archive_scripts.tar.gz.
- To enable running Globus Connect Personal as root, add "-allow-root", to the args list in globusconnectpersonal-2.0.3/gc.py (around line 360):
args = [os.path.basename(PDEATH_LAUNCH),
GRIDFTP_SERVER,
"-allow-root",
"-i", "-always-send-markers",
"-hostname", "127.0.0.1",
- Copy the root-bin directory from archive_scripts.tar.gz to /root/bin. If you are archiving directories outside /home, modify the -restrict-path argument in /root/bin/gc_start.sh.
- Generate an SSH key pair using the "ssh-keygen" command, leave the private key in ~/.ssh, and upload the public key to Globus.
- Make sure you can access Globus CLI like this:
ssh -i .ssh/<private key> <globus user name>@cli.globusonline.org
- Modify archive.sh to match your Globus user name, private key file name, CAC project and archive directory.
- On Globus, make sure the cac#archive01 endpoint is activated.
- You should now be able to run archive.sh to upload your archive directory to the CAC archive. You can automate this script using cron.
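The manual edit to gc.py in the steps above could be scripted along the following lines. This is a minimal sketch, not part of archive_scripts.tar.gz: the function name add_allow_root is hypothetical, and it assumes the argument list contains a line consisting only of GRIDFTP_SERVER, as in the excerpt shown earlier. Other versions of gc.py may be laid out differently, so keep a backup of the file before rewriting it.

```python
def add_allow_root(source):
    """Insert '"-allow-root",' after the GRIDFTP_SERVER line of gc.py text.

    The text is returned unchanged if the flag is already present or if no
    line consisting of 'GRIDFTP_SERVER,' is found.
    """
    if '"-allow-root"' in source:
        return source  # already patched
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == "GRIDFTP_SERVER,":
            # Reuse the indentation of the matched line for the new argument.
            indent = line[: len(line) - len(line.lstrip())]
            if not line.endswith("\n"):
                lines[i] = line + "\n"
            lines.insert(i + 1, f'{indent}"-allow-root",\n')
            break
    return "".join(lines)


if __name__ == "__main__":
    excerpt = (
        "args = [os.path.basename(PDEATH_LAUNCH),\n"
        "        GRIDFTP_SERVER,\n"
        '        "-i", "-always-send-markers",\n'
    )
    print(add_allow_root(excerpt), end="")
```

To apply it, read gc.py into a string, pass it through add_allow_root, and write the result back. Running the function a second time leaves the file unchanged.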
Advanced Topic: Syncing to Archival Storage
See here for how to sync data to Archival Storage