Difference between revisions of "Archival Storage"

From CAC Documentation wiki
(Corr)
 
(29 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
== What is CAC Archival Storage? ==
 
== What is CAC Archival Storage? ==
:* CAC Archival Storage is a low-cost, high-performance option for storing research data '''available only to users within Cornell University'''.  
+
:* CAC Archival Storage provides low-cost, high-performance option for storing research data.
:* CAC Archival Storage is not mountable by running jobs, instead the user must transfer their data from the CAC Archival Storage to an accessible server using [//globusonline.org/ Globus Online].  
+
:* CAC Archival Storage has a direct 100 Gbps connection to Internet2.
:* Globus Online users have easy access to add, delete, and share their data using any Globus Online endpoints.  
+
:* CAC Archival Storage is not mountable by running jobs, instead the user must transfer their data from the CAC Archival Storage to an accessible server using [//globus.org/ Globus].
:* Some of the Globus Online endpoints available include:
+
:* Data in CAC Archival Storage is intended to be an additional copy of user data; CAC Archival Storage is not backed up or snapshotted.
:** storage01.cac.cornell.edu (cac#home) where all CAC user home directories are found
+
:* All CAC resources are suitable for unregulated, non-confidential data ([https://it.cornell.edu/security-and-policy/data-types-confidential-regulated-restricted-public reference] for details).  
 +
:* Globus users have easy access to add, delete, and share their data using any Globus endpoints.  
 +
:* Some of the Globus endpoints available include:
 +
:** CAC user home directories (cac#home)
 
:** XSEDE sites:
 
:** XSEDE sites:
 
:*** Stampede (xsede#stampede)  
 
:*** Stampede (xsede#stampede)  
Line 10: Line 13:
 
:*** TACC Archival Storage (xsede#ranch).
 
:*** TACC Archival Storage (xsede#ranch).
  
== First step - Enable (or create) CAC project for Archival Storage and add users where appropriate ==
+
== First Step - Enable (or create) CAC project for Archival Storage and add users where appropriate ==
 
:* To use the CAC Archival Storage service, '''you must be a user of a CAC project where Archival Storage is enabled'''.
 
:* To use the CAC Archival Storage service, '''you must be a user of a CAC project where Archival Storage is enabled'''.
 
:* The project PI can add users and verify that Archival Storage is enabled at the [https://{{SERVERNAME}}/Services/Projects/manage.aspx Manage CAC project page].
 
:* The project PI can add users and verify that Archival Storage is enabled at the [https://{{SERVERNAME}}/Services/Projects/manage.aspx Manage CAC project page].
 
:* Don't have a project? [https://{{SERVERNAME}}/Services/projects.aspx How to start a CAC project?].
 
:* Don't have a project? [https://{{SERVERNAME}}/Services/projects.aspx How to start a CAC project?].
  
== Second step - create your Globus Online account ==
+
== Second Step - Log into Globus ==
  
CAC Archival Storage is accessible only through '''[//globusonline.org/ Globus]'''.  If you have never used Globus, [//globus.org/SignUp first sign up for a free Globus account].
+
CAC Archival Storage is accessible only through '''[//globus.org/ Globus]'''.  Log into '''[//globus.org/ Globus]''' using [https://docs.globus.org/how-to/get-started/ these login instructions].
  
When signing up with Globus, Cornell users should select '''Cornell University''' under '''Use your existing organizational login''' and then click on the '''Continue''' button. You will get forwarded to the [https://it.cornell.edu/cuweblogin CUWebLogin] page. Login using your Cornell NetID and password.
+
When logging into Globus,  
 +
# Cornell users should select '''Cornell University''' under '''Use your existing organizational login''', or
 +
# Weill Cornell Medicine users should select '''Weill Cornell Medical College''',
  
==Globus Online links ==
+
[[File:GlobusLogin.png | 600px | center ]]
:*[//globus.org/how-it-works How Globus Online works?]
 
:*[//globusonline.org/quickstart/ Globus Online Quickstart]: A guide for signing up a free Globus account and start transferring files.
 
:*[//docs.globus.org/how-to/share-files/ Share or Publish Your Data using Globus]
 
  
== CAC specifics ==
+
and then click on the '''Continue''' button. You will get forwarded to the [https://it.cornell.edu/cuweblogin CUWebLogin (Cornell users)] or WCM Web Login (Weill Cornell users) page. Login using your NetID or CWID and password.
=== Technical Information ===
 
CAC's EndPoint is <b>cac#archive01</b>.
 
  
:*When activating cac#archive01 endpoint in Globus Online web GUI, you will be prompted by a dialog box saying:  
+
== Using Globus ==
 +
Globus can be accessed using the following methods:
  
<blockquote>The administrator of this endpoint, cac#archive01, requires that you authenticate using their MyProxy OAuth server to activate the endpoint. When you click 'Continue' you will be redirected to their website.</blockquote>
+
*[https://globus.org '''Globus Web GUI''']: click on the "Login" button on the Globus web site to start.
 +
*[https://docs.globus.org/cli/ '''Globus CLI client on your computer''']:
 +
** [https://docs.globus.org/cli/installation/ Install Globus CLI client on your computer]
 +
** [https://docs.globus.org/cli/ Using Globus CLI]
 +
** [https://docs.globus.org/cli/reference/ Globus CLI Reference]
 +
* [https://globus-sdk-python.readthedocs.io/en/stable/ '''Globus SDK for Python''']: for workflow automation and integration with your science gateways or third party software.
  
:*You will be redirected to the <nowiki>https://archive01.cac.cornell.edu/oath/authorize...</nowiki> page.
+
Note: The legacy ssh-based hosted CLI will be deprecated in the future. Please do not use it for new development. If you use the ssh-based CLI for current production, you will need to migrate to use new Globus CLI client or Globus SDK for Python soon.
:*Enter your CAC credentials.
 
:*When login is successful, you will be redirected back to Globus Online web GUI with the endpoint activated.
 
  
=== Administrative Information ===
+
== Make Your Computer Accessible on Globus ==
:* cac#archive01's default path is '''/export'''.
+
If you want to transfer data to or from your computer, your computer needs to be a Globus endpoint:
:* Each project with access to CAC Archival Storage has a shared directory (named the project) in which '''all project members have full read/write access'''.
 
:* Users can rename and move files and directories within their project directory on the endpoint. Globus Online added this feature recently.
 
  
==Advanced Topic - Automating transfers to the Archival Storage==
+
* [https://www.globus.org/globus-connect-personal Globus Connect Personal]: Install Globus Connect Personal to make your personal computer into a Globus endpoint.
:* Install Globus Connect Personal on the Linux/MacOS/Windows host you wish to archive by clicking on the "Get Globus Connect Personal" link on the Transfer Files screen on Globus.
+
* [https://www.globus.org/globus-connect-server Globus Connect Server]: If you have a multi-user '''''Linux''''' server, use Globus Connect Server to install a Globus endpoint accessible to all users on your server. Please read the [https://docs.globus.org/globus-connect-server/v5.4/#open-tcp-ports_section network requirements] before you start. (Basically, port 443 and ports 50000-51000 must be open to the Internet).
::[[File:Install_Globus_Connect_Personal.jpg]]
 
:* On the host you wish to archive, download and untar [[Media:archive_scripts.tar.gz]].
 
:* To enable running Globus Connect Personal as root, add
 
  
"-allow-root",
+
==Globus Documentation ==
 
+
:*[//globus.org/how-it-works How Globus works?]
::to globusconnectpersonal-2.0.3/gc.py (on line ~ 360):
+
:*[//docs.globus.org/how-to/get-started/ Globus Quickstart]: A guide for signing up a free Globus account and start transferring files.
<source lang="c">
+
:*[//docs.globus.org/how-to/share-files/ Share or Publish Your Data using Globus]
args = [os.path.basename(PDEATH_LAUNCH),
 
                GRIDFTP_SERVER,
 
                "-allow-root",
 
                "-i", "-always-send-markers",
 
                "-hostname", "127.0.0.1",
 
</source>
 
:* Copy root-bin directory from the archive_scripts.tar.gz to /root/bin. If you are archiving directories outside /home, modify the -restrict-path argument in /root/bin/gc_start.sh.
 
:* Generate a ssh key pair using the "ssh-keygen" command, leave private key in ~/.ssh, and upload the private key to Globus
 
::[[File:Upload_ssh_private_key.jpg]]
 
:* Make sure you can access Globus CLI like this:
 
ssh -i .ssh/<private key> <globus user name>@cli.globusonline.org
 
:* Modify archive.sh to match your Globus user name, private key file name, CAC project and archive directory.
 
:* On Globus, make sure your connection to cac#archive01 endpoint is activated.  
 
:* You should now be able to run archive.sh to upload your archive directory to CAC archive.  You can automate this script using cron.
 
  
==Advanced Topic: Syncing to Archival Storage==
+
== CAC Archival Storage specifics ==
See [[Syncing_to_Archival_Storage| here]] for how to sync data to Archival Storage
+
* In Globus web GUI's File Manager, select or search for the <b>[https://app.globus.org/file-manager?origin_id=dc94169c-30f3-4866-a305-f112408bcf8f&origin_path=%2F CAC Archive 2/DTN (cac#archive02)]</b> collection to access CAC archival storage.
 +
* Each project with access to CAC Archival Storage has a shared directory (named the project) in which '''all project members have full read/write access'''.
 +
* Users can rename and move files and directories within their project directory.

Latest revision as of 13:43, 23 March 2023

What is CAC Archival Storage?

  • CAC Archival Storage provides a low-cost, high-performance option for storing research data.
  • CAC Archival Storage has a direct 100 Gbps connection to Internet2.
  • CAC Archival Storage cannot be mounted by running jobs; instead, users must transfer their data from CAC Archival Storage to an accessible server using Globus.
  • Data in CAC Archival Storage is intended to be an additional copy of user data; CAC Archival Storage is not backed up or snapshotted.
  • All CAC resources are suitable for unregulated, non-confidential data (see the Cornell IT data types reference for details).
  • Globus users can easily add, delete, and share their data using any Globus endpoint.
  • Some of the Globus endpoints available include:
    • CAC user home directories (cac#home)
    • XSEDE sites:
      • Stampede (xsede#stampede)
      • Lonestar (xsede#lonestar4)
      • TACC Archival Storage (xsede#ranch).

First Step - Enable (or create) CAC project for Archival Storage and add users where appropriate

  • To use the CAC Archival Storage service, you must be a user of a CAC project where Archival Storage is enabled.
  • The project PI can add users and verify that Archival Storage is enabled at the Manage CAC project page.
  • Don't have a project? See "How to start a CAC project".

Second Step - Log into Globus

CAC Archival Storage is accessible only through Globus. Log into Globus using these login instructions.

When logging into Globus,

  1. Cornell users should select Cornell University under Use your existing organizational login, or
  2. Weill Cornell Medicine users should select Weill Cornell Medical College,

and then click on the Continue button. You will be forwarded to the CUWebLogin (Cornell users) or WCM Web Login (Weill Cornell users) page. Log in using your NetID or CWID and password.

[Image: GlobusLogin.png]

Using Globus

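Globus can be accessed using the following methods:

  • Globus Web GUI (https://globus.org): click on the "Login" button on the Globus web site to start.
  • Globus CLI client on your computer (https://docs.globus.org/cli/):
    • Install Globus CLI client on your computer (https://docs.globus.org/cli/installation/)
    • Using Globus CLI (https://docs.globus.org/cli/)
    • Globus CLI Reference (https://docs.globus.org/cli/reference/)
  • Globus SDK for Python (https://globus-sdk-python.readthedocs.io/en/stable/): for workflow automation and integration with your science gateways or third-party software.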

Note: The legacy ssh-based hosted CLI will be deprecated in the future. Please do not use it for new development. If you currently use the ssh-based CLI in production, you will need to migrate to the new Globus CLI client or the Globus SDK for Python soon.
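
For users migrating scripts to the new Globus CLI client, the following is a minimal sketch of how a `globus transfer` command line could be assembled from Python. It only builds the argument list and prints it; it does not contact Globus. The collection IDs, paths, and label shown are hypothetical placeholders, and the helper name `build_transfer_command` is introduced here for illustration only. It assumes the Globus CLI is installed and that you have already run `globus login`.

```python
import shlex


def build_transfer_command(src_id, src_path, dst_id, dst_path,
                           recursive=True, label=None):
    """Return the argument list for a `globus transfer` invocation.

    The Globus CLI expects source and destination in the form
    COLLECTION_ID:PATH. Nothing is executed here.
    """
    cmd = ["globus", "transfer"]
    if recursive:
        # Needed when the source path is a directory.
        cmd.append("--recursive")
    if label:
        # Optional human-readable label shown in the Globus activity list.
        cmd += ["--label", label]
    cmd += [f"{src_id}:{src_path}", f"{dst_id}:{dst_path}"]
    return cmd


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute your own collection
    # IDs and paths before running the printed command.
    command = build_transfer_command(
        "SOURCE_COLLECTION_UUID", "/home/user/results/",
        "DEST_COLLECTION_UUID", "/myproject/results/",
        label="archive results",
    )
    print(shlex.join(command))
```

The printed command can be pasted into a shell or passed to `subprocess.run(command, check=True)` once real collection IDs are filled in. Keeping the command construction separate from execution makes it easy to review what will be transferred before anything is submitted.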

Make Your Computer Accessible on Globus

If you want to transfer data to or from your computer, your computer needs to be a Globus endpoint:

  • Globus Connect Personal: Install Globus Connect Personal to make your personal computer into a Globus endpoint.
  • Globus Connect Server: If you have a multi-user Linux server, use Globus Connect Server to install a Globus endpoint accessible to all users on your server. Please read the network requirements before you start. In short, TCP port 443 and ports 50000-51000 must be open to the Internet.

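Globus Documentation

  • How Globus works (globus.org/how-it-works)
  • Globus Quickstart (docs.globus.org/how-to/get-started/): A guide to signing up for a free Globus account and starting to transfer files.
  • Share or Publish Your Data using Globus (docs.globus.org/how-to/share-files/)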

CAC Archival Storage specifics

  • In the Globus web GUI's File Manager, select or search for the CAC Archive 2/DTN (cac#archive02) collection to access CAC Archival Storage.
  • Each project with access to CAC Archival Storage has a shared directory (named after the project) in which all project members have full read/write access.
  • Users can rename and move files and directories within their project directory.
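
As an illustrative sketch (not an official CAC tool), a File Manager link that opens the CAC Archive 2/DTN collection can be built programmatically. The collection ID and URL layout are taken from the File Manager link in this page's wiki source. Two things are assumptions: that a project's shared directory sits at `/<project>/` directly under the collection root, and the project name `myproject`, which is a hypothetical placeholder.

```python
from urllib.parse import urlencode

# Collection ID of "CAC Archive 2/DTN (cac#archive02)", copied from the
# File Manager link in this page's wiki source.
CAC_ARCHIVE_COLLECTION_ID = "dc94169c-30f3-4866-a305-f112408bcf8f"
FILE_MANAGER_URL = "https://app.globus.org/file-manager"


def file_manager_url(project=None):
    """Build a Globus File Manager URL for the CAC archive collection.

    With no project, the URL opens the collection root. With a project
    name, it points at /<project>/ (assumed directory layout).
    """
    path = f"/{project}/" if project else "/"
    query = urlencode({
        "origin_id": CAC_ARCHIVE_COLLECTION_ID,
        "origin_path": path,  # "/" is percent-encoded as %2F
    })
    return f"{FILE_MANAGER_URL}?{query}"


if __name__ == "__main__":
    print(file_manager_url())             # collection root
    print(file_manager_url("myproject"))  # hypothetical project directory
```

A link like this can be shared with project members so that they land in the project directory after logging into Globus, provided the assumed path layout matches what you see in the File Manager.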