Using Globus CLI
This guide provides a step-by-step tutorial for scripting file transfers through Globus between the Duke Compute Cluster (DCC) and the Duke Data Attic (DDA). The tutorial assumes you have Miniconda installed in your group space.
Once you have your conda environment installed and activated, you can proceed.
Installing pipx and Globus-CLI
# Install pipx
conda install conda-forge::pipx
# Install globus CLI
pipx install globus-cli
# Upgrade
pipx upgrade globus-cli
# globus-cli should be installed here: /Users/<userName>/.local/pipx/venvs/globus-cli
# Add this to your path
export PATH="$PATH:$HOME/.local/bin"
# Now when you run `which globus` you'll see /Users/<userName>/.local/bin/globus
# You can add this to your PATH permanently using
echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$HOME/.bashrc"
# If you're running zsh (or adjust for whatever shell you use; for me, it's ~/.zshrc):
echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$HOME/.zshrc"
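One step the transcript below doesn't show explicitly: the CLI has to be authenticated before you can search endpoints or submit transfers. A minimal sketch (the exact consent prompts you see may differ):
# Log in to Globus; this opens a browser (or prints a URL to paste into one) where you authenticate with your Duke credentials
globus login
# Confirm which identity the CLI is now using
globus whoami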
Setting up IDs and Testing Access
- Once you have the above setup, you can create environment variables for your Globus IDs
- Note: I set up a conda environment named globusEnv, which is what you see in the prompts below
Find Globus Endpoint IDs
# Find your IDs; say you want to find the Duke Compute Cluster ID. You can do this in the Globus web app by searching
# for the collection and grabbing its UUID, or you can use the globus-cli:
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus endpoint search "duke compute cluster"
ID | Owner | Display Name
------------------------------------ | ------------------------------------------------------------ | ---------------------------------------------
1ad66c7c-4f60-11e8-900c-0a6d4e044368 | 0718c89a-8c42-4982-8779-3c34de602bcf@clients.auth.globus.org | Duke Compute Cluster (DCC) Data Transfer Node
# You can also search for collections that you own
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus endpoint search --filter-scope my-endpoints
ID | Owner | Display Name
------------------------------------ | -------------- | ---------------
82841786-2ffe-11ef-b81f-1dd816fe311b | rm145@duke.edu | CDSS-LGQFWT4QR4
# Also, Duke Box:
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus endpoint search "duke box"
ID | Owner | Display Name
------------------------------------ | --------------------------- | ----------------------------------------
45912e9f-49eb-48dc-a8b7-865ea7061488 | dukeuniversity@globusid.org | Duke Box Storage
# Duke Data Attic:
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus endpoint search "data attic"
ID | Owner | Display Name
------------------------------------ | ------------------------------------------------------------ | ---------------------------
d19669f3-8618-4282-a161-894b71efca41 | d0841cbe-79db-44de-8805-8fea5df324d6@clients.auth.globus.org | Duke Data Attic Collection
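If you are scripting this, you can also capture a UUID directly instead of copying it out of the table. A sketch, assuming the CLI's global --jmespath and --format unix output options and that the first search hit (listed under DATA in the JSON response) is the collection you want:
# Pull the first matching endpoint ID straight into a shell variable
GB_DCC="$(globus endpoint search "duke compute cluster" --jmespath 'DATA[0].id' --format unix)"
echo "$GB_DCC"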
Setup Environment Variables for Globus Endpoints
# Now that you have the IDs, you can set environment variables
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % export GB_DCC="1ad66c7c-4f60-11e8-900c-0a6d4e044368"
# Do the same for the Data Attic collection you'll transfer from, using the ID you found above
# (the later examples refer to it as $GB_DDA)
export GB_DDA="<Data-Attic-collection-ID>"
# Run a quick ls to trigger the consent prompt so that you can grant the Globus CLI access to the collection
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus ls "$GB_DCC"
The collection you are trying to access data on requires you to grant consent for the Globus CLI to access it.
# Globus will give you a command to run; do so and you'll be redirected to a webpage to consent; once you do that, you'll see:
You have successfully updated your CLI session.
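These exported variables only last for the current shell. If you plan to script transfers, you may want to persist them the same way the PATH change was persisted above (adjust for your shell's rc file):
# Append the IDs to your shell startup file so future sessions and scripts can use them
echo "export GB_DCC=\"$GB_DCC\"" >> "$HOME/.bashrc"
echo "export GB_DDA=\"$GB_DDA\"" >> "$HOME/.bashrc"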
Test CLI Access to your Shares
# Now you can try to access your share
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus ls "$GB_DCC:/hpc/home/rm145/"
Desktop/
ondemand/
testDirectory/
testSize/
testTransfer/
Hello.ipynb
testText.txt
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus ls $GB_DCC:/hpc/home/rm145 # No quotes also works
# Also works for group shares
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus ls "$GB_DCC:/hpc/group/rescomp/rm145"
dir1/
dir2/
miniconda3/
testGlobus/
Miniconda3-latest-Linux-x86_64.sh
report.txt
testText.txt
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus ls $GB_DCC:/hpc/group/rescomp/rm145 # No quotes also works
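In a script, globus ls also doubles as a cheap existence check before you kick off a transfer. A rough sketch, assuming the CLI exits non-zero when a path cannot be listed (the path below is just the example from this guide):
# Abort early if the group share isn't visible to the CLI
SRC_PATH="/hpc/group/rescomp/rm145/testGlobus"
if ! globus ls "$GB_DCC:$SRC_PATH" > /dev/null; then
    echo "Cannot list $SRC_PATH on the DCC collection; check your consent and the path" >&2
    exit 1
fi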
Data Transfers
Single Transfers
- Let's try transferring a single file from the DDA to /hpc/group/rescomp/rm145/testGlobus
# globus transfer [options] <from-endpoint>:<from-path> <to-endpoint>:<to-path>
# Remember the current contents of the source and destination:
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus ls $GB_DCC:/hpc/group/rescomp/rm145
dir1/
...
testText.txt
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus ls $GB_DDA:/rt.attic.165.rem-test/testTransfer
a.txt
...
z.txt
# Start transfer
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus transfer $GB_DDA:/rt.attic.165.rem-test/fileToMove.txt $GB_DCC:/hpc/group/rescomp/rm145/testGlobus/fileToMove_dest.txt
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 94187558-9bbc-11ef-82ce-9b523453efe6
# Check status
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus task show 94187558-9bbc-11ef-82ce-9b523453efe6
Label: None
Task ID: 94187558-9bbc-11ef-82ce-9b523453efe6
Is Paused: False
Type: TRANSFER
Directories: 0
Files: 1
Status: SUCCEEDED
Request Time: 2024-11-05T21:26:13+00:00
Faults: 0
Total Subtasks: 2
Subtasks Succeeded: 2
Subtasks Pending: 0
Subtasks Retrying: 0
Subtasks Failed: 0
Subtasks Canceled: 0
Subtasks Expired: 0
Subtasks with Skipped Errors: 0
Completion Time: 2024-11-05T21:26:16+00:00
Source Endpoint: Duke RC DevCeph24
Source Endpoint ID: 56c63dcf-1f83-4f46-bec6-54b35f9213ef
Destination Endpoint: Duke Compute Cluster (DCC) Data Transfer Node
Destination Endpoint ID: 1ad66c7c-4f60-11e8-900c-0a6d4e044368
Bytes Transferred: 0
Bytes Per Second: 0
# If you run into any issues, you can cancel the transfer:
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus task cancel 73f1f714-9bbb-11ef-8665-73df88a31e54
The task has been cancelled successfully.
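When scripting, you usually want to block until a transfer finishes rather than re-running globus task show by hand. The CLI has a task wait subcommand for this; a minimal sketch, assuming the --timeout and --polling-interval options behave as in recent globus-cli releases (the task ID is just the one from the example above):
# Wait for the transfer to finish, polling every 30 seconds and giving up after 1 hour
globus task wait 94187558-9bbc-11ef-82ce-9b523453efe6 --timeout 3600 --polling-interval 30
# The command returns once the task terminates or the timeout elapses; check the exit status in scripts
echo "task wait exit status: $?"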
# Let's try a recursive transfer to move all the files in a directory:
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus transfer --recursive $GB_DDA:/rt.attic.165.rem-test/testTransfer $GB_DCC:/hpc/group/rescomp/rm145/testGlobus
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: aeec3c9c-9bbd-11ef-8665-73df88a31e54
# Check status
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus task show aeec3c9c-9bbd-11ef-8665-73df88a31e54
Label: None
Task ID: aeec3c9c-9bbd-11ef-8665-73df88a31e54
Is Paused: False
Type: TRANSFER
Directories: 1
Files: 26
Status: SUCCEEDED
Request Time: 2024-11-05T21:34:07+00:00
Faults: 0
Total Subtasks: 28
Subtasks Succeeded: 28
Subtasks Pending: 0
Subtasks Retrying: 0
Subtasks Failed: 0
Subtasks Canceled: 0
Subtasks Expired: 0
Subtasks with Skipped Errors: 0
Completion Time: 2024-11-05T21:34:11+00:00
Source Endpoint: Duke RC DevCeph24
Source Endpoint ID: 56c63dcf-1f83-4f46-bec6-54b35f9213ef
Destination Endpoint: Duke Compute Cluster (DCC) Data Transfer Node
Destination Endpoint ID: 1ad66c7c-4f60-11e8-900c-0a6d4e044368
Bytes Transferred: 0
Bytes Per Second: 0
# That transferred all contents of the Data Attic testTransfer folder into /hpc/group/rescomp/rm145/testGlobus WITHOUT
# recreating the testTransfer folder at the destination (it dumped the a.txt-z.txt files straight into .../testGlobus).
# If you want the files kept inside their own folder, you need a directory at the destination to hold them; see the sketch below.
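One way to do that is to point the destination path at a new directory name; on a recursive transfer Globus will create it if it doesn't already exist. A sketch using the same example paths:
# Recreate the testTransfer folder under testGlobus at the destination
globus transfer --recursive \
    "$GB_DDA:/rt.attic.165.rem-test/testTransfer" \
    "$GB_DCC:/hpc/group/rescomp/rm145/testGlobus/testTransfer"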
Batch Transfers
- Let's use a text file to direct a data transfer
- First, create a text file in whatever directory you're in
- I created a list.txt in ~
- I want to move one text file and one folder FROM Data Attic to DCC
- In the text file, specify only the source and destination paths; the endpoint IDs go on the command line
- Contents of text file:
# This is a sample batch file for Globus Data Transfers
# Comment lines and blank lines are ignored
# First, copy a single file
# Give only the source and destination paths; the endpoint IDs are supplied on the command line
/rt.attic.165.rem-test/fileToMove.txt /hpc/group/rescomp/rm145/testGlobusBatch/fileToMove_dest.txt
# Now try a folder
--recursive /rt.attic.165.rem-test/testTransfer /hpc/group/rescomp/rm145/testGlobusBatch/testTransferBatch
- Once the text file was created, I used the following command:
# For transfers, once the batch file is created, this is the syntax:
# globus transfer --batch <file|-> [options] <from-endpoint> <to-endpoint>
# Specify the batch file, then the source and destination endpoint IDs
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus transfer --batch list.txt $GB_DDA $GB_DCC
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: 6b68819c-9bc2-11ef-82cf-9b523453efe6
# Check status
(globusEnv) rm145@CDSS-LGQFWT4QR4 ~ % globus task show 6b68819c-9bc2-11ef-82cf-9b523453efe6
Label: None
Task ID: 6b68819c-9bc2-11ef-82cf-9b523453efe6
Is Paused: False
Type: TRANSFER
Directories: 1
Files: 27
Status: SUCCEEDED
Request Time: 2024-11-05T22:08:02+00:00
Faults: 0
Total Subtasks: 30
Subtasks Succeeded: 30
Subtasks Pending: 0
Subtasks Retrying: 0
Subtasks Failed: 0
Subtasks Canceled: 0
Subtasks Expired: 0
Subtasks with Skipped Errors: 0
Completion Time: 2024-11-05T22:08:06+00:00
Source Endpoint: Duke RC DevCeph24
Source Endpoint ID: 56c63dcf-1f83-4f46-bec6-54b35f9213ef
Destination Endpoint: Duke Compute Cluster (DCC) Data Transfer Node
Destination Endpoint ID: 1ad66c7c-4f60-11e8-900c-0a6d4e044368
Bytes Transferred: 0
Bytes Per Second: 0
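Putting the pieces together, here is a rough sketch of how the whole batch transfer could be driven from a single script: build the batch file, submit it, capture the task ID, and wait for completion. It assumes GB_DCC and GB_DDA are exported as above and that the CLI's --jmespath/--format unix output options and the --label flag are available (paths are the examples from this guide):
#!/bin/bash
set -euo pipefail

# Build the batch file (same contents as list.txt above)
cat > list.txt <<'EOF'
# Copy a single file
/rt.attic.165.rem-test/fileToMove.txt /hpc/group/rescomp/rm145/testGlobusBatch/fileToMove_dest.txt
# Copy a folder recursively
--recursive /rt.attic.165.rem-test/testTransfer /hpc/group/rescomp/rm145/testGlobusBatch/testTransferBatch
EOF

# Submit the batch and capture the task ID from the response
TASK_ID="$(globus transfer --batch list.txt "$GB_DDA" "$GB_DCC" \
    --label "DDA to DCC batch" --jmespath 'task_id' --format unix)"
echo "Submitted task $TASK_ID"

# Block until the transfer finishes (poll every 60 seconds, give up after 12 hours)
globus task wait "$TASK_ID" --timeout 43200 --polling-interval 60
globus task show "$TASK_ID"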