Syncing_Directories
This section entails:
- Using `gds-sync-download` - For syncing a gds folder with a local path
- Using `gds-sync-upload` - For syncing a local path with a gds folder
- Using `gds-create-download-script` - Generate a bash script containing the presigned urls of files recursively under a directory. The bash script can be copied to any server to download the gds files.
- Using `gds-migrate` - Copy data from one project context to another through a manifest list and TES task
- Using `gds-migrate-v2` - Copy data from one project context to another through a manifest list and TES task, into a v2 project
- Using `gds-migrate-to-aws` - Copy data from gds to your AWS account
gds-sync-download
auto-completion: ✅
Sync a gds folder with a local directory using the temporary aws creds in a given gds folder. This function requires admin privileges in the source project.
Options:
- --gds-path: Path to the gds folder
- --download-path: Path to your local directory
- --write-script-path: Path to output file containing a bash script to run the command
Requirements:
- curl
- jq
- python3
- aws
Environment vars:
- ICA_BASE_URL
- ICA_ACCESS_TOKEN
  - You will need to first run `ica-context-switcher` to get this variable into your environment
Extra info:
- You can also use any of the aws s3 sync parameters to add to the command list. For example, `gds-sync-download --gds-path gds://volume-name/path-to-folder/ --exclude='*' --include='*.fastq.gz'` will download only fastq files from that folder.
- If you are unsure on what files will be downloaded, use the `--dryrun` parameter. This will inform you of which files will be downloaded to your local file system.
- Unlike rsync, trailing slashes on `--gds-path` and `--download-path` do not matter. One can assume that a trailing slash exists on both parameters. This means that the contents inside the `--gds-path` parameter are downloaded to the contents inside `--download-path`.
- Despite this command being a 'download' command, you will need an 'admin' token: aws s3 sync requires the PutObject policy on the s3 side, regardless of the direction of the sync.
- Use `--write-script-path` if the `gds-sync-download` installation is not in the same terminal as the one you wish to execute the command from. A use case may be as follows:
  - `gds-sync-download` is installed on your local computer, but the location you wish to run the command from is an ec2 instance.
  - You may run `gds-sync-download` with `--write-script-path` set to `run-download.sh`.
  - You may then upload the `run-download.sh` script to the ec2 instance and launch the script from there.
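The `--exclude`/`--include` behaviour above follows aws s3 sync's filter rules: every file starts out included, filters are evaluated in order, and the last matching filter wins. A rough Python sketch of that evaluation (an illustration of the rule, not the actual aws cli implementation):

```python
from fnmatch import fnmatch


def is_included(key: str, filters: list) -> bool:
    """Apply aws-s3-sync-style filters in order; the last match wins.

    Each filter is an ('include' | 'exclude', pattern) tuple.
    Files are included by default when no filter matches.
    """
    included = True
    for kind, pattern in filters:
        if fnmatch(key, pattern):
            included = (kind == "include")
    return included


# --exclude='*' --include='*.fastq.gz'
filters = [("exclude", "*"), ("include", "*.fastq.gz")]

print(is_included("sample_R1.fastq.gz", filters))  # True
print(is_included("report.html", filters))         # False
```

This is why the exclude-everything-then-include-fastq pattern works: `report.html` is caught by `*` and never re-included, while `sample_R1.fastq.gz` is re-included by the later `*.fastq.gz` filter.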

gds-sync-upload
auto-completion: ✅
Sync a local directory with a gds folder using the temporary aws creds in a given gds folder.
This function requires admin privileges in the destination project.
Options:
- --src-path: Path to your local directory
- --gds-path: Path to the gds folder
- --write-script-path: Path to output file containing a bash script to run the command
Requirements:
- curl
- jq
- python3
- aws
Environment vars:
- ICA_BASE_URL
- ICA_ACCESS_TOKEN
  - You will need to first run `ica-context-switcher` to get this variable into your environment
Extras:
See extras in `gds-sync-download`.
gds-create-download-script
auto-completion: ✅
Create a script at <output_prefix>.sh that downloads all files in the gds path.
Options:
- --gds-path: Path to the gds folder (Required)
- --output-prefix: Output file prefix (Required)
Requirements:
- jq
- python3
Environment vars:
- ICA_BASE_URL
- ICA_ACCESS_TOKEN
  - You will need to first run `ica-context-switcher` to get this variable into your environment
Extras:
The output file generated by this script is a bash script that uses base64 encoding to contain the following information for each file:
- The presigned url (which expires in one week)
- The output path of the file (relative to the output folder)
- The e-tag
- The file size
When files are downloaded through wget, the e-tag and file size are calculated on the local file and compared against the values recorded for that file.
No environment variables are needed to run the output script (just jq and python3 binaries).
To prove this, in the example GIF below, the output script is first copied from the user's local directory to a fresh ec2-instance and
executed on the ec2-instance (that does not have ica or ica-ica-lazy installed).
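The decode-and-verify step the generated script performs can be sketched as follows. This is a minimal Python illustration assuming a JSON record layout; the field names (`presigned_url`, `output_path`, `etag`, `file_size`) are hypothetical, not the actual format emitted by `gds-create-download-script`, and the md5-as-etag comparison only holds for single-part uploads:

```python
import base64
import hashlib
import json
import tempfile
from pathlib import Path


def decode_record(encoded: str) -> dict:
    """Decode one base64-encoded per-file record back into a dict."""
    return json.loads(base64.b64decode(encoded))


def verify_download(path: Path, expected_etag: str, expected_size: int) -> bool:
    """Compare the downloaded file's md5 and size against the recorded values."""
    data = path.read_bytes()
    return (hashlib.md5(data).hexdigest() == expected_etag
            and len(data) == expected_size)


# Illustrative record: presigned url, relative output path, e-tag, file size
record = {
    "presigned_url": "https://example.com/file?X-Amz-Signature=...",
    "output_path": "sample/reads.fastq.gz",
    "etag": hashlib.md5(b"hello").hexdigest(),
    "file_size": 5,
}
encoded = base64.b64encode(json.dumps(record).encode()).decode()

decoded = decode_record(encoded)
with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp) / "reads.fastq.gz"
    local.write_bytes(b"hello")  # stand-in for the wget download
    ok = verify_download(local, decoded["etag"], decoded["file_size"])
print(ok)  # True
```

Because the record carries everything needed (url, destination, checksum, size), the script is portable: only jq and python3 are needed on the machine that runs it, as noted above.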

gds-migrate
auto-completion: ✅
Copy data from one project to another project.
You can also use this command to copy data within a project.
Options:
- --src-path: path to gds source directory
- --src-project: name of the source project
- --dest-path: path to gds dest directory
- --dest-project: name of the destination project
- --rsync-args: List of rsync args
- --stream: Stream the input files rather than download into the task
Requirements:
- jq
- python3
Environment vars:
- ICA_BASE_URL
Extras:
- You will need at least read-only permissions in the source project and have registered an access token with `ica-add-access-token`.
- You will need at least admin permissions in the destination project and have registered an access token with `ica-add-access-token`.
- rsync args are comma-separated values and should be just one string, i.e. `--rsync-args "--include=*/,--include=*.fastq.gz,--exclude=*"`. Quote the entire value, but there is no need to add quotes or backslashes around the asterisks.
- `--stream` is most efficient when the output directory size is expected to be much smaller than the input directory.
- Within a project, you may choose to use this command over `ica folders copy`, as gds-migrate has the added benefit of selecting certain files with the `--rsync-args` option.
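Splitting the single comma-separated `--rsync-args` string into individual rsync arguments can be sketched as below. This assumes a plain split on commas (the actual parsing inside gds-migrate may differ, e.g. for values that themselves contain commas):

```python
def split_rsync_args(value: str) -> list:
    """Split the comma-separated --rsync-args value into individual args,
    dropping empty entries."""
    return [arg for arg in value.split(",") if arg]


args = split_rsync_args("--include=*/,--include=*.fastq.gz,--exclude=*")
print(args)
# ['--include=*/', '--include=*.fastq.gz', '--exclude=*']
```

Note that the asterisks survive untouched: because the whole value is quoted on the command line, the shell never expands them, which is why no extra quoting or backslashes are needed around them.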

gds-migrate-v2
auto-completion: ✅
Copy data from a project on v1 to a project on v2.
Options:
- --src-path: path to gds source directory
- --src-project: name of the source project
- --dest-path: path to gds dest directory
- --dest-project: name of the destination project
- --rsync-args: List of rsync args
- --stream: Stream the input files rather than download into the task
Requirements:
- jq
- python3
Environment vars:
- ICA_BASE_URL
- ICAV2_BASE_URL (defaults to ica.illumina.com)
- ICAV2_ACCESS_TOKEN
Extras:
- You will need at least read-only permissions in the source project and have registered an access token with `ica-add-access-token`.
- You can use the following command to extract your access token from your `.icav2` directory: `export ICAV2_ACCESS_TOKEN="$(yq eval '.access-token' ~/.icav2/.session.ica.yaml)"`
- You must have write access to the v2 destination project.
- rsync args are comma-separated values and should be just one string, i.e. `--rsync-args "--include=*/,--include=*.fastq.gz,--exclude=*"`. Quote the entire value, but there is no need to add quotes or backslashes around the asterisks.
- `--stream` is most efficient when the output directory size is expected to be much smaller than the input directory.

gds-migrate-to-aws
auto-completion: ✅
Copy data from gds to your AWS account.
Options:
- --gds-path: path to the gds source directory
- --s3-path: path to the s3 destination directory
- --stream: Use stream mode for inputs; download is the default. Additional arguments are passed to aws s3 sync.
Requirements:
- aws
- aws-sso-creds
- jq (v1.5+)
- python3 (v3.4+)
Environment:
- ICA_BASE_URL
- ICA_ACCESS_TOKEN
- AWS_PROFILE
- AWS_REGION
Extras:
- You can also use any of the aws s3 sync parameters to add to the command list. For example, `gds-migrate-to-aws --gds-path gds://volume-name/path-to-folder/ --s3-path s3://temp-bucket/folder/ --exclude='*' --include='*.fastq.gz'` will copy only fastq files from that gds folder.
- Unlike rsync, trailing slashes on `--gds-path` and `--s3-path` do not matter. One can assume that a trailing slash exists on both parameters. This means that the contents inside the `--gds-path` parameter are copied to the contents inside `--s3-path`.
- You should use the `--stream` option if the output will be relatively small compared to the input.
- aws-sso-creds can be downloaded from the releases page at https://github.com/jaxxstorm/aws-sso-creds
- A token with at least read-only scope must be registered for the source path.
Head to the Workflow Handling page for some lazy scripts on handling workflows with ica.