Bug report
Downloading files from a GCP Storage bucket whose name contains an underscore fails. Nextflow throws java.net.URISyntaxException: Illegal character in hostname, where the illegal character is the underscore (_).
Steps to reproduce the problem
#!/usr/bin/env nextflow
import com.google.cloud.storage.contrib.nio.CloudStorageFileSystem
Path path = CloudStorageFileSystem.forBucket('tgs_ext_archive').getPath('file.txt')
String gcsString = "gs://" + path.bucket() + "/data" + path.toAbsolutePath();
tgs_root_chan = Channel.fromPath(gcsString)
process get_and_untar {
    machineType 'g1-small'
    container 'python:3'
    input:
    file mine from tgs_root_chan.collect()
    script:
    """
    ls
    """
}
Program output (immediate)
N E X T F L O W ~ version 20.01.0
Launching `nextflow/make_untar.nf` [festering_edison] - revision: 268ef619fb
gs://tgs_ext_archive/data/TGSDEV150729*.tar.bz2
[- ] process > get_and_untar -
java.net.URISyntaxException: Illegal character in hostname at index 8: gs://tgs_ext_archive/data/
Logs
Version: 20.01.0 build 5264
Created: 12-02-2020 10:14 UTC
System: Linux 4.15.0-1055-gcp
Runtime: Groovy 2.5.8 on OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~16.04-b08
Encoding: UTF-8 (UTF-8)
Process: 27985@tgs-controller [10.154.0.56]
CPUs: 2 - Mem: 7.3 GB (6.2 GB) - Swap: 0 (0)
Mar-12 09:51:14.298 [main] DEBUG nextflow.Session - Work-dir: gs://nextflow-text-bucket/ [ext2/ext3]
Mar-12 09:51:14.299 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/daudn/tgs_workflow/nextflow/bin
Mar-12 09:51:14.388 [main] DEBUG nextflow.Session - Observer factory: TowerFactory
Mar-12 09:51:14.391 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Mar-12 09:51:14.588 [main] DEBUG nextflow.Session - Session start invoked
Mar-12 09:51:14.830 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Mar-12 09:51:14.887 [main] DEBUG nextflow.Session - Workflow process names [dsl1]: get_and_untar
Mar-12 09:51:14.945 [PathVisitor-1] DEBUG nextflow.file.PathVisitor - files for syntax: glob; folder: /data/; pattern: TGSDEV150729*.tar.bz2; options: [:]
Mar-12 09:51:15.315 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: google-lifesciences
Mar-12 09:51:15.316 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'google-lifesciences'
Mar-12 09:51:15.328 [main] DEBUG nextflow.executor.Executor - [warm up] executor > google-lifesciences
Mar-12 09:51:15.350 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'google-lifesciences' > capacity: 1000; pollInterval: 10s; dumpInterval: 5m
Mar-12 09:51:15.404 [main] DEBUG n.c.g.l.GoogleLifeSciencesExecutor - Google Life Science config=GoogleLifeSciencesConfig(project:bioinformatics-playground, zones:[], regions:[europe-west2], preemptible:false, remoteBinDir:null, location:europe-west2, disableBinDir:false, bootDiskSize:20 GB, sshDaemon:false, sshImage:gcr.io/cloud-genomics-pipelines/tools, debugMode:null, copyImage:google/cloud-sdk:alpine)
Mar-12 09:51:15.952 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > get_and_untar -- maxForks: 2; blocking: false
Mar-12 09:51:16.057 [main] DEBUG nextflow.script.ScriptRunner - > Await termination
Mar-12 09:51:16.057 [main] DEBUG nextflow.Session - Session await
Mar-12 09:51:16.312 [PathVisitor-1] ERROR nextflow.Channel - java.net.URISyntaxException: Illegal character in hostname at index 8: gs://tgs_ext_archive/data/
java.lang.AssertionError: java.net.URISyntaxException: Illegal character in hostname at index 8: gs://tgs_ext_archive/data/
at com.google.cloud.storage.contrib.nio.CloudStoragePath.toUri(CloudStoragePath.java:356)
at com.google.cloud.storage.contrib.nio.CloudStoragePseudoDirectoryAttributes.<init>(CloudStoragePseudoDirectoryAttributes.java:31)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.readAttributes(CloudStorageFileSystemProvider.java:831)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:322)
at java.nio.file.Files.walkFileTree(Files.java:2662)
at nextflow.file.FileHelper.visitFiles(FileHelper.groovy:742)
at nextflow.file.PathVisitor.pathImpl(PathVisitor.groovy:162)
at nextflow.file.PathVisitor.applyGlobPattern0(PathVisitor.groovy:130)
at nextflow.file.PathVisitor.apply(PathVisitor.groovy:68)
at nextflow.file.PathVisitor$_applyAsync_closure1.doCall(PathVisitor.groovy:77)
at nextflow.file.PathVisitor$_applyAsync_closure1.call(PathVisitor.groovy)
at groovy.lang.Closure.run(Closure.java:486)
at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:719)
at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:701)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.URISyntaxException: Illegal character in hostname at index 8: gs://tgs_ext_archive/data/
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.parseHostname(URI.java:3387)
at java.net.URI$Parser.parseServer(URI.java:3236)
at java.net.URI$Parser.parseAuthority(URI.java:3155)
at java.net.URI$Parser.parseHierarchical(URI.java:3097)
at java.net.URI$Parser.parse(URI.java:3053)
at java.net.URI.<init>(URI.java:673)
at java.net.URI.<init>(URI.java:774)
at com.google.cloud.storage.contrib.nio.CloudStoragePath.toUri(CloudStoragePath.java:354)
... 20 common frames omitted
Mar-12 09:51:16.361 [main] DEBUG nextflow.Session - Session await > all process finished
Mar-12 09:51:16.418 [PathVisitor-1] DEBUG nextflow.Session - Session aborted -- Cause: java.net.URISyntaxException: Illegal character in hostname at index 8: gs://tgs_ext_archive/data/
Mar-12 09:51:16.448 [main] DEBUG nextflow.Session - Session await > all barriers passed
Mar-12 09:51:16.470 [main] DEBUG nextflow.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=0; peakCpus=0; peakMemory=0; ]
Mar-12 09:51:16.634 [main] DEBUG nextflow.CacheDB - Closing CacheDB done
Mar-12 09:51:16.661 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
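The innermost frames of the stack trace above are all inside java.net.URI, so the failure can probably be reproduced without Nextflow or GCP credentials. The following is a minimal sketch, not the google-cloud-nio code itself. It assumes CloudStoragePath.toUri builds the URI with a multi-argument java.net.URI constructor (the two URI.<init> frames suggest this, but the library source was not checked here). The class name UnderscoreBucketUri and its helper methods are hypothetical names used only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
final class UnderscoreBucketUri {

    // Builds gs://<bucket><path> with the (scheme, host, path, fragment)
    // constructor, which insists on a server-based authority (a valid hostname).
    // Returns the exception message if the bucket is rejected, or null if accepted.
    static String strictParseError(String bucket, String path) {
        try {
            new URI("gs", bucket, path, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    // Parses the whole string at once. This form may fall back to a
    // registry-based authority instead of failing on the hostname.
    static URI lenientParse(String uri) {
        return URI.create(uri);
    }

    public static void main(String[] args) {
        // Bucket name from this report: contains underscores.
        System.out.println("strict, underscore: " + strictParseError("tgs_ext_archive", "/data/"));
        // Same name with hyphens, for comparison.
        System.out.println("strict, hyphen:     " + strictParseError("tgs-ext-archive", "/data/"));

        URI lenient = lenientParse("gs://tgs_ext_archive/data/");
        System.out.println("lenient host:       " + lenient.getHost());
        System.out.println("lenient authority:  " + lenient.getAuthority());
    }
}
```

If the assumption holds, the strict call with the underscore name reports the same "Illegal character in hostname" reason as the log, while the hyphenated name is accepted. The single-string form accepts the underscore name but leaves getHost() empty and exposes the bucket only through getAuthority(), which may be why the error surfaces in toUri rather than when the gs:// string is first handed to Channel.fromPath.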
Environment
- Nextflow version: 20.01.0 build 5264
- Java version: 1.8
- Operating system: Linux
- Bash version: GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)