Nextflow doesn't allow underscores in GCP bucket name #1527

@daudn

Bug report

Downloading files from a GCP Storage bucket whose name contains an underscore fails. Nextflow throws java.net.URISyntaxException: Illegal character in hostname, where the illegal character is the underscore (_).
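The failure can probably be reproduced without Nextflow or the GCS NIO library. The sketch below assumes the bucket name ends up as the host component of a multi-argument java.net.URI constructor, which is what the two nested URI.<init> frames in the stack trace further down suggest. The class and method names are hypothetical and exist only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UnderscoreBucketRepro {

    // Builds gs://<bucket><path> from separate components. With this
    // constructor java.net.URI validates <bucket> as a server hostname.
    // Returns the exception message, or null if the URI was accepted.
    public static String strictUriError(String bucket, String path) {
        try {
            new URI("gs", bucket, path, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Bucket name from this report: rejected because of the underscore.
        System.out.println(strictUriError("tgs_ext_archive", "/data/"));
        // Hypothetical hyphenated name: accepted, so this prints "null".
        System.out.println(strictUriError("tgs-ext-archive", "/data/"));
    }
}
```

The first call returns a message that starts with "Illegal character in hostname", matching the error reported here. The second returns null because a hyphen is allowed in a hostname label.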

Steps to reproduce the problem

#!/usr/bin/env nextflow
import com.google.cloud.storage.contrib.nio.CloudStorageFileSystem

Path path = CloudStorageFileSystem.forBucket('tgs_ext_archive').getPath('file.txt')

String gcsString = "gs://" + path.bucket() + "/data" + path.toAbsolutePath();

tgs_root_chan = Channel.fromPath(gcsString)

process get_and_untar{
    machineType 'g1-small'
    container 'python:3'

    input:
    file mine from tgs_root_chan.collect()

    script:
    """
    ls
    """
}

Program output (immediate)

N E X T F L O W  ~  version 20.01.0
Launching `nextflow/make_untar.nf` [festering_edison] - revision: 268ef619fb
gs://tgs_ext_archive/data/TGSDEV150729*.tar.bz2
[-        ] process > get_and_untar -
java.net.URISyntaxException: Illegal character in hostname at index 8: gs://tgs_ext_archive/data/

Logs

  Version: 20.01.0 build 5264
  Created: 12-02-2020 10:14 UTC
  System: Linux 4.15.0-1055-gcp
  Runtime: Groovy 2.5.8 on OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~16.04-b08
  Encoding: UTF-8 (UTF-8)
  Process: 27985@tgs-controller [10.154.0.56]
  CPUs: 2 - Mem: 7.3 GB (6.2 GB) - Swap: 0 (0)
Mar-12 09:51:14.298 [main] DEBUG nextflow.Session - Work-dir: gs://nextflow-text-bucket/ [ext2/ext3]
Mar-12 09:51:14.299 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/daudn/tgs_workflow/nextflow/bin
Mar-12 09:51:14.388 [main] DEBUG nextflow.Session - Observer factory: TowerFactory
Mar-12 09:51:14.391 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Mar-12 09:51:14.588 [main] DEBUG nextflow.Session - Session start invoked
Mar-12 09:51:14.830 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Mar-12 09:51:14.887 [main] DEBUG nextflow.Session - Workflow process names [dsl1]: get_and_untar
Mar-12 09:51:14.945 [PathVisitor-1] DEBUG nextflow.file.PathVisitor - files for syntax: glob; folder: /data/; pattern: TGSDEV150729*.tar.bz2; options: [:]
Mar-12 09:51:15.315 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: google-lifesciences
Mar-12 09:51:15.316 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'google-lifesciences'
Mar-12 09:51:15.328 [main] DEBUG nextflow.executor.Executor - [warm up] executor > google-lifesciences
Mar-12 09:51:15.350 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'google-lifesciences' > capacity: 1000; pollInterval: 10s; dumpInterval: 5m
Mar-12 09:51:15.404 [main] DEBUG n.c.g.l.GoogleLifeSciencesExecutor - Google Life Science config=GoogleLifeSciencesConfig(project:bioinformatics-playground, zones:[], regions:[europe-west2], preemptible:false, remoteBinDir:null, location:europe-west2, disableBinDir:false, bootDiskSize:20 GB, sshDaemon:false, sshImage:gcr.io/cloud-genomics-pipelines/tools, debugMode:null, copyImage:google/cloud-sdk:alpine)
Mar-12 09:51:15.952 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > get_and_untar -- maxForks: 2; blocking: false
Mar-12 09:51:16.057 [main] DEBUG nextflow.script.ScriptRunner - > Await termination
Mar-12 09:51:16.057 [main] DEBUG nextflow.Session - Session await
Mar-12 09:51:16.312 [PathVisitor-1] ERROR nextflow.Channel - java.net.URISyntaxException: Illegal character in hostname at index 8: gs://tgs_ext_archive/data/
java.lang.AssertionError: java.net.URISyntaxException: Illegal character in hostname at index 8: gs://tgs_ext_archive/data/
        at com.google.cloud.storage.contrib.nio.CloudStoragePath.toUri(CloudStoragePath.java:356)
        at com.google.cloud.storage.contrib.nio.CloudStoragePseudoDirectoryAttributes.<init>(CloudStoragePseudoDirectoryAttributes.java:31)
        at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.readAttributes(CloudStorageFileSystemProvider.java:831)
        at java.nio.file.Files.readAttributes(Files.java:1737)
        at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
        at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
        at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:322)
        at java.nio.file.Files.walkFileTree(Files.java:2662)
        at nextflow.file.FileHelper.visitFiles(FileHelper.groovy:742)
        at nextflow.file.PathVisitor.pathImpl(PathVisitor.groovy:162)
        at nextflow.file.PathVisitor.applyGlobPattern0(PathVisitor.groovy:130)
        at nextflow.file.PathVisitor.apply(PathVisitor.groovy:68)
        at nextflow.file.PathVisitor$_applyAsync_closure1.doCall(PathVisitor.groovy:77)
        at nextflow.file.PathVisitor$_applyAsync_closure1.call(PathVisitor.groovy)
        at groovy.lang.Closure.run(Closure.java:486)
        at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:719)
        at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:701)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.URISyntaxException: Illegal character in hostname at index 8: gs://tgs_ext_archive/data/
        at java.net.URI$Parser.fail(URI.java:2848)
        at java.net.URI$Parser.parseHostname(URI.java:3387)
        at java.net.URI$Parser.parseServer(URI.java:3236)
        at java.net.URI$Parser.parseAuthority(URI.java:3155)
        at java.net.URI$Parser.parseHierarchical(URI.java:3097)
        at java.net.URI$Parser.parse(URI.java:3053)
        at java.net.URI.<init>(URI.java:673)
        at java.net.URI.<init>(URI.java:774)
        at com.google.cloud.storage.contrib.nio.CloudStoragePath.toUri(CloudStoragePath.java:354)
        ... 20 common frames omitted
Mar-12 09:51:16.361 [main] DEBUG nextflow.Session - Session await > all process finished
Mar-12 09:51:16.418 [PathVisitor-1] DEBUG nextflow.Session - Session aborted -- Cause: java.net.URISyntaxException: Illegal character in hostname at index 8: gs://tgs_ext_archive/data/
Mar-12 09:51:16.448 [main] DEBUG nextflow.Session - Session await > all barriers passed
Mar-12 09:51:16.470 [main] DEBUG nextflow.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=0; peakCpus=0; peakMemory=0; ]
Mar-12 09:51:16.634 [main] DEBUG nextflow.CacheDB - Closing CacheDB done
Mar-12 09:51:16.661 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

Environment

  • Nextflow version: 20.01.0 build 5264
  • Java version: 1.8
  • Operating system: Linux
  • Bash version: GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
