csv sniffer fails when given only a single line of BED-like input

https://github.com/mirnylab/cooler/blob/c9c718fdebccbda41ad10c47f700853f79ee3cd3/cooler/cli/balance.py#L181

I was trying to steal this blacklist-ingesting code for the cooltools expected CLI tool, here: https://github.com/mirnylab/cooltools/blob/9294dae6dd19794e61bbb50773c1db04fb627398/cooltools/cli/compute_expected.py#L127

Here are some examples of its behavior (desired and undesired):
```python
import csv

bed_content = "track=full-of-nonsense\nchr1\t9000000\t10000000\n"
ftmp = "black.tmp"
with open(ftmp, 'w') as fp:
    fp.write(bed_content)

# trying to read/sniff, like in the cooler balance CLI:
blacklist = ftmp
with open(blacklist, 'rt') as f:
    print(csv.Sniffer().has_header(f.read(1024)))
```
yields `True`, as it should (I guess).

`bed_content = "chr1\t9000000\t10000000\nchr2\t9000000\t10000000\n"` yields `False`, as it should!

However, `bed_content = "chr1\t9000000\t10000000"`, with or without the trailing newline (`bed_content = "chr1\t9000000\t10000000\n"`), yields `True`, which is very much undesired...


After reading the `has_header` source code (https://github.com/python/cpython/blob/607b1027fec7b4a1602aab7df57795fbcec1c51b/Lib/csv.py#L383), it becomes apparent that it "sniffs" whether a CSV has a header by looking at several rows: it infers a type for each column (numeric, or else the string length) from the rows *after* the first one, and then votes on whether the first row fits those types or looks like a header. When there is only 1 row, there is nothing to infer the column types from, and every column ends up voting for a header, so the default assumption wins: the first row is a header...
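A minimal reproduction without the temp file, passing the samples straight to `csv.Sniffer().has_header`. The `True`/`False` comments follow from the linked CPython source: with a single row no column type is ever assigned, so the final vote can only go one way.

```python
import csv

sniffer = csv.Sniffer()

# The same BED-like interval, with and without a second data row.
one_row = "chr1\t9000000\t10000000\n"
two_rows = "chr1\t9000000\t10000000\nchr2\t9000000\t10000000\n"

# Two rows: column types (string length 4, numeric, numeric) are inferred
# from row 2, and row 1 matches all of them, so every column votes "no header".
print(sniffer.has_header(two_rows))  # False

# One row: there are no rows after the "header" to infer types from, so each
# column's type stays None; has_header tries None(value), catches the
# resulting TypeError and counts it as a vote *for* a header.
print(sniffer.has_header(one_row))  # True
```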

@nvictus what should we do? Make a special case for BED files with a single row?
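One possible special case, sketched under assumptions: skip `csv.Sniffer` for BED input and check the first non-blank line directly. `bed_has_header` and `_is_int` are hypothetical helpers (not part of cooler or cooltools); the rule that a header is a `track`/`browser`/`#` line or a line without integer start/end in columns 2 and 3 is an assumption about what blacklist files look like.

```python
def _is_int(text):
    """Return True if text parses as a base-10 integer."""
    try:
        int(text)
    except ValueError:
        return False
    return True


def bed_has_header(sample):
    """Hypothetical BED-aware stand-in for csv.Sniffer().has_header(sample).

    Looks only at the first non-blank line, so a single-record sample is
    handled the same way as a longer one. That line is treated as a header
    if it is a UCSC-style 'track'/'browser' line or a '#' comment, or if it
    lacks integer start/end values in columns 2 and 3.
    """
    for line in sample.splitlines():
        fields = line.split()  # BED is tab-delimited; split() also tolerates spaces
        if not fields:
            continue  # skip blank lines
        if line.startswith(("track", "browser", "#")):
            return True
        return len(fields) < 3 or not (_is_int(fields[1]) and _is_int(fields[2]))
    return False  # empty sample: nothing that could be a header


print(bed_has_header("chr1\t9000000\t10000000"))                            # False
print(bed_has_header("track=full-of-nonsense\nchr1\t9000000\t10000000\n"))  # True
```

It would slot in where the CLI currently calls `csv.Sniffer().has_header(f.read(1024))`. The trade-off is that only the first line is inspected, so a malformed first record would be misread as a header.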


Issue #196