https://github.com/mirnylab/cooler/blob/c9c718fdebccbda41ad10c47f700853f79ee3cd3/cooler/cli/balance.py#L181
I was trying to steal this blacklist-ingesting code for the cooltools expected CLI tool, here: https://github.com/mirnylab/cooltools/blob/9294dae6dd19794e61bbb50773c1db04fb627398/cooltools/cli/compute_expected.py#L127
Here are some examples of its behavior (desired and undesired):
import csv

bed_content = "track=full-of-nonsense\nchr1\t9000000\t10000000\n"
ftmp = "black.tmp"
with open(ftmp, 'w') as fp:
    fp.write(bed_content)

# trying to read/sniff - like in cooler-balance cli:
blacklist = ftmp
with open(blacklist, 'rt') as f:
    print(csv.Sniffer().has_header(f.read(1024)))
This yields `True`, like it should (I guess).
`bed_content = "chr1\t9000000\t10000000\nchr2\t9000000\t10000000\n"` yields `False`, like it should!
However, `bed_content = "chr1\t9000000\t10000000"`, or the same with a trailing newline, `bed_content = "chr1\t9000000\t10000000\n"`, yields `True`, which is very much undesired...
After reading the `has_header` source code https://github.com/python/cpython/blob/607b1027fec7b4a1602aab7df57795fbcec1c51b/Lib/csv.py#L383 it becomes apparent that they "sniff" whether a CSV has a header based on several rows: they infer each column's type (or string length) from the rows after the first, and then decide whether the first row fits that pattern or not. Thus, when there is only 1 row, there is nothing to compare against and everything falls back to the default assumption, which is that the first row is a header...
@nvictus what should we do? Make a special case for a BED file with a single row?
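
One possible way around it, as a minimal sketch and not a proposal for the actual cooler code: skip `csv.Sniffer` for the header check and test whether the first non-empty line parses as a BED record (at least three whitespace-separated fields, with integer start and end). The names `looks_like_bed_record` and `bed_has_header` are hypothetical, and the check assumes whitespace-delimited BED with non-negative integer coordinates.

```python
def looks_like_bed_record(line):
    """Hypothetical check: does this line look like 'chrom start end [...]'?"""
    fields = line.split()
    # A BED record needs at least 3 fields, with integer start and end.
    return len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit()


def bed_has_header(sample):
    """Hypothetical replacement for csv.Sniffer().has_header on a BED sample.

    Treats the first non-empty line as a header only if it does not
    parse as a BED record, so a single-row file is handled correctly.
    """
    for line in sample.splitlines():
        if line.strip():
            return not looks_like_bed_record(line)
    return False  # empty sample: no header to skip


print(bed_has_header("track=full-of-nonsense\nchr1\t9000000\t10000000\n"))
print(bed_has_header("chr1\t9000000\t10000000\nchr2\t9000000\t10000000\n"))
print(bed_has_header("chr1\t9000000\t10000000\n"))
```

This only looks at the first non-empty line, so it does not depend on how many rows the file has; the trade-off is that it is BED-specific and would not help for arbitrary CSV input.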