I'm trying to run this lesson with the data files downloaded from FigShare. In the Redirection lesson, running

    grep -B1 -A2 NNNNNNNNNN SRR098026.fastq > bad_reads.txt
    wc -l bad_reads.txt

returns `537` rather than the expected `802`.
This is a problem because 537 is not a multiple of 4, even though every FASTQ record in this file spans exactly four lines. It happens because some of the reads containing the string `NNNNNNNNNN` are not adjacent in the file, so `grep` inserts a `--` separator line between non-contiguous groups of results. I think the lesson as written will mislead learners about how they can use `grep`, since it doesn't mention this behaviour. I have replicated it on multiple machines, so it's not just a quirk of one system.
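To see the separator in isolation, here is a minimal sketch on a tiny made-up FASTQ file. The file name `toy.fastq` and its three reads are hypothetical, not taken from the lesson data. It assumes GNU grep: `--no-group-separator` is a GNU extension and may not be available in other grep implementations (for example, older macOS grep).

```bash
# Hypothetical 3-read FASTQ: reads 1 and 3 contain NNNNNNNNNN, read 2 does not,
# so the two matches (and their context lines) are not contiguous.
printf '%s\n' \
  '@read1' 'NNNNNNNNNNACGT' '+' 'IIIIIIIIIIIIII' \
  '@read2' 'ACGTACGTACGTAC' '+' 'IIIIIIIIIIIIII' \
  '@read3' 'NNNNNNNNNNTTTT' '+' 'IIIIIIIIIIIIII' > toy.fastq

# Default output: 4 lines per matching read plus one '--' line between the
# two non-adjacent groups, so 9 lines instead of 8.
grep -B1 -A2 NNNNNNNNNN toy.fastq | wc -l

# Workaround 1 (assumes GNU grep): suppress the group separator entirely.
grep -B1 -A2 --no-group-separator NNNNNNNNNN toy.fastq | wc -l

# Workaround 2 (portable): filter out separator lines after the fact.
grep -B1 -A2 NNNNNNNNNN toy.fastq | grep -v '^--$' | wc -l
```

The second workaround would also drop any genuine line consisting only of `--`, which is why the GNU option is the safer choice where it is available.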
Has anyone encountered this problem? What do you do about it?