Context
Surfaced in Copilot's 2026-04-16 review of #531.
kg_microbe/query_utils/duckdb_loader.py sets lineterminator="\n" on the pandas reader and strips \r from the header line — but not from the rest of the file. On a file with CRLF (\r\n) line endings this leaves a trailing \r in the last field of every data row, because \r is no longer treated as part of the newline.
Effects:
- string values are silently altered ("foo" becomes "foo\r")
- equality filtering and indexing break in ways that are hard to spot, since most viewers render the \r invisibly
- downstream joins on those columns silently drop rows
Suggested fix
Either:
- drop the custom lineterminator and let pandas handle CRLF normally, or
- normalize the file contents on load (strip \r from every line, not only the header).
The first option is the simpler fix unless there's a concrete reason the custom terminator was introduced.
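If the custom terminator does need to stay, the second option could look roughly like the sketch below. The helper name strip_carriage_returns is hypothetical and is not taken from duckdb_loader.py. It assumes the file contents are already available as a string and fit in memory.

```python
import io


def strip_carriage_returns(text: str) -> io.StringIO:
    """Return a buffer in which trailing carriage returns are removed from every line."""
    lines = text.split("\n")
    # Applies to the header and to every data row alike.
    cleaned = "\n".join(line.rstrip("\r") for line in lines)
    return io.StringIO(cleaned)


# Hypothetical CRLF input.
buffer = strip_carriage_returns("id\tname\r\n1\tfoo\r\n")
print(repr(buffer.getvalue()))  # 'id\tname\n1\tfoo\n'
```

Since pandas.read_csv accepts file-like objects, a buffer like this could be passed to the reader in place of the path. Only trailing carriage returns are removed, so a \r embedded in the middle of a field is left untouched.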
File involved
kg_microbe/query_utils/duckdb_loader.py
References
- PR #531
- Copilot review at commit 1de973d, 2026-04-16T23:15Z