All notable changes to the khmer project will be documented in this file. See keepachangelog for more info.
The khmer project's command line scripts adhere to Semantic Versioning. The Python and C++ APIs are not yet under semantic versioning, but will be in future versions of khmer.
- Cython wrapper for liboxli.
- Cython containers for all liboxli classes.
- Header install for liboxli.
- New storage class using a Counting Quotient Filter with improved cache locality over bloom filters.
- New variants of the sequence bulk-loading method with a "banding" mode and a "mask" mode. In "banding" mode, only k-mers whose hashed values fall within a specified range are counted. In "mask" mode, only k-mers not already present in the specified mask are counted. The new methods are consume_seqfile_banding, consume_seqfile_with_mask, and consume_seqfile_banding_with_mask.
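The two counting modes described above can be illustrated with a minimal, library-free sketch. This is not khmer's implementation. The hash function (BLAKE2b truncated to 64 bits), the equal-width split of the hash space, and the names count_banding, count_with_mask, num_bands and band are all hypothetical. The sketch also ignores reverse complements, which khmer's canonical k-mer hashing accounts for.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 64  # assumed 64-bit hash range (illustrative only)


def kmer_hash(kmer: str) -> int:
    """Hypothetical stand-in 64-bit hash; khmer uses its own hash functions."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def kmers(seq: str, k: int):
    """Yield every k-mer of seq in order (no reverse-complement handling)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_banding(seqs, k, num_bands, band):
    """Count only k-mers whose hash falls in band `band` of `num_bands` equal ranges."""
    band_size = HASH_SPACE // num_bands
    lo = band * band_size
    # The last band absorbs any remainder so the bands cover the whole hash space.
    hi = HASH_SPACE if band == num_bands - 1 else lo + band_size
    counts = Counter()
    for seq in seqs:
        for kmer in kmers(seq, k):
            if lo <= kmer_hash(kmer) < hi:
                counts[kmer] += 1
    return counts


def count_with_mask(seqs, k, mask):
    """Count only k-mers that are not already present in `mask`."""
    counts = Counter()
    for seq in seqs:
        for kmer in kmers(seq, k):
            if kmer not in mask:
                counts[kmer] += 1
    return counts
```

Because the bands partition the hash space, counting each band separately and merging the results gives the same counts as a single full pass. This is what makes banding useful for bounding memory per pass.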
- Non-ACTG handling significantly changed so that only bulk-loading functions "clean" sequences of non-DNA characters. See #1590 for details.
- Split CPython wrapper file into per-class files under src/khmer and include/khmer.
- Moved liboxli headers to include/oxli and implementations to src/oxli.
- Removed all CPython wrappers except ReadParser and the standalone functions.
- Dropped support for Python 2.
- Changed to absolute imports.
- Some methods on LabelHash and Hashgraph have been changed to properties, or generators where appropriate.
- All constructors have been removed from khmer/__init__.py.
- GraphLabels does not inherit from Hashgraph.
- trim-low-abund.py no longer errors out when given multiple files with identical basenames.
- Document for submission to the Journal of Open Source Software.
- Several typos and outdated content in the documentation.
- New --no-reformat option for interleave-reads.py script disables default read name correction behavior.
- New HashSet data structure for managing collections of k-mer hashes and tags.
- khmer package version now included in .info files.
- New -o|--outfile option for filter-abund-single.py script.
- New sandbox script extract-compact-dbg.py for computing a compact de Bruijn graph from sequence data.
- New --quiet flag to several scripts, silencing diagnostic messages in terminal output.
- Support for human-friendly memory requests (2G instead of 2000000000 or 2e9).
- Support for variable-coverage trimming in the filter-abund-single.py script.
- Several simple examples of the Python API and the C++ API in examples/python-api and examples/c++-api, respectively.
- New assemble_linear_path function for baiting unambiguous contigs with a seed k-mer from a hashtable.
- Support for assembling directly from k-mer graphs, and a new JunctionCountAssembler class.
- Add --info flag for obtaining citation information.
- Added Counttable and Nodetable to support non-reversible hashing functionality and k > 32.
- Add a new storage class using half a byte per entry. Exposed as SmallCounttable and SmallCountgraph.
- Added cleaned_seq attribute to khmer.Read class, which provides a cleaned version of the sequence of each read.
- Added --summary-info to trim-low-abund.py to record run information in a file.
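The human-friendly memory requests mentioned above amount to parsing a number with an optional unit suffix. The sketch below is a hypothetical parser and not khmer's actual code; parse_memory is an invented name. It assumes decimal multipliers (so 2G equals 2000000000, as in the entry above) and also accepts plain integers and scientific notation such as 2e9. It uses Decimal so that the results are exact integers.

```python
import re
from decimal import Decimal

# Assumed decimal multipliers, consistent with "2G instead of 2000000000".
_UNITS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

# A number (optionally fractional or in e-notation) followed by an optional unit letter.
_PATTERN = re.compile(
    r"^\s*([0-9]*\.?[0-9]+(?:[eE][0-9]+)?)\s*([KMGT]?)\s*$", re.IGNORECASE
)


def parse_memory(text: str) -> int:
    """Hypothetical helper: convert '2G', '2e9' or '2000000000' to a byte count."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized memory size: {text!r}")
    number, unit = match.groups()
    return int(Decimal(number) * _UNITS[unit.upper()])
```

With this sketch, parse_memory("2G"), parse_memory("2e9") and parse_memory("2000000000") all return 2000000000.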
- Nodetable, Counttable and SmallCounttable use MurmurHash3 as the hash function. As a result, they support k-mers longer than 32 bases, but their hashes are not reversible.
- Suppress display of -x and -N command line options in script help messages.
- Switch from nose to py.test as the testing framework.
- Switch from internally managed Jenkins setup to Travis CI for continuous integration testing.
- Renamed core data structures: CountingHash --> Countgraph, Hashbits --> Nodegraph.
- Replaced the IParser and FastxParser classes with a single ReadParser class. Different input formats are supported by templating ReadParser with a reader class.
- Renamed consume_fasta and related functions to consume_seqfile, with support for reading sequences from additional formats pending.
- Changed Sphinx documentation theme to "guzzle".
- Bug in compressed (gzip) streaming output from scripts.
- The hashbits update_from function now correctly tracks occupied bins for calculating FPR.
- Bug in the filter-abund.py script when --gzip and -o flags are used simultaneously.
- Bug in the hashtable get_kmers function based on incorrect usage of the substr function.
- Bug in broken_paired_reader related to dropping short reads when require_paired is set.
- Bug related to handling lowercase [acgtn] characters in input data.
- Bug in load-graph.py that calculated required graph space incorrectly.
- Fix loading of empty partition map files.
Prior to the khmer 2.1 release, all changes were documented in a file named ChangeLog. This file is now at legacy/ChangeLog for posterity.