We present an open implementation of the HyperLogLog cardinality estimation sketch for counting the distinct fixed-length substrings (k-mers) of DNA strings. The sketch is implemented in C++ with a Python interface and is distributed as part of the khmer software package. khmer is freely available from https://github.com/dib-lab/khmer under a BSD License. The features presented here are included in version 1.4 and later.
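To make the idea concrete, the following is a minimal pure-Python sketch of a HyperLogLog counter applied to k-mers. It is an illustration only, not khmer's C++ implementation or its Python API: the class name `KmerHyperLogLog`, the methods `add_sequence` and `estimate`, the precision parameter `p`, and the use of BLAKE2b as the hash function are all assumptions made for this example. It also treats a k-mer and its reverse complement as different strings, which a DNA-aware implementation would probably not do.

```python
import hashlib
import math


class KmerHyperLogLog:
    """Illustrative HyperLogLog counter for distinct k-mers (hypothetical API)."""

    def __init__(self, ksize, p=12):
        # p index bits give m = 2**p registers; the alpha formula below
        # is the standard approximation and holds for m >= 128.
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.ksize = ksize
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)
        self.alpha = 0.7213 / (1.0 + 1.079 / self.m)

    def add(self, kmer):
        # Hash the k-mer to 64 bits (hash choice is an assumption).
        digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # The top p bits select a register.
        idx = h >> (64 - self.p)
        # The remaining bits give the rank: position of the leftmost 1-bit.
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def add_sequence(self, seq):
        # Slide a window of length ksize over the sequence.
        seq = seq.upper()
        for i in range(len(seq) - self.ksize + 1):
            self.add(seq[i:i + self.ksize])

    def estimate(self):
        # Raw estimate: normalized harmonic mean of 2**register values.
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        # Small-range correction (linear counting) while registers are empty.
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return round(self.m * math.log(self.m / zeros))
        return round(raw)
```

With `p = 12` the sketch uses 4096 one-byte registers, and the expected relative standard error is roughly 1.04 / sqrt(4096), or about 1.6%, regardless of how many k-mers are added. A typical use would be `hll = KmerHyperLogLog(ksize=21)`, then `hll.add_sequence(read)` for each read, then `hll.estimate()`.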
Journal: bioRxiv
DOI: 10.1101/056846
Year: 2016