I implemented *kmeans1d*, a Python library for performing *k*-means clustering on 1D data, based on the algorithm in (Xiaolin 1991), as presented in section 2.2 of (Grønlund et al., 2017).

Globally optimal *k*-means clustering is NP-hard for multi-dimensional data. LLoyd’s algorithm is a popular approach for finding a locally optimal solution. For 1-dimensional data, there are polynomial time algorithms.

*kmeans1d* contains an *O(kn + n log n)* dynamic programming algorithm for finding the globally optimal *k* clusters for *n* 1D data points. The code is written in C++—for faster execution than a pure Python implementation—and wrapped in Python.

The source code is available on GitHub:

https://github.com/dstein64/kmeans1d

The package is available on PyPI, the Python Package Index. It can be installed with *pip*.

`$ pip3 install kmeans1d`

The snippet below includes an example of how to use the library.

**References**

[1] Wu, Xiaolin. “Optimal Quantization by Matrix Searching.” Journal of Algorithms 12, no. 4 (December 1, 1991): 663

[2] Grønlund, Allan, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, and Mingzhou Song. “Fast Exact K-Means, k-Medians and Bregman Divergence Clustering in 1D.” ArXiv:1701.07204 [Cs], January 25, 2017. http://arxiv.org/abs/1701.07204.