I implemented kmeans1d, a Python library for performing k-means clustering on 1D data, based on the algorithm in (Xiaolin 1991), as presented in section 2.2 of (Grønlund et al., 2017).
Globally optimal k-means clustering is NP-hard for multi-dimensional data. LLoyd’s algorithm is a popular approach for finding a locally optimal solution. For 1-dimensional data, there are polynomial time algorithms.
kmeans1d contains an O(kn + n log n) dynamic programming algorithm for finding the globally optimal k clusters for n 1D data points. The code is written in C++—for faster execution than a pure Python implementation—and wrapped in Python.
The source code is available on GitHub: