{"id":25900,"date":"2019-07-04T19:36:06","date_gmt":"2019-07-04T23:36:06","guid":{"rendered":"https:\/\/www.dannyadam.com\/blog\/?p=25900"},"modified":"2022-12-06T09:55:26","modified_gmt":"2022-12-06T14:55:26","slug":"kmeans1d-globally-optimal-efficient-1d-k-means","status":"publish","type":"post","link":"https:\/\/www.dannyadam.com\/blog\/2019\/07\/kmeans1d-globally-optimal-efficient-1d-k-means\/","title":{"rendered":"kmeans1d: Globally Optimal Efficient 1D <em>k<\/em>&#8209;means Clustering"},"content":{"rendered":"\n<p>I implemented <em>kmeans1d<\/em>, a Python library for performing <em>k<\/em>-means clustering on 1D data, based on the algorithm from Xiaolin (1991), as presented by Gr\u00f8nlund et al. (2017, Section 2.2).<\/p>\n\n\n\n<p>Globally optimal <em>k<\/em>-means clustering is NP-hard for multi-dimensional data. Lloyd&#8217;s algorithm is a popular approach for finding a locally optimal solution. For 1-dimensional data, there are polynomial time algorithms.<\/p>\n\n\n\n<p><em>kmeans1d<\/em> contains an <em>O(kn + n log n)<\/em> dynamic programming algorithm for finding the globally optimal <em>k<\/em> clusters for <em>n<\/em> 1D data points. The code is written in C++\u2014for faster execution than a pure Python implementation\u2014and wrapped in Python.<\/p>\n\n\n\n<p>The source code is available on GitHub:<br><a href=\"https:\/\/github.com\/dstein64\/kmeans1d\">https:\/\/github.com\/dstein64\/kmeans1d<\/a><\/p>\n\n\n<p><!--more--><\/p>\n\n\n<p>The package is <a href=\"https:\/\/pypi.python.org\/pypi\/kmeans1d\">available<\/a>&nbsp;on PyPI, the Python Package Index. It can be installed with&nbsp;<em>pip<\/em>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ pip3 install kmeans1d<\/code><\/pre>\n\n\n\n<p>The snippet below includes an example of how to use the library.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-embed-handler wp-block-embed-embed-handler\"><div class=\"wp-block-embed__wrapper\">\n<style>.gist table { margin-bottom: 0; }<\/style><div style=\"tab-size: 8\" id=\"gist97065136\" class=\"gist\">\n    <div class=\"gist-file\" translate=\"no\" data-color-mode=\"light\" data-light-theme=\"light\">\n      <div class=\"gist-data\">\n        \n<div class=\"js-gist-file-update-container js-task-list-container\">\n      <div id=\"file-kmeans1d-py\" class=\"file my-2\">\n    \n    <div itemprop=\"text\"\n      class=\"Box-body p-0 blob-wrapper data type-python  \"\n      style=\"overflow: auto\" tabindex=\"0\" role=\"region\"\n      aria-label=\"kmeans1d.py content, created by dstein64 on 11:33PM on July 04, 2019.\"\n    >\n\n        \n<div class=\"js-check-hidden-unicode js-blob-code-container blob-code-content\">\n\n  <template class=\"js-file-alert-template\">\n  <div data-view-component=\"true\" class=\"flash flash-warn flash-full d-flex flex-items-center\">\n  <svg aria-hidden=\"true\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-alert\">\n    <path d=\"M6.457 1.047c.659-1.234 2.427-1.234 3.086 0l6.082 11.378A1.75 1.75 0 0 1 14.082 15H1.918a1.75 1.75 0 0 1-1.543-2.575Zm1.763.707a.25.25 0 0 0-.44 0L1.698 13.132a.25.25 0 0 0 .22.368h12.164a.25.25 0 0 0 .22-.368Zm.53 3.996v2.5a.75.75 0 0 1-1.5 0v-2.5a.75.75 0 0 1 1.5 0ZM9 11a1 1 0 1 1-2 0 1 1 0 0 1 2 0Z\"><\/path>\n<\/svg>\n    <span>\n      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.\n      <a class=\"Link--inTextBlock\" href=\"https:\/\/github.co\/hiddenchars\" target=\"_blank\">Learn more about bidirectional Unicode characters<\/a>\n    <\/span>\n\n\n  <div data-view-component=\"true\" class=\"flash-action\">        <a href=\"{{ revealButtonHref }}\" data-view-component=\"true\" class=\"btn-sm btn\">    Show hidden characters\n<\/a>\n<\/div>\n<\/div><\/template>\n<template class=\"js-line-alert-template\">\n  <span aria-label=\"This line has hidden Unicode characters\" data-view-component=\"true\" class=\"line-alert tooltipped tooltipped-e\">\n    <svg aria-hidden=\"true\" height=\"16\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" data-view-component=\"true\" class=\"octicon octicon-alert\">\n    <path d=\"M6.457 1.047c.659-1.234 2.427-1.234 3.086 0l6.082 11.378A1.75 1.75 0 0 1 14.082 15H1.918a1.75 1.75 0 0 1-1.543-2.575Zm1.763.707a.25.25 0 0 0-.44 0L1.698 13.132a.25.25 0 0 0 .22.368h12.164a.25.25 0 0 0 .22-.368Zm.53 3.996v2.5a.75.75 0 0 1-1.5 0v-2.5a.75.75 0 0 1 1.5 0ZM9 11a1 1 0 1 1-2 0 1 1 0 0 1 2 0Z\"><\/path>\n<\/svg>\n<\/span><\/template>\n\n  <table data-hpc class=\"highlight tab-size js-file-line-container\" data-tab-size=\"4\" data-paste-markdown-skip data-tagsearch-path=\"kmeans1d.py\">\n        <tr>\n          <td id=\"file-kmeans1d-py-L1\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"1\"><\/td>\n          <td id=\"file-kmeans1d-py-LC1\" class=\"blob-code blob-code-inner js-file-line\">import kmeans1d<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-kmeans1d-py-L2\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"2\"><\/td>\n          <td id=\"file-kmeans1d-py-LC2\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-kmeans1d-py-L3\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"3\"><\/td>\n          <td id=\"file-kmeans1d-py-LC3\" class=\"blob-code blob-code-inner js-file-line\">x = [4.0, 4.1, 4.2, -50, 200.2, 200.4, 200.9, 80, 100, 102]<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-kmeans1d-py-L4\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"4\"><\/td>\n          <td id=\"file-kmeans1d-py-LC4\" class=\"blob-code blob-code-inner js-file-line\">k = 4<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-kmeans1d-py-L5\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"5\"><\/td>\n          <td id=\"file-kmeans1d-py-LC5\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-kmeans1d-py-L6\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"6\"><\/td>\n          <td id=\"file-kmeans1d-py-LC6\" class=\"blob-code blob-code-inner js-file-line\">clusters, centroids = kmeans1d.cluster(x, k)<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-kmeans1d-py-L7\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"7\"><\/td>\n          <td id=\"file-kmeans1d-py-LC7\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-kmeans1d-py-L8\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"8\"><\/td>\n          <td id=\"file-kmeans1d-py-LC8\" class=\"blob-code blob-code-inner js-file-line\">print(clusters)   # [1, 1, 1, 0, 3, 3, 3, 2, 2, 2]<\/td>\n        <\/tr>\n        <tr>\n          <td id=\"file-kmeans1d-py-L9\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"9\"><\/td>\n          <td id=\"file-kmeans1d-py-LC9\" class=\"blob-code blob-code-inner js-file-line\">print(centroids)  # [-50.0, 4.1, 94.0, 200.5]<\/td>\n        <\/tr>\n  <\/table>\n<\/div>\n\n\n    <\/div>\n\n  <\/div>\n\n<\/div>\n\n      <\/div>\n      <div class=\"gist-meta\">\n        <a href=\"https:\/\/gist.github.com\/dstein64\/0e0d7f80ce6148881926b8fd6852072a\/raw\/3adf7cd795957f84c84a8df0ef1e756cae5dd420\/kmeans1d.py\" style=\"float:right\" class=\"Link--inTextBlock\">view raw<\/a>\n        <a href=\"https:\/\/gist.github.com\/dstein64\/0e0d7f80ce6148881926b8fd6852072a#file-kmeans1d-py\" class=\"Link--inTextBlock\">\n          kmeans1d.py\n        <\/a>\n        hosted with &#10084; by <a class=\"Link--inTextBlock\" href=\"https:\/\/github.com\">GitHub<\/a>\n      <\/div>\n    <\/div>\n<\/div>\n\n<\/div><\/figure>\n\n\n\n<p><strong><span style=\"text-decoration: underline;\">References<\/span><\/strong><\/p>\n\n\n\n<p>[1] Wu, Xiaolin. &#8220;Optimal Quantization by Matrix Searching.&#8221; Journal of Algorithms 12, no. 4 (December 1, 1991): 663<br><br>[2] Gr\u00f8nlund, Allan, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, and Mingzhou Song. &#8220;Fast Exact K-Means, k-Medians and Bregman Divergence Clustering in 1D.&#8221; ArXiv:1701.07204 [Cs], January 25, 2017. <a href=\"http:\/\/arxiv.org\/abs\/1701.07204\">http:\/\/arxiv.org\/abs\/1701.07204<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I implemented kmeans1d, a Python library for performing k-means clustering on 1D data, based on the algorithm from Xiaolin (1991), as presented by Gr\u00f8nlund et al. (2017, Section 2.2). Globally optimal k-means clustering is NP-hard for multi-dimensional data. Lloyd&#8217;s algorithm is a popular approach for finding a locally optimal solution. For 1-dimensional data, there are [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[77,50,46,36],"class_list":["post-25900","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-c","tag-k-means-clustering","tag-machine-learning","tag-python"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p1sCC6-6JK","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.dannyadam.com\/blog\/wp-json\/wp\/v2\/posts\/25900","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dannyadam.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dannyadam.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dannyadam.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dannyadam.com\/blog\/wp-json\/wp\/v2\/comments?post=25900"}],"version-history":[{"count":14,"href":"https:\/\/www.dannyadam.com\/blog\/wp-json\/wp\/v2\/posts\/25900\/revisions"}],"predecessor-version":[{"id":26913,"href":"https:\/\/www.dannyadam.com\/blog\/wp-json\/wp\/v2\/posts\/25900\/revisions\/26913"}],"wp:attachment":[{"href":"https:\/\/www.dannyadam.com\/blog\/wp-json\/wp\/v2\/media?parent=25900"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dannyadam.com\/blog\/wp-json\/wp\/v2\/categories?post=25900"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dannyadam.com\/blog\/wp-json\/wp\/v2\/tags?post=25900"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}