K-means

A simple, intuitive, unsupervised, partitional machine learning technique. It takes a list of elements and splits them into multiple clusters.

The main parameter is K, the number of clusters to generate. The distance measure can be considered a second parameter.
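As an illustration of the distance-measure parameter, the usual choice is (squared) Euclidean distance, though other measures such as Manhattan distance can be plugged in. A small sketch (the helper names are my own):

```python
def euclidean_sq(a, b):
    # squared Euclidean distance; the square root is unnecessary
    # when we only compare distances
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    # Manhattan (L1) distance, an alternative measure
    return sum(abs(x - y) for x, y in zip(a, b))
```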

Steps

  1. From the N observations, choose K of them. These will be the seeds; each seed identifies a cluster.

  2. Assign observation $x_{i}$ to cluster $C_{t}$ when the distance between $x_{i}$ and the seed of $C_{t}$ is the lowest among all the seeds.

  3. Compute the new centroids (the elements closest to the geometric center of each cluster) from the current clusters.

  4. Compute the improvement (reduction in the distance from the observations to their centers) that would be produced by reassigning an observation to a cluster it is not currently assigned to.

  5. Make the reassignment that yields the greatest improvement.

  6. Repeat steps 3, 4 and 5 until no change produces a significant improvement.
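The steps above can be sketched in Python as follows. This is a minimal version of the standard Lloyd iteration: it uses the cluster mean as the centroid (a common simplification of the "closest element" in step 3) and folds the improvement steps into a batch reassignment that stops when no centroid moves.

```python
import random

def dist_sq(a, b):
    # squared Euclidean distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # step 1: choose K seeds
    for _ in range(max_iter):
        # step 2: assign every observation to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist_sq(p, centroids[j]))
            clusters[nearest].append(p)
        # step 3: recompute each centroid as the mean of its cluster
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        # steps 4-6: stop when no reassignment changes any centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# two well-separated groups of three points each
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

On this toy data the loop converges in a few iterations to one centroid per group, regardless of which two seeds are drawn.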