OpenCV3 Machine Learning Algorithms: K-Means with Python
Published: 2019-06-23

This article is about 3,137 characters long; reading it takes roughly 10 minutes.


Goal

In this chapter, we will understand the concept of K-Means clustering and how it works.

Theory

We will explain this with a commonly used example.

T-shirt size problem

Consider a company that is going to release a new model of T-shirt to the market. Obviously it will have to manufacture models in different sizes to satisfy people of all sizes. So the company collects data on people's heights and weights and plots them on a graph, as below:

[Figure: tshirt.jpg — scatter plot of people's heights and weights]

The company can't create T-shirts in every possible size. Instead, it divides people into Small, Medium and Large, and manufactures only these three models, which will fit all the people. This grouping of people into three groups can be done by k-means clustering, and the algorithm gives us the best 3 sizes, which will satisfy all the people. And if it doesn't, the company can divide people into more groups, maybe five, and so on. Check the image below:

[Figure: tshirt_grouped.jpg — the same height/weight data grouped into three clusters]
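A minimal sketch of how such a grouping could be done with OpenCV's cv2.kmeans, using made-up height/weight numbers purely for illustration:

```python
import numpy as np
import cv2

# Hypothetical height (cm) / weight (kg) samples for 50 people; the values
# are invented only to demonstrate the grouping.
np.random.seed(0)
heights = np.random.uniform(150, 195, 50)
weights = np.random.uniform(45, 100, 50)
data = np.float32(np.column_stack((heights, weights)))  # cv2.kmeans expects float32

# Stop after 10 iterations or once the centroids move less than 1.0.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# K = 3: one group each for Small, Medium and Large.
compactness, labels, centers = cv2.kmeans(data, 3, None, criteria, 10,
                                          cv2.KMEANS_RANDOM_CENTERS)
print(centers)  # one (height, weight) centre per T-shirt size
```

Each row of `centers` is then a representative (height, weight) pair for one of the three sizes.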

How does it work?

This algorithm is an iterative process. We will explain it step-by-step with the help of images.

Consider a set of data as below (you can think of it as the T-shirt problem). We need to cluster this data into two groups.

[Figure: testdata.jpg — the test data to be clustered into two groups]
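To follow along in code, here is a minimal sketch that fabricates such a two-group dataset with NumPy; the value ranges and the name Z are arbitrary choices for illustration, not values from the figure:

```python
import numpy as np

# Two fabricated 2-D blobs, 25 points each, stacked into one (50, 2) float32 array.
np.random.seed(42)
group_a = np.random.randint(25, 50, (25, 2))
group_b = np.random.randint(60, 85, (25, 2))
Z = np.float32(np.vstack((group_a, group_b)))
print(Z.shape)  # (50, 2)
```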

Step 1 - The algorithm randomly chooses two centroids, C1 and C2 (sometimes, any two of the data points are taken as the centroids).
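A small NumPy sketch of this step, reusing the Z array from the previous snippet (the names C1 and C2 are chosen here for illustration):

```python
# Step 1 (sketch): pick two of the data points at random as the initial centroids.
idx = np.random.choice(len(Z), size=2, replace=False)
C1, C2 = Z[idx[0]], Z[idx[1]]
```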

Step 2 - It calculates the distance from each point to both centroids. If a test data point is closer to C1, that point is labelled '0'. If it is closer to C2, it is labelled '1' (if there are more centroids, points are labelled '2', '3', etc.).

In our case, we will colour all points labelled '0' red and all points labelled '1' blue. So we get the following image after the above operations.

[Figure: initial_labelling.jpg — points coloured red or blue according to the nearer initial centroid]
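Continuing the same sketch, the labelling step could be written as follows (Z, C1 and C2 come from the snippets above):

```python
# Step 2 (sketch): distance of every point to each centroid, then label a
# point 0 if it is nearer to C1 and 1 if it is nearer to C2.
d1 = np.linalg.norm(Z - C1, axis=1)
d2 = np.linalg.norm(Z - C2, axis=1)
labels = np.where(d1 <= d2, 0, 1)   # 0 -> red, 1 -> blue in the figures
```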

Step 3 - Next we calculate the average of all the red points and all the blue points separately, and those averages become our new centroids. That is, C1 and C2 shift to the newly calculated centroids. (Remember, the images shown are not true values and are not to scale; they are for demonstration only.)

Then perform Step 2 again with the new centroids and relabel the data as '0' and '1'.

So we get the result below:

[Figure: update_centroid.jpg — centroids shifted to the cluster means and points relabelled]
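In the same sketch, Step 3 followed by a repeat of Step 2 might look like this (assuming neither cluster has become empty):

```python
# Step 3 (sketch): move each centroid to the mean of the points labelled for it,
# then redo the Step 2 labelling with the new centroids.
C1 = Z[labels == 0].mean(axis=0)
C2 = Z[labels == 1].mean(axis=0)
d1 = np.linalg.norm(Z - C1, axis=1)
d2 = np.linalg.norm(Z - C2, axis=1)
labels = np.where(d1 <= d2, 0, 1)
```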

Now Step 2 and Step 3 are iterated until both centroids converge to fixed points. *(Or iteration may be stopped based on the criteria we provide, such as a maximum number of iterations or a required accuracy being reached.)* These points are such that the sum of the distances between the test data and their corresponding centroids is minimum. Or simply, the sum of the distances between C1 ↔ Red_Points and C2 ↔ Blue_Points is minimum.

$$\text{minimize}\left[\, J = \sum_{\text{All Red\_Points}} \mathrm{distance}(C_1, \text{Red\_Point}) \;+\; \sum_{\text{All Blue\_Points}} \mathrm{distance}(C_2, \text{Blue\_Point}) \,\right]$$
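Putting the pieces together, here is a sketch of the full iteration and of the objective J defined above; the iteration cap and convergence test are arbitrary illustrative choices, and non-empty clusters are assumed:

```python
# Iterate Step 2 and Step 3 (sketch) until the centroids stop moving.
for _ in range(100):                            # hard cap on iterations
    d1 = np.linalg.norm(Z - C1, axis=1)
    d2 = np.linalg.norm(Z - C2, axis=1)
    labels = np.where(d1 <= d2, 0, 1)           # Step 2: relabel
    new_C1 = Z[labels == 0].mean(axis=0)        # Step 3: new centroids
    new_C2 = Z[labels == 1].mean(axis=0)
    if np.allclose(new_C1, C1) and np.allclose(new_C2, C2):
        break                                   # converged to fixed points
    C1, C2 = new_C1, new_C2

# The objective J from the formula above: total distance of each point
# to the centroid of the cluster it was assigned to.
J = (np.linalg.norm(Z[labels == 0] - C1, axis=1).sum() +
     np.linalg.norm(Z[labels == 1] - C2, axis=1).sum())
```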

The final result looks roughly like the image below:

[Figure: final_clusters.jpg — the final two clusters and their centroids]

So this is just an intuitive understanding of K-Means clustering. For more details and a mathematical explanation, please read a standard machine learning textbook or check the links in Additional Resources. This is only the top layer of K-Means clustering; there are many modifications to the algorithm, such as how to choose the initial centroids and how to speed up the iteration process.
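As a final, self-contained sketch, OpenCV's built-in cv2.kmeans can be applied to the kind of two-group data shown above; the data, colours and termination criteria below are illustrative choices, not values from the original figures:

```python
import numpy as np
import cv2
from matplotlib import pyplot as plt

# Two synthetic blobs, 25 points each, as a stand-in for the data in the figures.
np.random.seed(42)
Z = np.float32(np.vstack((np.random.randint(25, 50, (25, 2)),
                          np.random.randint(60, 85, (25, 2)))))

# Terminate after 10 iterations or when the centres move less than 1.0.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
compactness, labels, centers = cv2.kmeans(Z, 2, None, criteria, 10,
                                          cv2.KMEANS_RANDOM_CENTERS)

# Split the data by label and plot each cluster with its centre.
red = Z[labels.ravel() == 0]
blue = Z[labels.ravel() == 1]
plt.scatter(red[:, 0], red[:, 1], c='r')
plt.scatter(blue[:, 0], blue[:, 1], c='b')
plt.scatter(centers[:, 0], centers[:, 1], s=80, c='y', marker='s')
plt.show()
```

Here cv2.kmeans returns the compactness (the sum of squared distances of each point to its centre), the per-point labels, and the cluster centres.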

Additional Resources

  1. Machine Learning Course, video lectures by Prof. Andrew Ng (some of the images are taken from this course)

Exercises

Reposted from: https://my.oschina.net/u/2306127/blog/626531
