NO.PZ2022120201000009
Question:
a. What does K stand for in K-means clustering?
b. Explain the steps in using the K-means clustering algorithm.
c. In practice, the algorithm is often carried out with several different initial values for the centroids. How would you choose between clusters that result from different initial choices for the centroids?
Explanation:
a. K is the number of centroids or, equivalently, the number of clusters. It is a parameter that must be specified a priori, before any data points are assigned to clusters.
b. 1. Specify the number of centroids, K, and choose a distance measure (e.g., the Euclidean or Manhattan distance).
2. Scale the features using either standardization or normalization.
3. Select K points at random from the training data to serve as the initial centroids.
4. Allocate each data point to its nearest centroid.
5. Given the points allocated to each centroid, redetermine the appropriate location of the centroids.
6. If any centroid has moved from its position in the previous iteration, return to step 4. If none of the centroids has moved, stop.
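The six steps in part b can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names (`squared_distance`, `standardize`, `k_means`), the `seed` and `max_iter` parameters, and the decision to leave an empty cluster's centroid where it is are all assumptions made for the example, and the distance measure is fixed to squared Euclidean.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points (step 1: distance measure)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def standardize(data):
    """Step 2: rescale every feature to zero mean and unit standard deviation."""
    columns = list(zip(*data))
    means = [sum(c) / len(c) for c in columns]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(columns, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in data]


def k_means(data, k, seed=0, max_iter=100):
    """Steps 3 to 6 for a given K (step 1) on already scaled data (step 2)."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)  # step 3: K random data points as starting centroids
    for _ in range(max_iter):
        # Step 4: allocate each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in data]
        # Step 5: move each centroid to the mean of the points allocated to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # assumption: empty cluster stays put
        # Step 6: stop once no centroid has moved; otherwise go back to step 4.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

A call such as `k_means(standardize(data), k=2)` returns the final centroids together with one cluster label per data point. In this sketch, step 1 corresponds to the argument `k` and the choice of `squared_distance`, while step 3 is the `rng.sample` line that picks the actual starting positions.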
c. You could choose the run with the lowest total inertia (the sum of squared distances from each data point to its assigned centroid), since those centroid positions fit the feature data best.
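To illustrate part c, the sketch below runs K-means from several hand-picked sets of initial centroids and keeps the run with the lowest total inertia. The helper names (`run_k_means`, `total_inertia`, `best_run`) and the four-point toy data set are hypothetical, chosen only so that one starting position gets stuck in a poor solution; squared Euclidean distance is assumed.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def run_k_means(data, init_centroids, max_iter=100):
    """Iterate allocation and centroid updates from the given starting centroids."""
    centroids = list(init_centroids)
    k = len(centroids)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in data]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # assumption: empty cluster stays put
        if new_centroids == centroids:  # no centroid moved, so the run has converged
            break
        centroids = new_centroids
    return centroids, labels


def total_inertia(data, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))


def best_run(data, initial_sets):
    """Run K-means once per starting set and return the lowest-inertia result."""
    results = []
    for init in initial_sets:
        centroids, labels = run_k_means(data, init)
        results.append((total_inertia(data, centroids, labels), centroids, labels))
    return min(results, key=lambda r: r[0])


# Toy data: two tight pairs of points, far apart along the first feature.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
starts = [
    [(0.0, 0.0), (0.0, 1.0)],   # both starting centroids taken from the left pair
    [(0.0, 0.0), (10.0, 0.0)],  # one starting centroid from each pair
]
inertia, centroids, labels = best_run(points, starts)
```

With these two starting sets, the first run converges to a split that cuts across both pairs, while the second separates the left pair from the right pair and has the lower inertia, so `best_run` returns the second. Inertia is only comparable across runs that use the same K and the same scaled data.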
Teacher, I don't quite understand the difference between steps 1 and 3 in part b.