99精品欧美,男女啪啪永久免费观看网站,成人免费视频无遮挡在线看

? ? ? ?聲明：代碼的運行環境為Python3。Python3與Python2在一些細節上會有所不同，希望廣大讀者注意。本博客以代碼為主，代碼中會有詳細的注釋。相關文章將會發布在我的個人博客專欄《Python從入門到深度學習》，歡迎大家關注~

? ? ? ?在K-Means算法中，聚類的類別個數需要提前指定，對于類別個數未知的數據集，K-Means算法和K-Means++算法將很難對其進行求解，所以需要一些能夠處理未知類別個數的算法來處理此類問題。Mean Shift算法，又稱作均值漂移算法，它跟K-Means算法一樣，都是基于聚類中心的聚類算法，不同的是，它不需要提前指定聚類中心的個數，聚類中心是通過在給定區域中樣本的均值來確定的，通過不斷更新聚類中心，直至聚類中心不再改變為止。

一、Mean Shift向量與核函數

1、Mean Shift向量

? ? ? ?對于給定的n維空間中的m個樣本點，對于其中的一個樣本X，其Mean Shift的向量為：

? ? ? ?其中，指的是一個半徑為h的高維球區域，定義為：

2、核函數

? ? ? ?通過上述方式求出的Mean Shift向量時存在問題的，即在區域內每一個對樣本X的貢獻是一樣的，然而實際上，每一個樣本對樣本X的貢獻是不一樣的，我們可以通過核函數對每一個樣本的貢獻進行度量。

? ? ? ?核函數的定義如下：

? ? ? ?設Z是輸入空間，H是特征空間，如果存在一個Z到H的映射：使得所有，函數滿足條件：，則稱為核函數，為映射函數。

? ? ? ?我們在Mean Shift算法中使用的是高斯核函數，這也是最常用的核函數之一，高斯核函數的表達式為：

? ? ? ?其中，h為帶寬，當帶寬一定時，樣本點之間的距離越近，核函數的值越大；當樣本點距離一定時，帶寬越大，核函數的值越小。

? ? ? ?下面我們使用Python代碼實現高斯核函數：

            
              import numpy as np
import math

def gs_kernel(dist, h):
    '''
    高斯核函數
    :param dist: 歐氏距離
    :param h: 帶寬
    :return: 返回高斯核函數的值
    '''
    m = np.shape(dist)[0]  # 樣本個數
    one = 1 / (h * math.sqrt(2 * math.pi))
    two = np.mat(np.zeros((m, 1)))
    for i in range(m):
        two[i, 0] = (-0.5 * dist[i] * dist[i].T) / (h * h)
        two[i, 0] = np.exp(two[i, 0])
    
    gs_val = one * two
    return gs_val

二、Mean Shift原理

? ? ? ?在Mean Shift中通過迭代的方式找到最終的聚類中心，即對每一個樣本點計算其漂移均值，以計算出來的漂移均值點作為新的起始點重復上述步驟，直到滿足終止條件，得到的最終的漂移均值點即為最終的聚類中心。

? ? ? ?Mean Shift算法實現過程如下：

            
              def mean_shift(points, h=2, MIN_DISTANCE=0.000001):
    '''
    訓練Mean Shift模型
    :param points: 特征點
    :param h: 帶寬
    :param MIN_DISTANCE: 最小誤差
    :return: 返回特征點、均值漂移點、類別
    '''
    mean_shift_points = np.mat(points)
    max_min_dist = 1
    iteration = 0  # 迭代的次數
    m = np.shape(mean_shift_points)[0]  # 樣本的個數
    need_shift = [True] * m  # 標記是否需要漂移

    # 計算均值漂移向量
    while max_min_dist > MIN_DISTANCE:
        max_min_dist = 0
        iteration += 1
        print("iteration : " + str(iteration))
        for i in range(0, m):
            if not need_shift[i]:  # 判斷每一個樣本點是否需要計算偏移均值
                continue
            point_new = mean_shift_points[i]
            point_new_start = point_new
            point_new  = shift_point(point_new, points, h)  # 對樣本點進行漂移計算
            dist = distince(point_new, point_new_start)  # 計算該點與漂移后的點之間的距離
            
            if dist > max_min_dist:
                max_min_dist = dist
            if dist < MIN_DISTANCE:
                need_shift[i] = False
            
            mean_shift_points[i] = point_new
    # 計算最終的類別
    lb = lb_points(mean_shift_points)  # 計算所屬的類別
    return np.mat(points), mean_shift_points, lb

? ? ? ?其中，shift_point()方法目的在于計算漂移量，lb_points()方法的目的在于計算最終所屬分類，distance()方法用于計算歐氏距離，三個方法的實現過程分別如下：

（1）shift_point()方法

            
              def shift_point(point, points, h):
    '''
    計算漂移向量
    :param point: 需要計算的點
    :param points: 所有的樣本點
    :param h: 帶寬
    :return: 返回漂移后的點
    '''
    points = np.mat(points)
    m = np.shape(points)[0]  # 樣本的個數
    # 計算距離
    point_dist = np.mat(np.zeros((m, 1)))
    for i in range(m):
        point_dist[i, 0] = distince(point, points[i])

    # 計算高斯核函數
    point_weights = gs_kernel(point_dist, h)

    # 計算分母
    all_sum = 0.0
    for i in range(m):
        all_sum += point_weights[i, 0]

    # 計算均值偏移
    point_shifted = point_weights.T * points / all_sum
    return point_shifted

（2）lb_points()

            
              def lb_points(mean_shift_points):
    '''
    計算所屬類別
    :param mean_shift_points: 漂移向量
    :return: 返回所屬的類別
    '''
    lb_list = []
    m, n = np.shape(mean_shift_points)
    index = 0
    index_dict = {}
    for i in range(m):
        item = []
        for j in range(n):
            item.append(str(("%5.2f" % mean_shift_points[i, j])))

        item_1 = "_".join(item)
        if item_1 not in index_dict:
            index_dict[item_1] = index
            index += 1

    for i in range(m):
        item = []
        for j in range(n):
            item.append(str(("%5.2f" % mean_shift_points[i, j])))

        item_1 = "_".join(item)
        lb_list.append(index_dict[item_1])
    return lb_list

（3）distince()方法

            
              def distince(pointA, pointB):
    '''
    計算歐氏距離
    :param pointA: A點坐標
    :param pointB: B點坐標
    :return: 返回得到的歐氏距離
    '''
    return math.sqrt((pointA - pointB) * (pointA - pointB).T)

三、Mean Shift算法舉例

1、數據集：數據集含有兩個特征，如下圖所示：

2、加載數據集

? ? ? ?我們此處使用如下方法加載數據集，也可使用其他的方式進行加載，此處可以參考我的另外一篇文章《Python兩種方式加載文件內容》。加載文件內容代碼如下：

            
              def load_data(path, feature_num=2):
    '''
    導入數據
    :param path: 路徑
    :param feature_num: 特行總數
    :return: 返回數據特征
    '''
    f = open(path)
    data = []
    for line in f.readlines():
        lines = line.strip().split("\t")
        data_tmp = []
        if len(lines) != feature_num:  # 判斷特征的個數是否正確，把不符合特征數的數據去除
            continue
        for i in range(feature_num):
            data_tmp.append(float(lines[i]))
        data.append(data_tmp)
    f.close()
    return data

3、保存聚類結果

? ? ? ?通過Mean Shift聚類后，我們使用一下方法進行聚類結果的保存

            
              def save_result(file_name, data):
    '''
    保存聚類結果
    :param file_name: 保存的文件名
    :param data: 需要保存的文件
    :return:
    '''
    f = open(file_name, "w")
    m, n = np.shape(data)
    for i in range(m):
        tmp = []
        for j in range(n):
            tmp.append(str(data[i, j]))
        f.write("\t".join(tmp) + "\n")
    f.close()

4、調用Mean Shift算法

            
              if __name__ == "__main__":
    data = load_data("F://data", 2)
    points, shift_points, cluster = mean_shift(data, 2)
    save_result("sub", np.mat(cluster))
    save_result("center", shift_points)

5、結果展示

? ? ? ?得到的聚類結果如下所示：

? ? ? ? 你們在此過程中遇到了什么問題，歡迎留言，讓我看看你們都遇到了哪些問題。

更多文章、技術交流、商務合作、聯系博主

微信掃碼或搜索：z360901061

微信掃一掃加我為好友

QQ號聯系： 360901061

您的支持是博主寫作最大的動力，如果您喜歡我的文章，感覺我的文章對您有幫助，請用微信掃描下面二維碼支持博主2元、5元、10元、20元等您想捐的金額吧，狠狠點擊下面給點支持吧，站長非常感激您！手機微信長按不能支付解決辦法：請將微信支付二維碼保存到相冊，切換到微信，然后點擊微信右上角掃一掃功能，選擇支付二維碼完成支付。

【本文對您有幫助就好】元

2元

5元

10元

20元

自定義

亚洲免费在线-亚洲免费在线播放-亚洲免费在线观看-亚洲免费在线观看视频-亚洲免费在线看-亚洲免费在线视频

Python實現Mean Shift算法