Machine Learning (Zhou Zhihua), the Watermelon Book — Chapter 11, Exercise 11.1, a Python Implementation
-
Exercise
Write a program implementing the Relief algorithm and examine its results on watermelon dataset 3.0.
-
Principle
How Relief works: Relief is a filter-style feature-selection method for binary classification. For each sample it finds its nearest neighbour of the same class (the "near-hit") and its nearest neighbour of the opposite class (the "near-miss"), and accumulates a per-attribute relevance statistic: an attribute is rewarded when it separates the sample from its near-miss and penalized when it separates the sample from its near-hit.
What Relief is for: ranking the attributes by this relevance statistic, so that the attributes above a threshold (or the top k) can be selected.
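Concretely, the relevance statistic the program below accumulates for attribute j is:

```latex
\delta^{j} \;=\; \sum_{i}\Bigl(-\,\mathrm{diff}\bigl(x_i^{j},\, x_{i,\mathrm{nh}}^{j}\bigr)^{2}
\;+\; \mathrm{diff}\bigl(x_i^{j},\, x_{i,\mathrm{nm}}^{j}\bigr)^{2}\Bigr)
```

where nh and nm denote the near-hit and near-miss of sample i, and diff is 0/1 equality for discrete attributes and the absolute difference for continuous attributes (after normalization to [0, 1]).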
-
Procedure
Obtaining the dataset
Save watermelon dataset 3.0 as data_3.txt. The column names and category values stay in Chinese, since the code below reads them verbatim:
編號,色澤,根蒂,敲聲,紋理,臍部,觸感,密度,含糖率,好瓜
1,青綠,蜷縮,濁響,清晰,凹陷,硬滑,0.697,0.46,是
2,烏黑,蜷縮,沉悶,清晰,凹陷,硬滑,0.774,0.376,是
3,烏黑,蜷縮,濁響,清晰,凹陷,硬滑,0.634,0.264,是
4,青綠,蜷縮,沉悶,清晰,凹陷,硬滑,0.608,0.318,是
5,淺白,蜷縮,濁響,清晰,凹陷,硬滑,0.556,0.215,是
6,青綠,稍蜷,濁響,清晰,稍凹,軟粘,0.403,0.237,是
7,烏黑,稍蜷,濁響,稍糊,稍凹,軟粘,0.481,0.149,是
8,烏黑,稍蜷,濁響,清晰,稍凹,硬滑,0.437,0.211,是
9,烏黑,稍蜷,沉悶,稍糊,稍凹,硬滑,0.666,0.091,否
10,青綠,硬挺,清脆,清晰,平坦,軟粘,0.243,0.267,否
11,淺白,硬挺,清脆,模糊,平坦,硬滑,0.245,0.057,否
12,淺白,蜷縮,濁響,模糊,平坦,軟粘,0.343,0.099,否
13,青綠,稍蜷,濁響,稍糊,凹陷,硬滑,0.639,0.161,否
14,淺白,稍蜷,沉悶,稍糊,凹陷,硬滑,0.657,0.198,否
15,烏黑,稍蜷,濁響,清晰,稍凹,軟粘,0.36,0.37,否
16,淺白,蜷縮,濁響,模糊,平坦,硬滑,0.593,0.042,否
17,青綠,蜷縮,沉悶,稍糊,稍凹,硬滑,0.719,0.103,否
Implementing the algorithm
Define the relevant variables, e.g. the discrete attributes and their possible values
A function to read the data
A function to preprocess the data, normalizing the continuous attributes
Compute the Euclidean distance between two sample vectors
Compute the diff value of two samples on attribute j
Find the near-hit of a given sample in the dataset
Find the near-miss of a given sample in the dataset
The Relief-based feature-selection function
A main function that calls the functions above and prints the attributes with their relevance statistics in descending order
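The diff value used in the steps above can be sketched on its own. This is a minimal illustration (the helper name `diff` and its `discrete` flag are mine, not from the listing), assuming discrete values compare by equality and continuous values have already been min-max scaled to [0, 1]:

```python
def diff(a, b, discrete):
    """diff on one attribute: 0/1 mismatch for discrete, |a - b| for continuous."""
    if discrete:
        return 0 if a == b else 1
    return abs(a - b)

print(diff('青綠', '烏黑', discrete=True))    # differing discrete values -> 1
print(diff(0.697, 0.460, discrete=False))    # continuous: absolute difference
```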
-
Results
-
Program listing:
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Discrete attributes and their possible values
D_keys = {
    '色澤': ['青綠', '烏黑', '淺白'],
    '根蒂': ['蜷縮', '硬挺', '稍蜷'],
    '敲聲': ['清脆', '沉悶', '濁響'],
    '紋理': ['稍糊', '模糊', '清晰'],
    '臍部': ['凹陷', '稍凹', '平坦'],
    '觸感': ['軟粘', '硬滑'],
    '好瓜': ['否', '是'],
}
class_name = '好瓜'
names = ['色澤', '根蒂', '敲聲', '紋理', '臍部', '觸感', '密度', '含糖率']

# Read the data and drop the index column
def loadData(filename):
    dataSet = pd.read_csv(filename)
    dataSet.drop(columns=['編號'], inplace=True)
    return dataSet

# Normalize the continuous attributes to [0, 1]
def processData(dataSet):
    for key in names:
        if key in D_keys:
            continue
        x = np.array(dataSet[key]).reshape(-1, 1)
        min_max_scaler = preprocessing.MinMaxScaler()
        x_scaled = min_max_scaler.fit_transform(x)
        dataSet[key] = x_scaled
    return dataSet

# Euclidean distance between two samples:
# a discrete attribute contributes 1 if the values differ and 0 otherwise;
# a continuous attribute (already scaled to [0, 1]) contributes its squared difference
def calc_distance(xa, xb):
    distance = 0
    for key in names:
        if key in D_keys:
            distance += 0 if xa[key] == xb[key] else 1
        else:
            distance += (xa[key] - xb[key]) ** 2
    return distance ** 0.5

# diff value of two samples on attribute j
def calc_diff(xa, xb, j):
    if j in D_keys:
        return 0 if xa[j] == xb[j] else 1
    else:
        return abs(xa[j] - xb[j])

# Near-hit: the nearest same-class sample (excluding the sample itself)
def find_near_hit(dataSet, i, xi):
    label = xi[class_name]
    hit_samples = dataSet.loc[dataSet[class_name] == label]
    least_distance = float('inf')
    xi_nh = None
    for index, row in hit_samples.iterrows():
        if index == i:
            continue
        distance = calc_distance(xi, row)
        if distance < least_distance:
            xi_nh = row
            least_distance = distance
    return xi_nh

# Near-miss: the nearest different-class sample
def find_near_miss(dataSet, i, xi):
    label = xi[class_name]
    miss_samples = dataSet.loc[dataSet[class_name] != label]
    least_distance = float('inf')
    xi_nm = None
    for index, row in miss_samples.iterrows():
        distance = calc_distance(xi, row)
        if distance < least_distance:
            xi_nm = row
            least_distance = distance
    return xi_nm

# Relief feature selection: for each attribute, accumulate
# -diff(near-hit)^2 + diff(near-miss)^2 over all samples
def Relief(dataSet):
    features = []
    for key in names:
        power = 0
        for index, xi in dataSet.iterrows():
            xi_nh = find_near_hit(dataSet, index, xi)   # near-hit
            xi_nm = find_near_miss(dataSet, index, xi)  # near-miss
            diff_nh = calc_diff(xi, xi_nh, key)
            diff_nm = calc_diff(xi, xi_nm, key)
            power += -diff_nh ** 2 + diff_nm ** 2
        features.append(power)
    return features

if __name__ == '__main__':
    filename = 'data_3.txt'
    dataSet = loadData(filename)
    dataSet = processData(dataSet)
    features = Relief(dataSet)
    # Print the attributes sorted by relevance statistic, highest first
    # (sorting (value, name) pairs avoids collisions when two values are equal)
    for power, name in sorted(zip(features, names), reverse=True):
        print(name, power)
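As a sanity check of the weight-update logic independently of the watermelon data, here is a small self-contained sketch (all names are mine, not from the listing) for purely continuous features: on a toy dataset where feature 0 separates the two classes and feature 1 is noise, the relevance statistic should come out larger for feature 0.

```python
import numpy as np

def relief_weights(X, y):
    """Relief relevance statistics for continuous features in [0, 1].
    For each sample, find its nearest same-class (near-hit) and
    different-class (near-miss) neighbours, then accumulate
    -diff_nh^2 + diff_nm^2 per feature."""
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.sqrt(((X - X[i]) ** 2).sum(axis=1))
        dist[i] = np.inf                              # exclude the sample itself
        same = (y == y[i])
        nh = np.where(same, dist, np.inf).argmin()    # near-hit index
        nm = np.where(~same, dist, np.inf).argmin()   # near-miss index
        w += -(X[i] - X[nh]) ** 2 + (X[i] - X[nm]) ** 2
    return w

# Feature 0 separates the classes; feature 1 is pure noise.
X = np.array([[0.1, 0.5], [0.2, 0.9], [0.8, 0.4], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])
w = relief_weights(X, y)
print(w)  # feature 0 should receive the larger weight
```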