Feature Importance Algorithms
Project link: https://github.com/Wchenguang/gglearn/blob/master/DecisionTree/李航機器學習講解/FeatureImportance.ipynb
Information Gain: Formulas
- Definition of entropy: the entropy of an attribute $y$, which measures the uncertainty of the feature:

$P(Y=y_{j})=p_{j}, \quad j=1,2,\cdots,n$

$H(Y)=-\sum_{j=1}^{n} p_{j} \log p_{j}$
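As a quick sanity check: a binary attribute with $p_{1}=p_{2}=0.5$ has $H(Y)=-0.5\log_{2}0.5-0.5\log_{2}0.5=1$ bit, the maximum uncertainty for two values, while a constant attribute has $H(Y)=0$.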
- Definition of conditional entropy: the uncertainty of $y$ when $x$ is known:

$P(X=x_{i}, Y=y_{j})=p_{ij}, \quad i=1,2,\cdots,n; \quad j=1,2,\cdots,m$

$H(Y \mid X)=\sum_{i=1}^{n} p_{i} H(Y \mid X=x_{i})$
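Two boundary cases make the definition concrete: if $x$ determines $y$ exactly, every $H(Y \mid X=x_{i})=0$ and so $H(Y \mid X)=0$; if $x$ and $y$ are independent, $H(Y \mid X)=H(Y)$, i.e., knowing $x$ removes none of the uncertainty.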
Information gain: computation procedure
- Compute the entropy of the dataset $D$, i.e., the entropy of $y$:

$H(D)=-\sum_{k=1}^{K} \frac{|C_{k}|}{|D|} \log_{2} \frac{|C_{k}|}{|D|}$
- Compute the entropy of $y$ under each value of $x$, i.e., the conditional entropy of $D$ given feature $A$:

$H(D \mid A)=\sum_{i=1}^{n} \frac{|D_{i}|}{|D|} H(D_{i})=-\sum_{i=1}^{n} \frac{|D_{i}|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_{i}|} \log_{2} \frac{|D_{ik}|}{|D_{i}|}$
- Take the difference to obtain the gain:

$g(D, A)=H(D)-H(D \mid A)$
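As a worked instance: the 15-sample dataset from the book used in the code below has 9 positive and 6 negative labels, so $H(D)=-\frac{9}{15}\log_{2}\frac{9}{15}-\frac{6}{15}\log_{2}\frac{6}{15}\approx 0.971$, and every gain reported below is a difference against this value.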
import numpy as np
import math

'''
Entropy of a label array
'''
def entropy(y_values):
    e = 0
    unique_vals = np.unique(y_values)
    for val in unique_vals:
        p = np.sum(y_values == val) / len(y_values)
        e += p * math.log(p, 2)
    return -1 * e

'''
Information gain: H(Y) minus the conditional entropy H(Y | X)
'''
def entropy_condition(x_values, y_values):
    ey = entropy(y_values)
    ey_condition = 0
    xy = np.hstack((x_values, y_values))  # expects column vectors
    unique_x = np.unique(x_values)
    for x_val in unique_x:
        px = np.sum(x_values == x_val) / len(x_values)
        xy_condition_x = xy[np.where(xy[:, 0] == x_val)]
        ey_condition_x = entropy(xy_condition_x[:, 1])
        ey_condition += px * ey_condition_x
    return ey - ey_condition

'''
Information gain ratio: removes the bias toward features with many distinct values
'''
def entropy_condition_ratio(x_values, y_values):
    return entropy_condition(x_values, y_values) / entropy(x_values)
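To see why the ratio matters, here is a small sanity check built on the functions above (the `ids` column is a made-up, ID-like feature, not data from the book): a feature that takes a distinct value on every sample reaches the maximum possible gain, but the gain ratio penalizes it.

ids = np.arange(15).reshape(-1, 1)                 # hypothetical ID-like feature
labels = np.array([0] * 6 + [1] * 9).reshape(-1, 1)
print(entropy_condition(ids, labels))              # equals H(D) ~ 0.971: the gain is maximal
print(entropy_condition_ratio(ids, labels))        # ~0.25 after dividing by H_A(D) = log2(15)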
- Testing with the example on page 62 of the book; the following prints the information gain for each of A1 through A4:
xy = np.array([[0,0,0,0,0,1,1,1,1,1,2,2,2,2,2],
               [0,0,1,1,0,0,0,1,0,0,0,0,1,1,0],
               [0,0,0,1,0,0,0,1,1,1,1,1,0,0,0],
               [0,1,1,0,0,0,1,1,2,2,2,1,1,2,0],
               [0,0,1,1,0,0,0,1,1,1,1,1,1,1,0]]).T
# A1
print(entropy_condition(xy[:, 0].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
# A2
print(entropy_condition(xy[:, 1].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
# A3
print(entropy_condition(xy[:, 2].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
# A4
print(entropy_condition(xy[:, 3].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
- The gains (approximately 0.083, 0.324, 0.420, and 0.363) match the book's results. The gain ratios for the same features:
# A1
print(entropy_condition_ratio(xy[:, 0].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
# A2
print(entropy_condition_ratio(xy[:, 1].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
# A3
print(entropy_condition_ratio(xy[:, 2].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
# A4
print(entropy_condition_ratio(xy[:, 3].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
Gini Index: Formulas
$\operatorname{Gini}(p)=\sum_{k=1}^{K} p_{k}\left(1-p_{k}\right)=1-\sum_{k=1}^{K} p_{k}^{2}$

$\operatorname{Gini}(D)=1-\sum_{k=1}^{K}\left(\frac{|C_{k}|}{|D|}\right)^{2}$

$\operatorname{Gini}(D, A)=\frac{|D_{1}|}{|D|} \operatorname{Gini}\left(D_{1}\right)+\frac{|D_{2}|}{|D|} \operatorname{Gini}\left(D_{2}\right)$
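A quick check of the first formula: a two-class node with $p_{1}=p_{2}=0.5$ gives $\operatorname{Gini}(p)=1-(0.25+0.25)=0.5$, the maximum impurity for $K=2$, while a pure node gives $\operatorname{Gini}=0$.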
'''
Gini index of a label array
'''
def gini(y_values):
    g = 0
    unique_vals = np.unique(y_values)
    for val in unique_vals:
        p = np.sum(y_values == val) / len(y_values)
        g += p * p
    return 1 - g

'''
Gini index of y under each binary split of x (x == value vs. x != value)
'''
def gini_condition(x_values, y_values):
    g_condition = {}
    xy = np.hstack((x_values, y_values))
    unique_x = np.unique(x_values)
    for x_val in unique_x:
        xy_condition_x = xy[np.where(xy[:, 0] == x_val)]
        xy_condition_notx = xy[np.where(xy[:, 0] != x_val)]
        g_condition[x_val] = (len(xy_condition_x) / len(x_values) * gini(xy_condition_x[:, 1])
                              + len(xy_condition_notx) / len(x_values) * gini(xy_condition_notx[:, 1]))
    return g_condition
# A1
print(gini_condition(xy[:, 0].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
# A2
print(gini_condition(xy[:, 1].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
# A3
print(gini_condition(xy[:, 2].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
# A4
print(gini_condition(xy[:, 3].reshape(-1, 1), xy[:, -1].reshape(-1, 1)))
- Matches page 71 of the book; the feature, and the value of $x$, with the smallest Gini index is chosen as the optimal feature and split point.
- In effect, picking the smallest Gini index means picking the feature and the value of that feature under which the uncertainty of $y$ is smallest; a minimal selection sketch follows below.
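The selection step is a small loop over gini_condition; here is a minimal sketch (the helper best_gini_split is my own addition, not part of the original notebook) that scans every feature and candidate value and keeps the pair with the smallest Gini index.

def best_gini_split(X, y):
    # Scan all (feature, value) pairs; a smaller Gini index means a purer split
    best = None  # (gini, feature_index, split_value)
    for j in range(X.shape[1]):
        for val, g in gini_condition(X[:, j].reshape(-1, 1), y).items():
            if best is None or g < best[0]:
                best = (g, j, val)
    return best

print(best_gini_split(xy[:, :-1], xy[:, -1].reshape(-1, 1)))
# Expect feature index 2 (A3) with Gini ~ 0.27, as on page 71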
Comparing Feature Importance
Feature importance computed with a random forest, again on the book's data:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(random_state=42).fit(xy[:, :-1], xy[:, -1])
print(rf.feature_importances_)
- The ranking is broadly consistent with the information-gain results above.
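For a rough side-by-side view (a normalization I am adding here, not something from the book), the four information gains can be scaled to sum to one, like feature_importances_:

gains = np.array([entropy_condition(xy[:, j].reshape(-1, 1),
                                    xy[:, -1].reshape(-1, 1)) for j in range(4)])
print(gains / gains.sum())  # ~ [0.07, 0.27, 0.35, 0.31]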