
Information Entropy


Introduction

Entropy is a measure of disorder, or more precisely unpredictability. For example, a series of coin tosses with a fair coin has maximum entropy, since there is no way to predict what will come next. A string of coin tosses with a coin with two heads and no tails has zero entropy, since the coin will always come up heads. Most collections of data in the real world lie somewhere in between. It is important to realize the difference between the entropy of a set of possible outcomes, and the entropy of a particular outcome. A single toss of a fair coin has an entropy of one bit, but a particular result (e.g. "heads") has zero entropy, since it is entirely "predictable".
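The coin examples above can be checked numerically. The following is a minimal sketch; the helper name `coin_entropy` and the 0.9 bias are illustrative assumptions, not part of the original text.

```python
import math

def coin_entropy(p_heads: float) -> float:
    """Entropy, in bits, of one toss of a coin that lands heads with probability p_heads."""
    total = 0.0
    for p in (p_heads, 1.0 - p_heads):
        if p > 0.0:  # an impossible outcome contributes nothing to the sum
            total -= p * math.log2(p)
    return total

fair = coin_entropy(0.5)        # fair coin: maximally unpredictable
two_headed = coin_entropy(1.0)  # always heads: fully predictable
biased = coin_entropy(0.9)      # somewhere in between
print(fair, two_headed, biased)
```

The fair coin gives one bit, the two-headed coin gives zero, and any biased coin falls strictly between the two.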

English text has fairly low entropy. In other words, it is fairly predictable. Even if we don't know exactly what is going to come next, we can be fairly certain that, for example, there will be many more e's than z's, or that the combination 'qu' will be much more common than any other combination with a 'q' in it and the combination 'th' will be more common than any of them. Uncompressed, English text has about one bit of entropy for each byte (eight bits) of message. [ citation needed ]

If a compression scheme is lossless—that is, you can always recover the entire original message by uncompressing—then a compressed message has the same total entropy as the original, but in fewer bits. That is, it has more entropy per bit. This means a compressed message is more unpredictable, which is why messages are often compressed before being encrypted. Roughly speaking, Shannon's source coding theorem says that a lossless compression scheme cannot compress messages, on average, to have more than one bit of entropy per bit of message. The entropy of a message is in a certain sense a measure of how much information it really contains.
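As a rough illustration of the lossless case, the sketch below uses Python's standard `zlib` module on a made-up, highly repetitive sample message. The sample text and the variable names are assumptions chosen for the demonstration.

```python
import zlib

# Hypothetical sample: a repeated English sentence, so it is very predictable.
message = ("the quick brown fox jumps over the lazy dog. " * 200).encode("ascii")

compressed = zlib.compress(message, 9)
restored = zlib.decompress(compressed)

# Lossless: the original comes back exactly, so no information was lost.
assert restored == message

# The same information now occupies fewer bytes, i.e. more information per byte.
ratio = len(message) / len(compressed)
print(len(message), len(compressed), round(ratio, 1))
```

Because the round trip is exact, the compressed form carries the whole message in far fewer bytes, which is what "more entropy per bit" means in practice.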

Shannon's theorem also implies that no lossless compression scheme can compress all messages. If some messages come out smaller, at least one must come out larger. In the real world, this is not a problem, because we are generally only interested in compressing certain messages, for example English documents as opposed to random bytes, or digital photographs rather than noise, and don't care if our compressor makes random messages larger.
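One way to see why no lossless scheme can shrink every message is a simple counting argument, sketched below. The function names are hypothetical, and the argument is the standard pigeonhole count rather than anything specific to this article.

```python
def count_strings_of_length(n: int) -> int:
    """Number of distinct bit strings with exactly n bits."""
    return 2 ** n

def count_shorter_strings(n: int) -> int:
    """Number of distinct bit strings with fewer than n bits (lengths 0 to n-1)."""
    return sum(2 ** k for k in range(n))

n = 16
inputs = count_strings_of_length(n)
shorter_outputs = count_shorter_strings(n)

# There is always one fewer shorter string than there are n-bit inputs, so a
# one-to-one (lossless) code cannot map every n-bit message to a shorter one.
print(inputs, shorter_outputs)
```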




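Original definition

Claude E. Shannon, one of the founders of information theory, defined information (entropy) in terms of the probability of occurrence of discrete random events. Information entropy is a rather abstract mathematical concept; as a first approximation, it can be understood through the probability that a particular piece of information occurs.


For any random variable X, entropy is defined so that the greater the uncertainty of the variable, the greater its entropy, and the more information is needed to determine its value.


Information entropy is a concept used in information theory to measure the amount of information. The more ordered a system is, the lower its information entropy; conversely, the more disordered a system is, the higher its information entropy. Information entropy can therefore also be regarded as a measure of how ordered a system is.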


Shannon denoted the entropy of a discrete random variable X with possible values {x_1, ..., x_n} by H, after Boltzmann's H-theorem:

$$H(X) = E[I(X)]$$

Here E is the expected value, and I is the information content of X.

I(X) is itself a random variable. If p denotes the probability mass function of X, then the entropy can be written explicitly as

$$H(X) = \sum_{i=1}^{n} p(x_i)\, I(x_i) = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i)$$

where b is the base of the logarithm used. Common values of b are 2, Euler's number e, and 10, and the unit of entropy is the bit for b = 2, the nat for b = e, and the dit (or digit) for b = 10. [3]

If p_i = 0 for some i, the value of the corresponding summand 0 log_b 0 is taken to be 0, which is consistent with the limit

$$\lim_{p \to 0^+} p \log_b p = 0$$

This limit follows quickly from L'Hôpital's rule.
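The definition, the choice of base, and the 0 log 0 = 0 convention can be sketched in a few lines. The function name `entropy` and the example distribution are assumptions made for illustration.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy of a probability mass function, in units set by `base`."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 are skipped, matching the convention 0 * log(0) = 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

pmf = [0.5, 0.25, 0.25, 0.0]   # hypothetical distribution with one impossible value
bits = entropy(pmf, 2)         # b = 2
nats = entropy(pmf, math.e)    # b = e
dits = entropy(pmf, 10)        # b = 10
print(bits, nats, dits)
```

Changing the base only rescales the result: the value in nats is the value in bits multiplied by ln 2, and the value in dits is the value in bits multiplied by log10 2.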


Formula

$$H(X) = E[I(x_i)] = E\left[\log \frac{1}{p(x_i)}\right] = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$$



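Applications and examples

1. Shannon pointed out that the exact amount of information is H = -(p1 * log p1 + p2 * log p2 + ... + p32 * log p32), where p1, p2, ..., p32 are the probabilities that each of the 32 teams wins the championship. Shannon called this quantity "information entropy" (Entropy); it is usually denoted H and measured in bits, so the logarithms are taken to base 2. Interested readers can verify that when all 32 teams are equally likely to win, the information entropy is five bits. Readers with a mathematical background can also prove that the value of the formula above can never exceed five.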
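The five-bit figure can be checked directly. In the sketch below, the skewed distribution (one favourite at 50%) is a made-up example, and `entropy_bits` is a hypothetical helper name.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Case 1: all 32 teams are equally likely to win.
uniform = [1 / 32] * 32
h_uniform = entropy_bits(uniform)

# Case 2 (made-up numbers): one favourite at 50%, the other 31 teams share the rest.
skewed = [0.5] + [0.5 / 31] * 31
h_skewed = entropy_bits(skewed)

print(h_uniform, h_skewed)
```

The uniform case reaches log2(32) = 5 bits, while the skewed case stays below it, in line with the claim that the value never exceeds five.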


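2. In many cases we do not know the probability distribution of a random event; all we know is the mean of one or more random variables associated with it. For example, suppose we know only that the exam scores of a class fall into three grades, 80, 90 and 100 points, and that the average score is 90. Clearly the probability distribution over the three grades is not unique in this case, because the known constraints p1*80 + p2*90 + p3*100 = 90 and p1 + p2 + p3 = 1 have infinitely many solutions. Which solution should be chosen? That is, how do we pick the "best", "most reasonable" distribution from among these compatible distributions? The selection criterion is the principle of maximum entropy.

According to the principle of maximum entropy, from all compatible distributions we choose the one that maximizes the information entropy under the given constraints (usually the prescribed mean values of certain random variables). This principle was proposed by Jaynes. The justification is that the probability distribution at which the entropy attains its maximum is overwhelmingly the most likely to occur, which can be proved theoretically. Once we accept entropy as the most suitable yardstick for measuring uncertainty, we have essentially agreed to choose, under the given constraints, the distribution with the greatest uncertainty as the distribution of the random variable. This distribution is the most random one: it contains the least subjective component and makes the maximal estimate of what is uncertain.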
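For the exam-score example, the two constraints leave a single free parameter, so the maximum-entropy choice can be found by a simple scan. This is a minimal sketch under that observation; the grid size and the names `grade_distribution` and `entropy_bits` are assumptions.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def grade_distribution(t):
    """Subtracting 90 * (p1 + p2 + p3 = 1) from the mean constraint gives p1 = p3.

    So every compatible distribution has the form [t, 1 - 2t, t] with 0 <= t <= 0.5.
    """
    return [t, 1 - 2 * t, t]

steps = 3000
candidates = [k / steps for k in range(steps // 2 + 1)]

# Pick the compatible distribution with the largest entropy.
best_t = max(candidates, key=lambda t: entropy_bits(grade_distribution(t)))
best = grade_distribution(best_t)
print(best, entropy_bits(best))
```

Under these constraints the scan settles on roughly equal weights for the three grades, which is the least committal distribution that still has a mean of 90.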

3 Data as a Markov process

A common way to define entropy for text is based on the Markov model of text. For an order-0 source (each character is selected independently of the preceding characters), the binary entropy is:

$$H(S) = -\sum_{i} p_i \log_2 p_i$$

where p_i is the probability of i. For a first-order Markov source (one in which the probability of selecting a character depends only on the immediately preceding character), the entropy rate is:

$$H(S) = -\sum_{i} p_i \sum_{j} p_i(j) \log_2 p_i(j)$$

where i is a state (certain preceding characters) and p_i(j) is the probability of j given i as the previous character.

For a second-order Markov source, the entropy rate is

$$H(S) = -\sum_{i} p_i \sum_{j} p_i(j) \sum_{k} p_{i,j}(k) \log_2 p_{i,j}(k)$$
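The order-0 and first-order quantities can be estimated from a sample string by counting characters and adjacent pairs. The sketch below uses empirical frequencies as stand-ins for the true probabilities, which is an assumption; the function names and the sample string are illustrative.

```python
import math
from collections import Counter

def order0_entropy(text):
    """Empirical order-0 entropy in bits per character."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def order1_entropy_rate(text):
    """Empirical first-order entropy rate in bits per character."""
    pairs = Counter(zip(text, text[1:]))   # counts of (previous, next) pairs
    prev_counts = Counter(text[:-1])       # how often each state i occurs
    total = len(text) - 1
    rate = 0.0
    for (prev, _next_char), c in pairs.items():
        p_state = prev_counts[prev] / total   # p_i
        p_next = c / prev_counts[prev]        # p_i(j)
        rate -= p_state * p_next * math.log2(p_next)
    return rate

sample = "abababab"
print(order0_entropy(sample), order1_entropy_rate(sample))
```

In the alternating sample, each character looks like a fair coin toss when taken in isolation, but it is fully determined by its predecessor, so the first-order estimate drops to zero.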

4 b-ary entropy

In general, the b-ary entropy of a source 𝒮 = (S, P) with source alphabet S = {a_1, ..., a_n} and discrete probability distribution P = {p_1, ..., p_n}, where p_i is the probability of a_i (say p_i = p(a_i)), is defined by:

$$H_b(\mathcal{S}) = -\sum_{i=1}^{n} p_i \log_b p_i$$

Note: the b in "b-ary entropy" is the number of different symbols of the "ideal alphabet" that is used as the standard yardstick to measure source alphabets. In information theory, two symbols are necessary and sufficient for an alphabet to be able to encode information, so the default is to let b = 2 ("binary entropy"). Thus, the entropy of the source alphabet, with its given empirical probability distribution, is a number equal to the number (possibly fractional) of symbols of the "ideal alphabet", with an optimal probability distribution, needed to encode each symbol of the source alphabet. Also note that "optimal probability distribution" here means a uniform distribution: a source alphabet with n symbols has the highest possible entropy (for an alphabet with n symbols) when the probability distribution of the alphabet is uniform. This optimal entropy turns out to be $\log_b n$.
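The claim that a uniform distribution maximizes the b-ary entropy can be probed numerically. This is only a spot check against randomly generated distributions, not a proof; the alphabet size, the seed, and the helper names are assumptions.

```python
import math
import random

def b_ary_entropy(probs, b=2):
    """Entropy measured in symbols of an ideal b-symbol alphabet."""
    return -sum(p * math.log(p, b) for p in probs if p > 0)

n = 8
uniform = [1 / n] * n
h_binary = b_ary_entropy(uniform, 2)   # compare with log_2(8)
h_octal = b_ary_entropy(uniform, 8)    # compare with log_8(8)

rng = random.Random(0)  # fixed seed so the run is repeatable

def random_pmf(size):
    """A random (hypothetical) distribution over `size` symbols."""
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

# Largest binary entropy seen among 1000 random distributions on 8 symbols.
max_random = max(b_ary_entropy(random_pmf(n), 2) for _ in range(1000))
print(h_binary, h_octal, max_random)
```

None of the random distributions should exceed the uniform value, and switching the yardstick from a 2-symbol to an 8-symbol ideal alphabet only changes the unit in which the same quantity is reported.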






