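Web Caching Guide for Webmasters and Site Administrators [Translation]
Original (English):
http://www.mnot.net/cache_docs/
License:
Attribution-NonCommercial-NoDerivs 2.0
Reposted from: http://www.chedong.com/tech/cache_docs.html
This is an informational document. Its main purpose is to make Web caching concepts easier for developers to understand and to apply in real deployments. For brevity, some implementation details have been simplified or omitted. If you care mostly about implementation details, you do not need to read the whole document; the References and Further Information section may be what you are looking for.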
- What is a Web cache, and why use one?
- Kinds of caches:
  - Browser caches;
  - Proxy caches;
- Aren't Web caches bad for me? Why should I help them?
- How Web caches work;
- How (and how not) to control caches:
  - HTML meta tags vs. HTTP headers;
  - Pragma HTTP headers (and why they don't work);
  - Controlling freshness with the Expires HTTP header;
  - Cache-Control HTTP headers;
  - Validators and validation;
- Tips for building a cache-aware site;
- Writing cache-aware scripts;
- Frequently asked questions;
- Implementation notes: Web servers;
- Implementation notes: server-side scripting;
- References and further information;
- About this document;
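What is a Web cache, and why use one?
A Web cache sits between one or more Web servers (the origin servers) and one or more clients. It watches requests come by and saves copies of the responses, such as HTML pages, images and files (collectively called representations). Then, when another request arrives for the same URL, the cache answers it with the stored copy instead of asking the origin server again. There are two main reasons to use caches:
- To reduce latency: because the request is answered from the cache (which is closer to the client) rather than from the origin server, it takes less time, and the Web server appears more responsive;
- To reduce network bandwidth consumption: when representations are reused, the client uses less bandwidth. This saves money if the client pays for traffic, and keeps bandwidth requirements lower and easier to manage.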
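Kinds of caches
Browser caches
Modern Web browsers (for example IE and Firefox) usually have a cache setting in their preferences dialog. It sets aside a section of your computer's hard disk to store copies of the pages you have seen. The browser cache works according to fairly simple rules: it checks that a cached copy is fresh, usually once per session (that is, once before the current browser instance is closed). This cache is especially useful when users hit the "back" button or click a link to a page they have just visited. Likewise, if the same images appear on several pages you browse, they can be served from the browser cache almost instantly.
Proxy caches
Web proxy caches work on the same principle, but on a much larger scale. Proxies serve hundreds or thousands of users in the same way; large companies and ISPs often set them up on their firewalls or as standalone caching devices.
Because proxy caches are not part of the client or the origin server, but sit out on the network, requests have to be routed to them somehow. One way is to configure your browser manually and tell it which proxy to use. Another is interception: an intermediate server receives all Web requests, redirected to it by the network, and forwards them, so that users need not configure a proxy or even know that it exists.
Proxy caches are shared caches: rather than serving one user, they usually serve a large number of users. This makes them very effective at reducing latency and bandwidth use, because the same copy is reused many times.
Gateway caches
Also known as reverse proxy caches or surrogate caches, gateway caches are intermediaries too. Unlike the caches that network administrators deploy to save bandwidth, gateway caches are typically deployed by Webmasters themselves, to make their sites more scalable and better performing.
Requests can be routed to gateway caches in several ways; typically, one or more load balancers are used to make the caches look like the origin server to clients.
Content delivery networks (CDNs) distribute gateway caches throughout the Internet (or part of it) and sell caching to Web sites that want it. Speedera and Akamai are typical CDNs.
This tutorial focuses mainly on browser and proxy caches, although some of the information also applies to gateway caches.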
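Aren't Web caches bad for me? Why should I help them?
Web caching is one of the most misunderstood technologies on the Internet. Webmasters in particular fear losing control of their site, because a proxy cache can "hide" their users from them, making it hard to see who is using the site.
Unfortunately for them, even if Web caches didn't exist, there are too many variables on the Internet for administrators to get an accurate picture of how users see their site. If this is a big concern for you, this tutorial will show you how to get the statistics you need without making your site cache-unfriendly.
Another concern is that caches can serve content that is out of date, or stale. However, this tutorial can show you how to configure your server to control how your content is cached.
CDNs are an interesting development: unlike other proxy caches, their gateway caches serve the Web sites that want to be cached, so these concerns do not arise. However, even when you use a CDN, you still have to consider the proxy and browser caches downstream.
On the other hand, if you plan your site well, caches can help your Web site load faster, and save load on your server and Internet link. The difference can be dramatic: a site that is difficult to cache may take several seconds to load, while one that takes advantage of caching can seem instantaneous in comparison. Users appreciate a fast-loading site and will visit more often.
Think of it this way: many large Internet companies spend millions setting up server farms around the world in order to make access as fast as possible for their users. Client-side caches do the same for you, and they are even closer to the end user. Best of all, you don't have to pay for them.
The fact is that proxy and browser caches will be used whether you like it or not. If you don't configure your site to be cached correctly, it will be cached using whatever defaults the cache's administrator decides upon.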
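How Web caches work
All caches have a set of rules that they use to decide when to serve a representation from the cache (assuming one is available). Some of these rules are set in the protocols (HTTP 1.0 and 1.1), and some are set by the administrator of the cache (either the user of the browser or the proxy administrator).
Generally speaking, these are the most common rules (don't worry if you don't understand the details; they are explained below):
- If the response's headers tell the cache not to keep it, it won't be cached;
- If the request is authenticated or secure, the response won't be cached;
- If the response carries no validator (an ETag or Last-Modified header) and no explicit freshness information, it will be considered uncacheable;
- A cached representation is considered fresh if:
  - it has an expiry time or other age-controlling header set, and is still within the fresh period;
  - a browser cache has already used the representation and has already checked its freshness during the current session;
  - a proxy cache has used the representation recently, and it was last modified relatively long ago;
- Fresh representations are served directly from the cache, without a request to the origin server;
- If a cached representation is stale, the cache asks the origin server to validate it, that is, to say whether the copy it holds can still be used.
Together, freshness and validation are the most important ways a cache works with content. A fresh representation is available instantly from the cache,
while a validated representation avoids transferring the whole representation from the origin server again if it has not changed.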
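The rules above can be sketched as a small decision function. This is a simplified illustration rather than how any particular cache is implemented: the function name `cache_decision`, the exact-case header dictionary, and the 10% heuristic applied to Last-Modified are all assumptions made for the example, and authentication, HTTPS and shared-cache rules are left out.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: a cache without explicit freshness information treats a copy as
# fresh for 10% of the time that had passed since it was last modified.
HEURISTIC_FRACTION = 0.1


def _http_date(value):
    """Parse an HTTP date; return None if it is missing or malformed."""
    if value is None:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    return parsed if parsed.tzinfo is not None else None


def cache_decision(headers, stored_at, now):
    """Return 'uncacheable', 'fresh' or 'validate' for a stored response.

    `headers` maps exact-case header names to values; `stored_at` and `now`
    are timezone-aware datetimes.
    """
    directives = [d.strip().lower()
                  for d in headers.get("Cache-Control", "").split(",")
                  if d.strip()]
    if "no-store" in directives:
        return "uncacheable"

    age = (now - stored_at).total_seconds()

    # An explicit max-age is checked before Expires.
    for directive in directives:
        if directive.startswith("max-age="):
            lifetime = int(directive.split("=", 1)[1])
            return "fresh" if age < lifetime else "validate"

    if "Expires" in headers:
        expires = _http_date(headers["Expires"])
        # An unparseable Expires value is treated as already expired.
        return "fresh" if expires is not None and now < expires else "validate"

    last_modified = _http_date(headers.get("Last-Modified"))
    if last_modified is not None:
        heuristic = HEURISTIC_FRACTION * (stored_at - last_modified).total_seconds()
        return "fresh" if age < heuristic else "validate"

    # No freshness information: a validator alone still allows revalidation.
    return "validate" if "ETag" in headers else "uncacheable"
```

A "validate" result corresponds to the case where the cache sends a conditional request to the origin server instead of serving the copy directly.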
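How (and how not) to control caches
There are several tools that Web designers and Webmasters can use to fine-tune how caches treat their sites. It may require getting your hands a little dirty with your server's configuration, but the results are worth it. For details on how to use these tools, see the Implementation sections below.
HTML meta tags and HTTP headers
HTML authors can put tags in a document's <HEAD> section that describe its attributes. These meta tags are often used to mark a document as uncacheable, or to expire it at a certain time.
Meta tags are easy to use, but aren't very effective. They are only honored by a few browser caches (those that actually read the HTML), and not by proxy caches (which almost never parse the HTML in the document). While it may be tempting to put a Pragma: no-cache meta tag into a Web page, it won't necessarily cause the page to be kept fresh.
If your site is hosted at an ISP or hosting farm and they don't give you the ability to set HTTP headers (such as Expires and Cache-Control), complain loudly; these are tools necessary for doing your job.
On the other hand, true HTTP headers give you a lot of control over how both browser caches and proxies handle your representations. They can't be seen in the HTML, and are usually generated automatically by the Web server. However, you can control them to some degree, depending on the server you use. In the following sections, you'll see which HTTP headers are interesting, and how to apply them to your site.
HTTP headers are sent by the server before the HTML, and are only seen by the browser and any intermediate caches. Typical HTTP 1.1 response headers might look like this: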
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html
The HTML follows these headers, separated by a blank line. See the Implementation sections for information about how to set HTTP headers.
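Pragma HTTP headers (and why they don't work)
Many people believe that assigning a Pragma: no-cache HTTP header to a representation will make it uncacheable. This is not necessarily true: the HTTP specification does not set any guidelines for Pragma response headers; instead, it discusses Pragma request headers (the headers that a browser sends to a server). Although a few caches may honor this header, the majority won't, and it won't have any effect. Use the headers described below instead.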
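Controlling freshness with the Expires HTTP header
The Expires HTTP header is a basic means of controlling caches; it tells all caches how long the associated representation is fresh for. After that time, caches will check back with the origin server to see whether the document has changed. Expires headers are supported by practically every cache.
Most Web servers allow you to set Expires response headers in a number of ways. Commonly, they allow setting an absolute time to expire, a time based on the last time that the client saw the representation (last access time), or a time based on the last time the document changed on your server (last modification time).
Expires headers are especially good for making static images (like navigation bars and buttons) cacheable. Because they don't change much, you can set an extremely long expiry time on them, making your site appear much more responsive to your users. They are also useful for controlling caching of a page that changes on a regular schedule. For instance, if you update a news page once a day at 6am, you can set the representation to expire at that time, so caches will know when to get a fresh copy, without users having to hit "reload".
The only value valid in an Expires header is an HTTP date; anything else will most likely be interpreted as "in the past", so that the representation is treated as expired. Also, remember that the time in an HTTP date is Greenwich Mean Time (GMT), not local time. For example:
Expires: Fri, 30 Oct 1998 14:19:41 GMT
It is therefore important to make sure that your Web server's clock is accurate if you use the Expires header. One way to do this is with the Network Time Protocol (NTP); talk to your system administrator to find out more.
Although the Expires header is useful, it has some limitations. First, because a date is involved, the clocks on the Web server and the cache must be synchronized. If they are not, either content that should be cached expires too early, or stale content is not refreshed in time.
Another problem with Expires is easy to overlook: if you set a fixed expiry time and do not update it once it has passed, every request after that time goes back to your origin Web server, which increases load and latency.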
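As a minimal sketch of producing a correctly formatted Expires value, the Python standard library can format a UTC timestamp as an HTTP date. The helper names `expires_header` and `next_daily_update` are made up for this example, and the daily update hour is assumed to be expressed in GMT rather than local time.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now, lifetime):
    """Build an Expires header line for a response generated at `now`.

    `now` must be a timezone-aware datetime in UTC; `lifetime` is a timedelta.
    """
    return "Expires: " + format_datetime(now + lifetime, usegmt=True)


def next_daily_update(now, hour=6):
    """Return the next time the clock reads `hour`:00 (assumed GMT)."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Static images: a long lifetime. News page: expire at the next update.
    print(expires_header(now, timedelta(days=30)))
    print(expires_header(now, next_daily_update(now) - now))
```

Computing the expiry relative to the current time on every response also avoids the fixed-date problem described above, since the value moves forward by itself.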
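Cache-Control HTTP headers
HTTP 1.1 introduced a new class of headers, Cache-Control response headers, to give Web publishers more control over their content and to address the limitations of Expires.
Useful Cache-Control response headers include:
- max-age=[seconds]: specifies the maximum amount of time that a representation will be considered fresh. Similar to Expires, but relative to the time of the request rather than absolute. [seconds] is the number of seconds from the time of the request for which the representation should stay fresh.
- s-maxage=[seconds]: similar to max-age, except that it only applies to shared (for example, proxy) caches.
- public: marks authenticated responses as cacheable. Normally, if HTTP authentication is required to access the content, responses are automatically uncacheable.
- no-cache: forces caches to submit the request to the origin server for validation before releasing a cached copy, every time. This is useful to make sure that authentication is respected (in combination with public), or to maintain strict freshness without giving up all of the benefits of caching.
- no-store: instructs caches not to keep a copy of the representation under any conditions.
- must-revalidate: tells caches that they must obey any freshness information you give them about a representation. HTTP allows caches to serve stale representations under special conditions; by specifying this header, you tell the cache that you want it to follow your rules strictly.
- proxy-revalidate: similar to must-revalidate, except that it only applies to proxy caches.
For example:
Cache-Control: max-age=3600, must-revalidate
If you plan to use Cache-Control headers, you should have a look at the HTTP 1.1 documentation; see References and Further Information.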
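To make the directives above more concrete, here is a rough sketch of how a cache might read a Cache-Control value. The function names are invented for this example, the comma splitting is deliberately naive (it would mishandle quoted arguments that contain commas), and treating no-cache as a lifetime of zero is a simplification of "validate every time".

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict.

    Directives without an argument map to None.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(value, shared=False):
    """Return the freshness lifetime in seconds, or None if none is given.

    `shared=True` models a proxy cache, for which s-maxage takes precedence
    over max-age.
    """
    directives = parse_cache_control(value)
    if "no-store" in directives or "no-cache" in directives:
        return 0
    if shared and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None
```

For instance, `freshness_lifetime("max-age=3600, s-maxage=60", shared=True)` gives the shorter proxy lifetime, while a browser cache (`shared=False`) would use the 3600 seconds from max-age.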
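Validators and validation
In How Web Caches Work, we said that validation is the mechanism servers and caches use to communicate when a representation has changed. By using it, caches avoid downloading the entire representation again when the copy they hold is in fact still good.
Validators are very important; if one isn't present, and there isn't any freshness information (Expires or Cache-Control) available, caches will not store a representation at all.
The most common validator is the time that the document last changed, communicated in the Last-Modified header. When a cache holds a representation that includes a Last-Modified header, it can use it to ask the server whether the representation has changed since the last time it was seen, by sending an If-Modified-Since request header.
HTTP 1.1 introduced another validator, the ETag. ETags are unique identifiers generated by the server, and they change every time the representation changes. Because the server controls how the ETag is generated, a cache can be sure that if the ETag still matches when it makes an If-None-Match request, its copy really is identical to the original.
Almost all caches use Last-Modified times to determine whether a representation is fresh, and ETag validation is becoming more and more common.
Most modern Web servers generate both ETag and Last-Modified headers automatically for static content (that is, files); you don't have to do anything. However, servers don't know enough about dynamic content (such as CGI, ASP or database-driven sites) to generate them; see the Writing Cache-Aware Scripts section.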
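The Last-Modified exchange described above can be sketched from the server's side as follows. This is an illustrative helper rather than code from any Web server; the name `status_for_conditional_get` is hypothetical, and a real server would also take If-None-Match into account.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for_conditional_get(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the cache's date, else 200.

    `if_modified_since` is the raw request header value (or None);
    `last_modified` is a timezone-aware datetime for the resource.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparseable date is ignored
    if since.tzinfo is None:
        return 200  # no usable time zone information
    # HTTP dates have one-second resolution, so drop sub-second precision.
    unchanged = last_modified.replace(microsecond=0) <= since
    return 304 if unchanged else 200
```

A 304 response carries no body, which is where the bandwidth saving of validation comes from: the cache keeps serving the copy it already holds.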
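Tips for building a cache-aware site
Besides using freshness information and validation, there are a number of other things you can do to make your site more cache-friendly.
- Use URLs consistently: this is the golden rule of caching. If you serve the same content on different pages, to different users, or from different sites, it should use the same URL. This is the easiest and most effective way to make your site cache-friendly. For example, if you use "/index.html" as a reference on one page, always use it that way;
- Use a common library of images and other elements, and refer to them from every page that needs them;
- Make caches store images and pages that don't change often, by using a Cache-Control: max-age header with a large value;
- Make caches recognise regularly updated pages, by specifying an appropriate max-age or expiration time;
- If a resource (especially a downloadable file) changes, change its name. That way, you can make it expire far in the future and still guarantee that the correct version is served; the page that links to it is the only one that needs a short expiry time.
- Don't change files unnecessarily. If you do, everything will have a falsely young Last-Modified date. For instance, when updating your site, don't copy over the entire site; upload only the files that you have changed.
- Use cookies only where necessary: cookies are difficult to cache, and aren't needed in most situations. If you must use a cookie, limit its use to dynamic pages;
- Minimize use of SSL: encrypted pages are not stored by shared caches, so use them only when you have to, and use images on SSL pages sparingly;
- Check your pages with a cacheability engine; it can help you apply many of the concepts in this tutorial;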
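Writing cache-aware scripts
By default, most scripts won't return a validator (a Last-Modified or ETag response header) or freshness information (Expires or Cache-Control). While some scripts really are dynamic (meaning that they return a different response for every request), many (like search engines and database-driven sites) can benefit from being cache-friendly.
Generally speaking, if a script produces output that is reproducible with the same request at a later time (whether minutes or days later), it should be cacheable. If the content of the script changes only depending on what is in the URL, it is cacheable; if the output depends on a cookie, authentication information or other external criteria, it probably isn't.
- The most cache-friendly approach is to dump the content to a static file whenever it changes. The Web server can then treat it like any other Web page, generating and using validators, which makes everything simpler. Only write the files that actually changed, so that the Last-Modified times stay meaningful;
- Another way to make a script cacheable is to set an age-related header for content that stays fresh for a while. Although this can be done with Expires, it is probably easiest with Cache-Control: max-age, which keeps the response fresh for a given amount of time after the request;
- If you can't do either of these, make the script generate a validator, and then respond to If-Modified-Since and/or If-None-Match requests. This can be done by parsing the HTTP request headers, and then responding with 304 Not Modified when appropriate. Unfortunately, this approach is not as efficient as the first two;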
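The second and third approaches in the list above can be combined in a script along the following lines. This is only a sketch: `make_etag` and `respond` are hypothetical names, deriving the ETag from a SHA-256 hash of the body is one possible choice among many, and weak validators (the `W/` prefix) are not handled.

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the response body (assumption: a content hash)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None, max_age=600):
    """Return (status, headers, payload) for a script-generated response.

    `if_none_match` is the raw If-None-Match request header value, if any.
    """
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=%d" % max_age}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # The cache's copy is still current: send headers only.
            return 304, headers, b""
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

Note that the script still has to generate the body in order to hash it, so this saves bandwidth but not server-side work; a cheaper validator (for example, a database row version) would address that, at the cost of more application-specific code.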
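Some other tips:
- Don't use POST unless it is appropriate. Responses to the POST method aren't kept by most caches; if you send information in the path or query (via GET), caches can store the response for future use;
- Don't embed user-specific information in the URL unless the content generated is completely unique to that user;
- Don't count on all requests from a user coming from the same host, because caches often work together;
- Generate Content-Length response headers where you can. This allows the response of your script to be used on a persistent connection: clients can request multiple representations over one TCP/IP connection, instead of setting up a connection for every request. It makes your site seem much faster;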
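Frequently asked questions
What are the most important things to make cacheable?
A good strategy is to identify the most popular and largest representations (especially images) and work with them first.
How can I make my pages as fast as possible with caches?
The most cacheable representation is one with a long freshness time set. Validation does help reduce the time that it takes to see a representation, but the cache still has to contact the origin server to see whether it is fresh. If the cache already knows it is fresh, it is served directly.
I understand that caching is good, but I need to keep statistics on how many people visit my page!
If you must know every time a page is accessed, select ONE small item on the page (or the page itself), and make it uncacheable by giving it suitable headers. For example, you could refer to a 1x1 transparent image from each page. The Referer header will contain information about which page requested it.
Be aware that even this will not give truly accurate statistics about your users, and it is unfriendly to the Internet and your users: it generates unnecessary traffic, and forces people to fetch an uncacheable item. For more information, see On Interpreting Access Statistics in the references.
How can I see a representation's HTTP headers?
Many Web browsers let you see the Expires and Last-Modified headers in a "page info" or similar interface. If available, this will give you a menu of the page and any representations (like images) associated with it, along with their details.
To see the full headers of a representation, you can manually connect to the Web server using a Telnet client.
To do so, you may need to type the port (by default, 80) into a separate field, or you may need to connect to www.example.com:80 or www.example.com 80 (note the space). Consult your Telnet client's documentation.
Once you've opened a connection to the site, type a request for the representation. For instance, if you want to see the headers for http://www.example.com/foo.html, connect to www.example.com, port 80, and type:
GET /foo.html HTTP/1.1 [return]
Host: www.example.com [return][return]
Press the Return key every time you see [return]; make sure to press it twice at the end. This will print the headers, and then the full representation. To see the headers only, substitute HEAD for GET.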
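If no Telnet client is at hand, roughly the same exchange can be scripted. The sketch below sends the request by hand over a socket; `build_request` and `fetch_raw` are names made up for this example, and the extra `Connection: close` header is an addition (not part of the Telnet session above) so that the server closes the connection once it has answered.

```python
import socket


def build_request(host, path, method="HEAD"):
    """Build the raw bytes of a minimal HTTP/1.1 request."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host, path, method="HEAD", port=80, timeout=10):
    """Send the request and return everything the server sends back as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path, method))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


if __name__ == "__main__":
    # Prints the status line and response headers for the example URL.
    print(fetch_raw("www.example.com", "/foo.html"))
```

With the default `method="HEAD"` only the headers come back; passing `method="GET"` returns the headers followed by the full representation.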
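My pages are password-protected; how do proxy caches deal with them?
By default, pages protected with HTTP authentication are considered private; they will not be kept by shared caches. However, you can make authenticated pages cacheable with a Cache-Control: public header; HTTP 1.1-compliant caches will then allow them to be cached.
If you'd like such pages to be cacheable, but still authenticated for every user, combine the Cache-Control: public and no-cache headers. This tells the cache that it must submit the new client's authentication information to the origin server before releasing the representation from the cache. The header looks like this:
Cache-Control: public, no-cache
Whether or not you do this, it is best to minimize the use of authentication. For example, if your images are not sensitive, put them in a separate directory and configure your server not to require authentication for it. That way, those images will be cacheable by default.
Should I worry about security if people access my site through a cache?
SSL pages are not cached (or decrypted) by proxy caches, so you don't have to worry about that. However, because caches store non-SSL requests and the URLs fetched through them, you should be conscious of unsecured sites: an unscrupulous administrator could conceivably gather information about users, especially from URLs.
In fact, any administrator on the network between your server and your clients could gather this type of information. One particular problem is when CGI scripts put usernames and passwords in the URL itself; this makes it trivial for others to find and use the login.
If you're aware of the issues surrounding Web security in general, you shouldn't have any surprises from proxy caches.
I'm looking for an integrated Web publishing solution. Which ones are cache-aware?
It varies. Generally speaking, the more complex a solution is, the more difficult it is to cache. The worst are those that generate all content dynamically and don't provide validators; they may not be cacheable at all. Speak with your vendor's technical staff for more information, and see the Implementation notes below.
My images expire a month from now, but I need to change them in the caches now!
The Expires header can't be circumvented; unless the cache (either browser or proxy) runs out of room and has to delete the representations, the cached copy will be used until it expires.
The most effective solution is to change any links to them; that way, completely new representations will be loaded from the origin server. Remember that the page that refers to them will be cached as well. Because of this, it is best to make static images and similar representations very cacheable, while keeping the HTML pages that refer to them on a tight leash.
If you want to reload a representation from a specific cache, you can force a reload while using that cache (in Firefox, holding down shift while pressing "reload" does this by issuing the Pragma: no-cache request header mentioned earlier). Or, you can have the cache administrator delete the representation through their interface.
I run a Web hosting service. How can I let my users publish cache-friendly pages?
If you're using Apache, consider allowing them to use .htaccess files and providing appropriate documentation.
Otherwise, you can establish predetermined areas for various caching attributes in each virtual server. For instance, you could specify a directory /cache-1m that will be cached for one month after access, and a /no-cache area that is served with headers instructing caches not to store representations from it.
Whatever you are able to do, it is best to work with your highest-volume customers first. Most of the savings (in bandwidth and in load on your servers) will be realized from high-volume sites.
I've marked my pages as cacheable, but my browser keeps requesting them on every request. How do I force the cache to keep representations of them?
Caches aren't required to keep a representation and reuse it; they are only required not to keep or use them under some conditions. All caches make decisions about which representations to keep based upon their size, type (for example, image vs. HTML), or how much space they have left for local copies. Yours may not be considered worth keeping around, compared to more popular or larger representations.
Some caches do allow their administrators to prioritize what kinds of representations are kept, and some allow representations to be "pinned" in the cache, so that they are always available.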
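Implementation notes: Web servers
Generally speaking, it is best to use the latest version of whatever Web server you have chosen to deploy. Not only will it likely contain more cache-friendly features; new versions also usually have important security and performance improvements.
Apache HTTP Server
Apache uses optional modules to include headers, including both Expires and Cache-Control. Both modules are available in version 1.2 and later.
The modules need to be built into Apache; although they are included in the distribution, they are not turned on by default. To find out whether the modules are enabled in your server, find the httpd binary and run httpd -l; this prints a list of the available modules. The modules we are looking for are mod_expires and mod_headers.
- If they aren't available, ask your administrator to recompile Apache to include them. This can be done either by uncommenting the appropriate lines in the configuration file, or by passing the --enable-module=expires and --enable-module=headers arguments at build time (Apache 1.3 or later). Consult the INSTALL file found in the Apache distribution;
Once you have an Apache with the appropriate modules, you can use mod_expires to specify when representations should expire, either in .htaccess files or in the server's access.conf file. You can specify expiry from either access or modification time, and apply it to a file type or as a default. See the module documentation for more information, and speak with your local Apache guru if you have trouble.
To apply Cache-Control headers, you'll need to use the mod_headers module, which allows you to specify arbitrary HTTP headers. See the mod_headers documentation for more information.
Here is an example .htaccess file that demonstrates the use of some headers:
- .htaccess files allow Web publishers to use commands normally only found in configuration files. They affect the directory they are in and its subdirectories. Talk to your server administrator to find out whether they are enabled.
ExpiresActive On
### expire GIF images one month after they are accessed
ExpiresByType image/gif A2592000
### expire everything else one day after its last modification
### (this uses the alternative syntax)
ExpiresDefault "modification plus 1 day"
### apply a Cache-Control header to index.html
<Files index.html>
Header append Cache-Control "public, must-revalidate"
</Files>
- Note that mod_expires automatically calculates and inserts a Cache-Control: max-age header as appropriate.
Apache 2.0's configuration is very similar to that of 1.3; see the 2.0 mod_expires and mod_headers documentation for more information.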
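Microsoft IIS
Microsoft's Internet Information Server makes it very easy to set headers. Note that this only applies to version 4.0 of the server, which runs only on NT Server.
To specify headers for an area of a site, select it in the Administration Tools interface and bring up its properties. After selecting the HTTP Headers tab, you should see two interesting areas: Enable Content Expiration and Custom HTTP Headers. The first should be self-explanatory, and the second can be used to apply Cache-Control headers.
See the ASP section below for information about setting headers in Active Server Pages. It is also possible to set headers from ISAPI modules; refer to MSDN for details.
Netscape/iPlanet Enterprise Server
As of version 3.6, Netscape/iPlanet Enterprise Server does not provide a way to set Expires headers. However, it has supported HTTP 1.1 features since version 3.0. This means that HTTP 1.1 caches (proxy and browser) can take advantage of the Cache-Control settings you make.
To use Cache-Control headers, choose Content Management | Cache Control Directives in the administration server. Then, using the Resource Picker, choose the directory where you want to set the headers. After setting the headers, click "OK". For more information, see the Netscape/iPlanet Enterprise Server manual.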
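Implementation notes: server-side scripting
One thing to keep in mind is that it may be easier to set HTTP headers with your Web server than in the scripting language. Try both.
Because the emphasis in server-side scripting is on dynamic content, it doesn't make for very cacheable pages, even when the content could be cached. If your content changes often, but not on every page hit, consider setting a Cache-Control: max-age header; most users access pages again within a relatively short period of time. For instance, when users hit the "back" button and there isn't any validator or freshness information available, they have to wait until the page is downloaded from the server again, even if nothing has changed.
CGI
CGI scripts are one of the most popular ways to generate content. You can easily add HTTP response headers by printing them before you send the body; most CGI implementations already require you to do this for the Content-Type header. For instance, in Perl:
print "Content-type: text/html\n";
print "Expires: Thu, 29 Oct 1998 17:04:19 GMT\n";
print "\n";
### the content body follows...
Since it is all text, you can easily generate Expires and other date-related headers with built-in functions. It is even easier if you use Cache-Control: max-age;
for example, printing the header line Cache-Control: max-age=600 makes the script cacheable for 10 minutes after the request, so that if the user hits the "back" button, the request is not resubmitted.
The CGI specification also makes the request headers that the client sends available to the script, each with "HTTP_" prepended to its name. So, if a client makes an If-Modified-Since request, it shows up as HTTP_IF_MODIFIED_SINCE.
See also the cgi_buffer library, which automatically handles ETag generation and validation, Content-Length generation and gzip compression of the content. Python scripts, too, only need a one-line include.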
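The CGI naming rule for request headers can be sketched in a few lines of Python. The helper names `cgi_header_name` and `request_header` are invented for this example; the rule itself (uppercase the name, replace dashes with underscores, prepend `HTTP_`) comes from the CGI specification.

```python
import os


def cgi_header_name(header):
    """Map an HTTP request header name to its CGI environment variable name."""
    return "HTTP_" + header.upper().replace("-", "_")


def request_header(header, environ=None):
    """Look up a request header in a CGI environment (default: os.environ)."""
    environ = os.environ if environ is None else environ
    return environ.get(cgi_header_name(header))


if __name__ == "__main__":
    # Inside a CGI script this prints the validator the client sent, if any.
    print(request_header("If-Modified-Since"))
```

Content-Type and Content-Length are exceptions to this rule: CGI passes them as CONTENT_TYPE and CONTENT_LENGTH, without the `HTTP_` prefix, so this helper is only meant for other headers such as If-Modified-Since and If-None-Match.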
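Server Side Includes
SSI (often used with the extension .shtml) was one of the first ways that Web publishers could get dynamic content into pages. By using special tags in the pages, a limited form of in-HTML scripting became available.
Most implementations of SSI do not set validators, and as such are not cacheable. However, Apache's implementation does allow users to specify which SSI files can be cached, by setting the group execute permission on the appropriate files, combined with the XBitHack full directive. For more information, see the mod_include documentation.
PHP
PHP is a server-side scripting language that, when built into the Web server, can be used to embed scripts inside a page's HTML, much like SSI, but with a far larger number of options. PHP can be run as a CGI script on any Web server, or as an Apache module.
By default, representations processed by PHP are not assigned validators, and are therefore uncacheable. However, developers can set HTTP headers by using the Header() function.
For example, this creates a Cache-Control header, as well as an Expires header three days in the future:
<?php
Header("Cache-Control: must-revalidate");
// three days, expressed in seconds
$offset = 60 * 60 * 24 * 3;
$ExpStr = "Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT";
Header($ExpStr);
?>
Remember that the Header() function must come before any other output.
As you can see, you have to create the HTTP date for an Expires header by hand; PHP doesn't provide a dedicated function for it (although recent versions have made it easier; see PHP's date function documentation). Of course, it is easiest to set a Cache-Control: max-age header, which is just as good for most situations.
For more information, see the documentation for header.
See also the cgi_buffer library, which automatically handles ETag generation and validation, Content-Length generation and gzip compression of the content for PHP scripts with a one-line include.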
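Cold Fusion
Cold Fusion, by Macromedia, is a commercial server-side scripting engine that runs on several Windows platforms, Linux and several flavors of Unix. Cold Fusion makes setting HTTP headers relatively easy with the CFHEADER tag. Unfortunately, the obvious way of setting an Expires header, passing the current date/time value straight to CFHEADER, is deceptive.
It doesn't work the way you might think, because the time (in this case, the time the request is made) doesn't get converted to an HTTP-valid date; instead, it is printed as a representation of Cold Fusion's date/time object. Most clients will either ignore such a value or convert it to a default, like January 1, 1970.
However, Cold Fusion does provide a date formatting function that does the job: GetHttpTimeString. In combination with DateAdd, it is easy to set an Expires date, for example a header declaring that the representation expires in one month.
You can also use the CFHEADER tag to set Cache-Control: max-age and other headers.
Remember that Web server headers are passed through in some deployments of Cold Fusion (such as CGI); check your server's configuration to determine whether you can set headers on the server instead of in Cold Fusion.
ASP and ASP.NET
When setting HTTP headers from ASP, make sure you either place the Response method calls before any HTML output, or use Response.Buffer to buffer the output. Also, note that some versions of IIS set a Cache-Control: private header on ASPs by default, and they must be declared public to be cacheable by shared caches.
Active Server Pages, built into IIS and also available for other Web servers, allows you to set HTTP headers. For instance, to set an expiry time, you can use the properties of the Response object:
the Response.Expires property makes the representation expire the given number of minutes after it is sent. Similarly, the Response.ExpiresAbsolute property sets an absolute expiry time (make sure you format the HTTP date correctly).
Cache-Control headers can be set through the Response.CacheControl property.
In ASP.NET, Response.Expires is deprecated; the proper way to set cache-related headers is through Response.Cache:
Response.Cache.SetCacheability ( HttpCacheability.Public ) ;
See the MSDN documentation for more information.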
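References and further information
HTTP 1.1 Specification
The HTTP 1.1 specification has many extensions for making pages cacheable, and is the authoritative guide to implementing the protocol. See sections 13, 14.9, 14.21, and 14.25.
Web-Caching.com
An excellent introduction to caching concepts, with links to other online resources.
On Interpreting Access Statistics
Jeff Goldberg's informative rant on why you shouldn't rely too heavily on access statistics and hit counters.
Cacheability Engine
Examines Web pages to determine how they will interact with Web caches. The engine is a good debugging tool, and a companion to this tutorial.
cgi_buffer Library
A one-line include for Perl, Python and PHP scripts running as CGI that automatically handles ETag generation and validation, Content-Length generation and content compression, correctly. The Python version can also be used as a wrapper around arbitrary CGI scripts.
About this document
This document is Copyright Mark Nottingham <mnot@pobox.com>. This work is licensed under a Creative Commons License.
If you mirror this document, please send e-mail to the address above, so that you can be informed of updates.
All trademarks within are the property of their respective holders.
Although the author believes the contents to be accurate at the time of publication, no liability is assumed for them, their application or any consequences thereof. If any misrepresentations, errors or other need for clarification is found, please contact the author immediately.
The latest revision of this document can always be obtained from http://www.mnot.net/cache_docs/
Translations are available in Czech, French and Chinese.
Version 1.81 - March 16, 2007
Creative Commons License
Chinese translation: Che Dong (車東)
September 6, 2007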
Caching Tutorial
for Web Authors and Webmasters
- What’s a Web Cache? Why do people use them?
- Kinds of Web Caches
- Aren’t Web Caches bad for me? Why should I help them?
- How Web Caches Work
- How (and how not) to Control Caches
- Tips for Building a Cache-Aware Site
- Writing Cache-Aware Scripts
- Frequently Asked Questions
- Implementation Notes — Web Servers
- Implementation Notes — Server-Side Scripting
- References and Further Information
- About This Document
What’s a Web Cache? Why do people use them?
A Web cache sits between one or more Web servers (also known as origin servers ) and a client or many clients, and watches requests come by, saving copies of the responses — like HTML pages, images and files (collectively known as representations ) — for itself. Then, if there is another request for the same URL, it can use the response that it has, instead of asking the origin server for it again.
There are two main reasons that Web caches are used:
- To reduce latency — Because the request is satisfied from the cache (which is closer to the client) instead of the origin server, it takes less time for it to get the representation and display it. This makes the Web seem more responsive.
- To reduce network traffic — Because representations are reused, it reduces the amount of bandwidth used by a client. This saves money if the client is paying for traffic, and keeps their bandwidth requirements lower and more manageable.
Kinds of Web Caches
Browser Caches
If you examine the preferences dialog of any modern Web browser (like Internet Explorer, Safari or Mozilla), you’ll probably notice a “cache” setting. This lets you set aside a section of your computer’s hard disk to store representations that you’ve seen, just for you. The browser cache works according to fairly simple rules. It will check to make sure that the representations are fresh, usually once a session (that is, once in the current invocation of the browser).
This cache is especially useful when users hit the “back” button or click a link to see a page they’ve just looked at. Also, if you use the same navigation images throughout your site, they’ll be served from browsers’ caches almost instantaneously.
Proxy Caches
Web proxy caches work on the same principle, but a much larger scale. Proxies serve hundreds or thousands of users in the same way; large corporations and ISPs often set them up on their firewalls, or as standalone devices (also known as intermediaries ).
Because proxy caches aren’t part of the client or the origin server, but instead are out on the network, requests have to be routed to them somehow. One way to do this is to use your browser’s proxy setting to manually tell it what proxy to use; another is using interception. Interception proxies have Web requests redirected to them by the underlying network itself, so that clients don’t need to be configured for them, or even know about them.
Proxy caches are a type of shared cache ; rather than just having one person using them, they usually have a large number of users, and because of this they are very good at reducing latency and network traffic. That’s because popular representations are reused a number of times.
Gateway Caches
Also known as “reverse proxy caches” or “surrogate caches,” gateway caches are also intermediaries, but instead of being deployed by network administrators to save bandwidth, they’re typically deployed by Webmasters themselves, to make their sites more scalable, reliable and better performing.
Requests can be routed to gateway caches by a number of methods, but typically some form of load balancer is used to make one or more of them look like the origin server to clients.
Content delivery networks (CDNs) distribute gateway caches throughout the Internet (or a part of it) and sell caching to interested Web sites. Speedera and Akamai are examples of CDNs.
This tutorial focuses mostly on browser and proxy caches, although some of the information is suitable for those interested in gateway caches as well.
Aren’t Web Caches bad for me? Why should I help them?
Web caching is one of the most misunderstood technologies on the Internet. Webmasters in particular fear losing control of their site, because a proxy cache can “hide” their users from them, making it difficult to see who’s using the site.
Unfortunately for them, even if Web caches didn’t exist, there are too many variables on the Internet to assure that they’ll be able to get an accurate picture of how users see their site. If this is a big concern for you, this tutorial will teach you how to get the statistics you need without making your site cache-unfriendly.
Another concern is that caches can serve content that is out of date, or stale . However, this tutorial can show you how to configure your server to control how your content is cached.
CDNs are an interesting development, because unlike many proxy caches, their gateway caches are aligned with the interests of the Web site being cached, so that these problems aren’t seen. However, even when you use a CDN, you still have to consider that there will be proxy and browser caches downstream.
On the other hand, if you plan your site well, caches can help your Web site load faster, and save load on your server and Internet link. The difference can be dramatic; a site that is difficult to cache may take several seconds to load, while one that takes advantage of caching can seem instantaneous in comparison. Users will appreciate a fast-loading site, and will visit more often.
Think of it this way; many large Internet companies are spending millions of dollars setting up farms of servers around the world to replicate their content, in order to make it as fast to access as possible for their users. Caches do the same for you, and they’re even closer to the end user. Best of all, you don’t have to pay for them.
The fact is that proxy and browser caches will be used whether you like it or not. If you don’t configure your site to be cached correctly, it will be cached using whatever defaults the cache’s administrator decides upon.
How Web Caches Work
All caches have a set of rules that they use to determine when to serve a representation from the cache, if it’s available. Some of these rules are set in the protocols (HTTP 1.0 and 1.1), and some are set by the administrator of the cache (either the user of the browser cache, or the proxy administrator).
Generally speaking, these are the most common rules that are followed (don’t worry if you don’t understand the details, it will be explained below):
- If the response’s headers tell the cache not to keep it, it won’t.
- If the request is authenticated or secure (i.e., HTTPS), it won’t be cached.
- A cached representation is considered fresh (that is, able to be sent to a client without checking with the origin server) if:
  - It has an expiry time or other age-controlling header set, and is still within the fresh period, or
  - If the cache has seen the representation recently, and it was modified relatively long ago.
- If a representation is stale, the origin server will be asked to validate it, or tell the cache whether the copy that it has is still good.
- Under certain circumstances (for example, when it’s disconnected from a network) a cache can serve stale responses without checking with the origin server.
If no validator (an ETag or Last-Modified header) is present on a response, and it doesn't have any explicit freshness information, it will usually (but not always) be considered uncacheable.
Together, freshness and validation are the most important ways that a cache works with content. A fresh representation will be available instantly from the cache, while a validated representation will avoid sending the entire representation over again if it hasn’t changed.
How (and how not) to Control Caches
There are several tools that Web designers and Webmasters can use to fine-tune how caches will treat their sites. It may require getting your hands a little dirty with your server’s configuration, but the results are worth it. For details on how to use these tools with your server, see the Implementation sections below.
HTML Meta Tags and HTTP Headers
HTML authors can put tags in a document’s <HEAD> section that describe its attributes. These meta tags are often used in the belief that they can mark a document as uncacheable, or expire it at a certain time.
Meta tags are easy to use, but aren’t very effective. That’s because they’re only honored by a few browser caches (which actually read the HTML), not proxy caches (which almost never read the HTML in the document). While it may be tempting to put a Pragma: no-cache meta tag into a Web page, it won’t necessarily cause it to be kept fresh.
If your site is hosted at an ISP or hosting farm and they don’t give you the ability to set arbitrary HTTP headers (like Expires and Cache-Control), complain loudly; these are tools necessary for doing your job.
On the other hand, true HTTP headers give you a lot of control over how both browser caches and proxies handle your representations. They can’t be seen in the HTML, and are usually automatically generated by the Web server. However, you can control them to some degree, depending on the server you use. In the following sections, you’ll see what HTTP headers are interesting, and how to apply them to your site.
HTTP headers are sent by the server before the HTML, and only seen by the browser and any intermediate caches. Typical HTTP 1.1 response headers might look like this:
HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html
The HTML would follow these headers, separated by a blank line. See the Implementation sections for information about how to set HTTP headers.
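To make the header/body split concrete, here is a minimal Python sketch that separates a raw response into its status line, headers and body at the blank line. The function name `split_response` and the shortened sample response are illustrative assumptions, not part of any server or library.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Shortened version of the sample response shown above.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Fri, 30 Oct 1998 13:19:41 GMT\r\n"
    b"Cache-Control: max-age=3600, must-revalidate\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html>...</html>"
)
status, headers, body = split_response(raw)
```

This sketch keeps only the last value when a header is repeated, which is a simplification.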
Pragma HTTP Headers (and why they don’t work)
Many people believe that assigning a Pragma: no-cache HTTP header to a representation will make it uncacheable. This is not necessarily true; the HTTP specification does not set any guidelines for Pragma response headers; instead, Pragma request headers (the headers that a browser sends to a server) are discussed. Although a few caches may honor this header, the majority won’t, and it won’t have any effect. Use the headers below instead.
Controlling Freshness with the Expires HTTP Header
The Expires HTTP header is a basic means of controlling caches; it tells all caches how long the associated representation is fresh for. After that time, caches will always check back with the origin server to see if a document is changed. Expires headers are supported by practically every cache.
Most Web servers allow you to set Expires response headers in a number of ways. Commonly, they will allow setting an absolute time to expire, a time based on the last time that the client retrieved the representation (last access time), or a time based on the last time the document changed on your server (last modification time).
Expires headers are especially good for making static images (like navigation bars and buttons) cacheable. Because they don’t change much, you can set an extremely long expiry time on them, making your site appear much more responsive to your users. They’re also useful for controlling caching of a page that is regularly changed. For instance, if you update a news page once a day at 6am, you can set the representation to expire at that time, so caches will know when to get a fresh copy, without users having to hit ‘reload’.
The only value valid in an Expires header is an HTTP date; anything else will most likely be interpreted as ‘in the past’, so that the representation is uncacheable. Also, remember that the time in an HTTP date is Greenwich Mean Time (GMT), not local time.
For example:
Expires: Fri, 30 Oct 1998 14:19:41 GMT
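As a rough illustration of producing such a date, the Python sketch below formats a timestamp as an HTTP date and works out the next daily expiry for the "news page updated at 6am" case. The helper names `http_date` and `next_daily_expiry` are hypothetical, and the sketch assumes the 6am schedule is expressed in GMT/UTC; a site that publishes at 6am local time would need to convert first.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def http_date(moment: datetime) -> str:
    """Format an aware datetime as an HTTP date, which is always in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def next_daily_expiry(now: datetime, hour: int = 6) -> datetime:
    """Return the next occurrence of hour:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's update time has already passed; expire tomorrow instead.
        candidate += timedelta(days=1)
    return candidate


now = datetime(1998, 10, 30, 13, 19, 41, tzinfo=timezone.utc)
print("Expires:", http_date(next_daily_expiry(now)))
```

`email.utils.format_datetime` is part of the Python standard library; everything else here is a sketch rather than a recommended implementation.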
It’s important to make sure that your Web server’s clock is accurate if you use the Expires header. One way to do this is using the Network Time Protocol (NTP); talk to your local system administrator to find out more.
Although the Expires header is useful, it has some limitations. First, because there’s a date involved, the clocks on the Web server and the cache must be synchronised; if they have a different idea of the time, the intended results won’t be achieved, and caches might wrongly consider stale content as fresh.
Another problem with Expires is that it’s easy to forget that you’ve set some content to expire at a particular time. If you don’t update an Expires time before it passes, each and every request will go back to your Web server, increasing load and latency.
Cache-Control HTTP Headers
HTTP 1.1 introduced a new class of headers, Cache-Control response headers, to give Web publishers more control over their content, and to address the limitations of Expires.
Useful Cache-Control response headers include:
- max-age=[seconds]: specifies the maximum amount of time that a representation will be considered fresh. Like Expires, it sets a freshness lifetime, but this directive is relative to the time of the request rather than absolute. [seconds] is the number of seconds from the time of the request you wish the representation to be fresh for.
- s-maxage=[seconds]: similar to max-age, except that it only applies to shared (e.g., proxy) caches.
- public: marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically private.
- private: allows caches that are specific to one user (e.g., in a browser) to store the response; shared caches (e.g., in a proxy) may not.
- no-cache: forces caches to submit the request to the origin server for validation before releasing a cached copy, every time. This is useful to assure that authentication is respected (in combination with public), or to maintain rigid freshness, without sacrificing all of the benefits of caching.
- no-store: instructs caches not to keep a copy of the representation under any conditions.
- must-revalidate: tells caches that they must obey any freshness information you give them about a representation. HTTP allows caches to serve stale representations under special conditions; by specifying this header, you’re telling the cache that you want it to strictly follow your rules.
- proxy-revalidate: similar to must-revalidate, except that it only applies to proxy caches.
For example:
Cache-Control: max-age=3600, must-revalidate
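To show how a cache might read a header like this, the following Python sketch parses a Cache-Control value into its directives and derives a freshness lifetime from it. The names `parse_cache_control` and `freshness_lifetime` are illustrative assumptions, and the parser is deliberately simplified: it splits on commas, so it would mishandle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without "=" (e.g. must-revalidate) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(directives: dict, shared_cache: bool = False):
    """Return the freshness lifetime in seconds, or None if unspecified."""
    # s-maxage applies only to shared caches and takes priority there.
    if shared_cache and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None


parsed = parse_cache_control("max-age=3600, must-revalidate")
print(parsed, freshness_lifetime(parsed))
```

A real cache would also fall back to Expires when no max-age is present; that step is left out to keep the sketch short.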
If you plan to use the Cache-Control headers, you should have a look at the excellent documentation in HTTP 1.1; see References and Further Information.
Validators and Validation
In How Web Caches Work, we said that validation is used by servers and caches to communicate when a representation has changed. By using it, caches avoid having to download the entire representation when they already have a copy locally, but they’re not sure if it’s still fresh.
Validators are very important; if one isn’t present, and there isn’t any freshness information (Expires or Cache-Control) available, caches will not store a representation at all.
The most common validator is the time that the document last changed, as communicated in the Last-Modified header. When a cache has a representation stored that includes a Last-Modified header, it can use it to ask the server if the representation has changed since the last time it was seen, with an If-Modified-Since request.
HTTP 1.1 introduced a new kind of validator called the ETag. ETags are unique identifiers that are generated by the server and changed every time the representation does. Because the server controls how the ETag is generated, caches can be surer that if the ETag matches when they make an If-None-Match request, the representation really is the same.
Almost all caches use Last-Modified times in determining if a representation is fresh; ETag validation is also becoming prevalent.
Most modern Web servers will generate both ETag and Last-Modified headers to use as validators for static content (i.e., files) automatically; you won’t have to do anything. However, they don’t know enough about dynamic content (like CGI, ASP or database sites) to generate them; see Writing Cache-Aware Scripts.
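For dynamic content, one possible approach to ETag validation is sketched below in Python. It assumes the ETag is derived from a hash of the response body, which is only one of several ways a server could generate it; `make_etag` and `etag_matches` are hypothetical helper names.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the body (hashing is an assumption of this sketch)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag: str) -> str:
    """Drop the W/ prefix so weak and strong tags compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """Return True if the If-None-Match header lists the current ETag."""
    if if_none_match.strip() == "*":
        return True
    candidates = {_strip_weak(tag.strip()) for tag in if_none_match.split(",")}
    return _strip_weak(current_etag) in candidates


body = b"<html>hello</html>"
etag = make_etag(body)
# A match means the cached copy is still good, so only headers are sent.
status = "304 Not Modified" if etag_matches(etag, etag) else "200 OK"
```

When the tags match, the script can answer with `304 Not Modified` and skip sending the body.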
Tips for Building a Cache-Aware Site
Besides using freshness information and validation, there are a number of other things you can do to make your site more cache-friendly.
- Use URLs consistently — this is the golden rule of caching. If you serve the same content on different pages, to different users, or from different sites, it should use the same URL. This is the easiest and most effective way to make your site cache-friendly. For example, if you use “/index.html” in your HTML as a reference once, always use it that way.
- Use a common library of images and other elements and refer back to them from different places.
- Make caches store images and pages that don’t change often by using a Cache-Control: max-age header with a large value.
- Make caches recognise regularly updated pages by specifying an appropriate max-age or expiration time.
- If a resource (especially a downloadable file) changes, change its name. That way, you can make it expire far in the future, and still guarantee that the correct version is served; the page that links to it is the only one that will need a short expiry time.
- Don’t change files unnecessarily. If you do, everything will have a falsely young Last-Modified date. For instance, when updating your site, don’t copy over the entire site; just move the files that you’ve changed.
- Use cookies only where necessary: cookies are difficult to cache, and aren’t needed in most situations. If you must use a cookie, limit its use to dynamic pages.
- Minimize use of SSL — because encrypted pages are not stored by shared caches, use them only when you have to, and use images on SSL pages sparingly.
- Check your pages with REDbot — it can help you apply many of the concepts in this tutorial.
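One way to apply the "if a resource changes, change its name" tip is to embed a short content hash in the file name, so the name changes exactly when the content does. The Python sketch below assumes that naming scheme; `versioned_name` is a hypothetical helper, and nothing in this guide prescribes hashing specifically.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes) -> str:
    """Insert a short content hash before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:8]
    # logo.png becomes logo.<digest>.png in the same directory.
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_url = versioned_name("/images/logo.png", b"first version")
new_url = versioned_name("/images/logo.png", b"second version")
```

The versioned files can then be served with a very long max-age, while the HTML page that links to them keeps a short one.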
Writing Cache-Aware Scripts
By default, most scripts won’t return a validator (a Last-Modified or ETag response header) or freshness information (Expires or Cache-Control). While some scripts really are dynamic (meaning that they return a different response for every request), many (like search engines and database-driven sites) can benefit from being cache-friendly.
Generally speaking, if a script produces output that is reproducible with the same request at a later time (whether it be minutes or days later), it should be cacheable. If the content of the script changes only depending on what’s in the URL, it is cacheable; if the output depends on a cookie, authentication information or other external criteria, it probably isn’t.
- The best way to make a script cache-friendly (as well as perform better) is to dump its content to a plain file whenever it changes. The Web server can then treat it like any other Web page, generating and using validators, which makes your life easier. Remember to only write files that have changed, so the Last-Modified times are preserved.
- Another way to make a script cacheable in a limited fashion is to set an age-related header for as far in the future as practical. Although this can be done with Expires, it’s probably easiest to do so with Cache-Control: max-age, which will make the response fresh for an amount of time after the request.
- If you can’t do that, you’ll need to make the script generate a validator, and then respond to If-Modified-Since and/or If-None-Match requests. This can be done by parsing the HTTP headers, and then responding with 304 Not Modified when appropriate. Unfortunately, this is not a trivial task.
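The last option can be sketched roughly as follows for the If-Modified-Since case. This Python example assumes the script already knows when its underlying data last changed (as a timezone-aware datetime); `not_modified` is a hypothetical helper name, and a header that cannot be parsed is treated as if it were absent.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def not_modified(last_modified: datetime, if_modified_since: str) -> bool:
    """Return True when a 304 Not Modified response is appropriate."""
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    return last_modified.replace(microsecond=0) <= since


changed = datetime(1998, 6, 29, 2, 28, 12, tzinfo=timezone.utc)
header = "Mon, 29 Jun 1998 02:28:12 GMT"
status = "304 Not Modified" if not_modified(changed, header) else "200 OK"
```

When the result is a 304, the script sends headers only and skips generating the body.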
Some other tips:
- Don’t use POST unless it’s appropriate. Responses to the POST method aren’t kept by most caches; if you send information in the path or query (via GET), caches can store that information for the future.
- Don’t embed user-specific information in the URL unless the content generated is completely unique to that user.
- Don’t count on all requests from a user coming from the same host, because caches often work together.
- Generate Content-Length response headers. It’s easy to do, and it will allow the response of your script to be used in a persistent connection. This allows clients to request multiple representations on one TCP/IP connection, instead of setting up a connection for every request. It makes your site seem much faster.
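As a small illustration of the Content-Length tip, this Python sketch builds a CGI-style response in memory so that the body length is known before anything is sent. The function name `build_response`, the UTF-8 encoding and the default max-age of 600 seconds are assumptions made for the example.

```python
def build_response(body_text: str, max_age: int = 600) -> bytes:
    """Assemble CGI-style response headers plus body, with Content-Length."""
    body = body_text.encode("utf-8")
    headers = [
        "Content-Type: text/html; charset=utf-8",
        f"Cache-Control: max-age={max_age}",
        f"Content-Length: {len(body)}",  # length in bytes, not characters
    ]
    # A blank line separates the headers from the body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body


response = build_response("<p>caf\u00e9</p>")
```

Counting bytes after encoding matters because non-ASCII characters occupy more than one byte in UTF-8.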
See the Implementation Notes for more specific information.
Frequently Asked Questions
What are the most important things to make cacheable?
A good strategy is to identify the most popular, largest representations (especially images) and work with them first.
How can I make my pages as fast as possible with caches?
The most cacheable representation is one with a long freshness time set. Validation does help reduce the time that it takes to see a representation, but the cache still has to contact the origin server to see if it’s fresh. If the cache already knows it’s fresh, it will be served directly.
I understand that caching is good, but I need to keep statistics on how many people visit my page!
If you must know every time a page is accessed, select ONE small item on a page (or the page itself), and make it uncacheable by giving it suitable headers. For example, you could refer to a 1x1 transparent uncacheable image from each page. The Referer header will contain information about what page called it.
Be aware that even this will not give truly accurate statistics about your users, and is unfriendly to the Internet and your users; it generates unnecessary traffic, and forces people to wait for that uncached item to be downloaded. For more information about this, see On Interpreting Access Statistics in the references.
How can I see a representation’s HTTP headers?
Many Web browsers let you see the Expires and Last-Modified headers in a “page info” or similar interface. If available, this will give you a menu of the page and any representations (like images) associated with it, along with their details.
To see the full headers of a representation, you can manually connect to the Web server using a Telnet client.
To do so, you may need to type the port (by default, 80) into a separate field, or you may need to connect to www.example.com:80 or www.example.com 80 (note the space). Consult your Telnet client’s documentation.
Once you’ve opened a connection to the site, type a request for the representation. For instance, if you want to see the headers for http://www.example.com/foo.html, connect to www.example.com, port 80, and type:
GET /foo.html HTTP/1.1 [return]
Host: www.example.com [return][return]
Press the Return key every time you see [return]; make sure to press it twice at the end. This will print the headers, and then the full representation. To see the headers only, substitute HEAD for GET.
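The same manual exchange can be scripted. The Python sketch below builds the request text you would otherwise type into Telnet and sends it over a plain socket; `build_request` and `fetch_headers` are hypothetical helper names, and the `Connection: close` header is an addition of this sketch (not part of the request shown above) so that the server closes the connection and reading terminates.

```python
import socket


def build_request(method: str, host: str, path: str) -> bytes:
    """Build the request text you would type into a Telnet session."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close afterwards
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_headers(host: str, path: str = "/", port: int = 80) -> str:
    """Send a HEAD request and return the raw response headers as text."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request("HEAD", host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    raw = b"".join(chunks)
    # Everything before the first blank line is the header section.
    return raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
```

Calling `print(fetch_headers("www.example.com", "/foo.html"))` would print the headers only, much like substituting HEAD for GET in the Telnet session.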
My pages are password-protected; how do proxy caches deal with them?
By default, pages protected with HTTP authentication are considered private; they will not be kept by shared caches. However, you can make authenticated pages public with a Cache-Control: public header; HTTP 1.1-compliant caches will then allow them to be cached.
If you’d like such pages to be cacheable, but still authenticated for every user, combine the Cache-Control: public and no-cache headers. This tells the cache that it must submit the new client’s authentication information to the origin server before releasing the representation from the cache. This would look like:
Cache-Control: public, no-cache
Whether or not this is done, it’s best to minimize use of authentication; for example, if your images are not sensitive, put them in a separate directory and configure your server not to force authentication for it. That way, those images will be naturally cacheable.
Should I worry about security if people access my site through a cache?
SSL pages are not cached (or decrypted) by proxy caches, so you don’t have to worry about that. However, because caches store non-SSL requests and URLs fetched through them, you should be conscious about unsecured sites; an unscrupulous administrator could conceivably gather information about their users, especially in the URL.
In fact, any administrator on the network between your server and your clients could gather this type of information. One particular problem is when CGI scripts put usernames and passwords in the URL itself; this makes it trivial for others to find and use their login.
If you’re aware of the issues surrounding Web security in general, you shouldn’t have any surprises from proxy caches.
I’m looking for an integrated Web publishing solution. Which ones are cache-aware?
It varies. Generally speaking, the more complex a solution is, the more difficult it is to cache. The worst are ones which dynamically generate all content and don’t provide validators; they may not be cacheable at all. Speak with your vendor’s technical staff for more information, and see the Implementation notes below.
My images expire a month from now, but I need to change them in the caches now!
The Expires header can’t be circumvented; unless the cache (either browser or proxy) runs out of room and has to delete the representations, the cached copy will be used until then.
The most effective solution is to change any links to them; that way, completely new representations will be loaded fresh from the origin server. Remember that the page that refers to a representation will be cached as well. Because of this, it’s best to make static images and similar representations very cacheable, while keeping the HTML pages that refer to them on a tight leash.
If you want to reload a representation from a specific cache, you can either force a reload while using the cache (in Firefox, holding down shift while pressing ‘reload’ will do this by issuing a Pragma: no-cache request header), or have the cache administrator delete the representation through their interface.
I run a Web Hosting service. How can I let my users publish cache-friendly pages?
If you’re using Apache, consider allowing them to use .htaccess files and providing appropriate documentation.
Otherwise, you can establish predetermined areas for various caching attributes in each virtual server. For instance, you could specify a directory /cache-1m that will be cached for one month after access, and a /no-cache area that will be served with headers instructing caches not to store representations from it.
Whatever you are able to do, it is best to work with your largest customers first on caching. Most of the savings (in bandwidth and in load on your servers) will be realized from high-volume sites.
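The "predetermined areas" idea above amounts to a lookup from URL prefix to caching policy. The Python sketch below shows that mapping using the two example directories; the `POLICIES` table and `cache_control_for` helper are hypothetical, and in practice this would normally be expressed in the Web server's own configuration rather than in code.

```python
ONE_MONTH = 30 * 24 * 60 * 60  # 2592000 seconds

# Hypothetical directory-to-policy table for one virtual server.
POLICIES = {
    "/cache-1m/": f"max-age={ONE_MONTH}",
    "/no-cache/": "no-store",
}


def cache_control_for(path: str):
    """Return the Cache-Control value for a request path, or None."""
    matches = [prefix for prefix in POLICIES if path.startswith(prefix)]
    if not matches:
        return None
    # If several prefixes match, the longest (most specific) one wins.
    return POLICIES[max(matches, key=len)]


print(cache_control_for("/cache-1m/logo.gif"))
```

Paths outside the listed areas get no explicit policy here, leaving the server's defaults in effect.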
I’ve marked my pages as cacheable, but my browser keeps requesting them on every request. How do I force the cache to keep representations of them?
Caches aren’t required to keep a representation and reuse it; they’re only required to not keep or use them under some conditions. All caches make decisions about which representations to keep based upon their size, type (e.g., image vs. html), or by how much space they have left to keep local copies. Yours may not be considered worth keeping around, compared to more popular or larger representations.
Some caches do allow their administrators to prioritize what kinds of representations are kept, and some allow representations to be “pinned” in cache, so that they’re always available.
Implementation Notes — Web Servers
Generally speaking, it’s best to use the latest version of whatever Web server you’ve chosen to deploy. Not only will they likely contain more cache-friendly features, new versions also usually have important security and performance improvements.
Apache HTTP Server
Apache uses optional modules to include headers, including both Expires and Cache-Control. Both modules are available in the 1.2 or greater distribution.
The modules need to be built into Apache; although they are included in the distribution, they are not turned on by default. To find out if the modules are enabled in your server, find the httpd binary and run httpd -l; this should print a list of the available modules (note that this only lists compiled-in modules; on later versions of Apache, use httpd -M to include dynamically loaded modules as well). The modules we’re looking for are mod_expires and mod_headers.
If they aren’t available, and you have administrative access, you can recompile Apache to include them. This can be done either by uncommenting the appropriate lines in the Configuration file, or using the --enable-module=expires and --enable-module=headers arguments to configure (1.3 or greater). Consult the INSTALL file found with the Apache distribution.
Once you have an Apache with the appropriate modules, you can use mod_expires to specify when representations should expire, either in .htaccess files or in the server’s access.conf file. You can specify expiry from either access or modification time, and apply it to a file type or as a default. See the module documentation for more information, and speak with your local Apache guru if you have trouble.
To apply Cache-Control headers, you’ll need to use the mod_headers module, which allows you to specify arbitrary HTTP headers for a resource. See the mod_headers documentation.
Here’s an example .htaccess file that demonstrates the use of some headers.
- .htaccess files allow web publishers to use commands normally only found in configuration files. They affect the content of the directory they’re in and their subdirectories. Talk to your server administrator to find out if they’re enabled.
### activate mod_expires
ExpiresActive On
### Expire .gif's 1 month from when they're accessed
ExpiresByType image/gif A2592000
### Expire everything else 1 day from when it's last modified
### (this uses the Alternative syntax)
ExpiresDefault "modification plus 1 day"
### Apply a Cache-Control header to index.html
<Files index.html>
Header append Cache-Control "public, must-revalidate"
</Files>
Note that mod_expires automatically calculates and inserts a Cache-Control: max-age header as appropriate.
Apache 2’s configuration is very similar to that of 1.3; see the 2.2 mod_expires and mod_headers documentation for more information.
Microsoft IIS
Microsoft’s Internet Information Server makes it very easy to set headers in a somewhat flexible way. Note that this is only possible in version 4 of the server, which will run only on NT Server.
To specify headers for an area of a site, select it in the Administration Tools interface, and bring up its properties. After selecting the HTTP Headers tab, you should see two interesting areas: Enable Content Expiration and Custom HTTP headers. The first should be self-explanatory, and the second can be used to apply Cache-Control headers.
See the ASP section below for information about setting headers in Active Server Pages. It is also possible to set headers from ISAPI modules; refer to MSDN for details.
Netscape/iPlanet Enterprise Server
As of version 3.6, Enterprise Server does not provide any obvious way to set Expires headers. However, it has supported HTTP 1.1 features since version 3.0. This means that HTTP 1.1 caches (proxy and browser) will be able to take advantage of Cache-Control settings you make.
To use Cache-Control headers, choose Content Management | Cache Control Directives in the administration server. Then, using the Resource Picker, choose the directory where you want to set the headers. After setting the headers, click ‘OK’. For more information, see the NES manual.
Implementation Notes — Server-Side Scripting
One thing to keep in mind is that it may be easier to set HTTP headers with your Web server rather than in the scripting language. Try both.
Because the emphasis in server-side scripting is on dynamic content, it doesn’t make for very cacheable pages, even when the content could be cached. If your content changes often, but not on every page hit, consider setting a Cache-Control: max-age header; most users access pages again in a relatively short period of time. For instance, when users hit the ‘back’ button, if there isn’t any validator or freshness information available, they’ll have to wait until the page is re-downloaded from the server to see it.
CGI
CGI scripts are one of the most popular ways to generate content. You can easily append HTTP response headers by adding them before you send the body; most CGI implementations already require you to do this for the Content-Type header. For instance, in Perl:
#!/usr/bin/perl
print "Content-type: text/html\n";
print "Expires: Thu, 29 Oct 1998 17:04:19 GMT\n";
print "\n";
### the content body follows...
Since it’s all text, you can easily generate Expires and other date-related headers with in-built functions. It’s even easier if you use Cache-Control: max-age:
print "Cache-Control: max-age=600\n";
This will make the script cacheable for 10 minutes after the request, so that if the user hits the ‘back’ button, they won’t be resubmitting the request.
The CGI specification also makes request headers that the client sends available in the environment of the script; each header has ‘HTTP_’ prepended to its name. So, if a client makes an If-Modified-Since request, it will show up as HTTP_IF_MODIFIED_SINCE.
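The naming convention is mechanical enough to express in a few lines. This Python sketch converts a header name to its CGI environment variable and looks it up; `request_header` is a hypothetical helper, and the sample environment dictionary stands in for the real `os.environ` that a CGI script would see.

```python
import os


def request_header(name: str, environ=os.environ):
    """Look up a request header using the CGI HTTP_* naming convention."""
    # If-Modified-Since becomes HTTP_IF_MODIFIED_SINCE.
    key = "HTTP_" + name.upper().replace("-", "_")
    return environ.get(key)


# Stand-in for the environment a Web server would hand to a CGI script.
sample_env = {"HTTP_IF_MODIFIED_SINCE": "Mon, 29 Jun 1998 02:28:12 GMT"}
since = request_header("If-Modified-Since", sample_env)
```

A header the client did not send simply comes back as `None`.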
See also the cgi_buffer library, which automatically handles ETag generation and validation, Content-Length generation and gzip content-coding for Perl and Python CGI scripts with a one-line include. The Python version can also be used to wrap arbitrary CGI scripts.
Server Side Includes
SSI (often used with the extension .shtml) is one of the first ways that Web publishers were able to get dynamic content into pages. By using special tags in the pages, a limited form of in-HTML scripting was available.
Most implementations of SSI do not set validators, and as such are not cacheable. However, Apache’s implementation does allow users to specify which SSI files can be cached, by setting the group execute permissions on the appropriate files, combined with the XbitHack full directive. For more information, see the mod_include documentation.
PHP
PHP is a server-side scripting language that, when built into the server, can be used to embed scripts inside a page’s HTML, much like SSI, but with a far larger number of options. PHP can be used as a CGI script on any Web server (Unix or Windows), or as an Apache module.
By default, representations processed by PHP are not assigned validators, and are therefore uncacheable. However, developers can set HTTP headers by using the Header() function.
For example, this will create a Cache-Control header, as well as an Expires header three days in the future:
<?php
Header("Cache-Control: must-revalidate");
$offset = 60 * 60 * 24 * 3;
$ExpStr = "Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT";
Header($ExpStr);
?>
Remember that the Header() function MUST come before any other output.
As you can see, you’ll have to create the HTTP date for an Expires header by hand; PHP doesn’t provide a function to do it for you (although recent versions have made it easier; see PHP’s date documentation). Of course, it’s easy to set a Cache-Control: max-age header, which is just as good for most situations.
For more information, see the manual entry for header.
See also the cgi_buffer library, which automatically handles ETag generation and validation, Content-Length generation and gzip content-coding for PHP scripts with a one-line include.
Cold Fusion
Cold Fusion, by Macromedia, is a commercial server-side scripting engine, with support for several Web servers on Windows, Linux and several flavors of Unix.
Cold Fusion makes setting arbitrary HTTP headers relatively easy, with the CFHEADER tag. Unfortunately, their example for setting an Expires header, as below, is a bit misleading.
<CFHEADER NAME="Expires" VALUE="#Now()#">
It doesn’t work like you might think, because the time (in this case, when the request is made) doesn’t get converted to an HTTP-valid date; instead, it just gets printed as a representation of Cold Fusion’s Date/Time object. Most clients will either ignore such a value, or convert it to a default, like January 1, 1970.
However, Cold Fusion does provide a date formatting function that will do the job: GetHttpTimeString. In combination with DateAdd, it’s easy to set Expires dates; here, we set a header to declare that representations of the page expire in one month:
<cfheader name="Expires" value="#GetHttpTimeString(DateAdd('m', 1, Now()))#">
You can also use the CFHEADER tag to set Cache-Control: max-age and other headers.
Remember that Web server headers are passed through in some deployments of Cold Fusion (such as CGI); check yours to determine whether you can use this to your advantage, by setting headers on the server instead of in Cold Fusion.
ASP and ASP.NET
When setting HTTP headers from ASPs, make sure you either place the Response method calls before any HTML generation, or use Response.Buffer to buffer the output. Also, note that some versions of IIS set a Cache-Control: private header on ASPs by default; such pages must be declared public to be cacheable by shared caches.
Active Server Pages, built into IIS and also available for other Web servers, also allows you to set HTTP headers. For instance, to set an expiry time, you can use the properties of the Response object:
<% Response.Expires=1440 %>
This specifies the number of minutes from the request to expire the representation. Cache-Control headers can be added like this:
<% Response.CacheControl="public" %>
In ASP.NET, Response.Expires is deprecated; the proper way to set cache-related headers is with Response.Cache:
Response.Cache.SetExpires(DateTime.Now.AddMinutes(60));
Response.Cache.SetCacheability(HttpCacheability.Public);
See the MSDN documentation for more information.
References and Further Information
HTTP 1.1 Specification
The HTTP 1.1 spec has many extensions for making pages cacheable, and is the authoritative guide to implementing the protocol. See sections 13, 14.9, 14.21, and 14.25.
Web-Caching.com
An excellent introduction to caching concepts, with links to other online resources.
On Interpreting Access Statistics
Jeff Goldberg’s informative rant on why you shouldn’t rely on access statistics and hit counters.
REDbot
Examines HTTP resources to determine how they will interact with Web caches, and generally how well they use the protocol.
cgi_buffer Library
One-line include in Perl CGI, Python CGI and PHP scripts automatically handles ETag generation and validation, Content-Length generation and gzip Content-Encoding — correctly. The Python version can also be used as a wrapper around arbitrary CGI scripts.
About This Document
This document is Copyright © 1998-2010 Mark Nottingham <mnot@pobox.com>. This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.
All trademarks within are property of their respective holders.
Although the author believes the contents to be accurate at the time of publication, no liability is assumed for them, their application or any consequences thereof. If any misrepresentations, errors or other need for clarification is found, please contact the author immediately.
The latest revision of this document can always be obtained from http://www.mnot.net/cache_docs/
Translations are available in: Belarusian, Chinese, Czech, German, and French.
June 29, 2010
