@CopyLeft by ICANTH , I Can do ANy THing that I CAN THink !~
Author :WenHui ,WuHan University ,2012-6-15
PDF version: http://www.docin.com/p1-424285718.html
Ordinary spinlocks
The most common use of a spinlock is to create a critical section:
static DEFINE_SPINLOCK(xxx_lock);
unsigned long flags;
spin_lock_irqsave(&xxx_lock, flags);
... critical section here ..
spin_unlock_irqrestore(&xxx_lock, flags);
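One point deserves attention: when a spinlock is used to make access to a shared variable safe, that safety holds only if every part of the system that touches the shared variable does so inside a matching spin_lock / spin_unlock pair on the same lock.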
NOTE! The spin-lock is safe only when you _also_ use the lock itself to do locking across CPU's, which implies that EVERYTHING that touches a shared variable has to agree about the spinlock they want to use.
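The spinlock data structure in Linux 2.6.15.5 is organized as follows.
Only when CONFIG_SMP is configured is raw_spinlock_t a structure containing an slock variable. The slock field records whether the spinlock is free and arbitrates concurrent lock requests from multiple CPUs. Without CONFIG_SMP there is a single CPU, concurrent requests for the spinlock cannot occur, and raw_lock is an empty structure.
Only when both CONFIG_SMP and CONFIG_PREEMPT are configured does spinlock_t have a break_lock field. break_lock marks the contention state of the spinlock: break_lock = 0 means no other execution path is waiting for the lock, and break_lock = 1 means some other process is busy-waiting for it. On SMP a CPU may fail to get the spinlock and spin, but the kernel must still keep spin_lock atomic, so kernel preemption has to be disabled when CONFIG_PREEMPT is configured.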
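As a rough sketch only, the layout described above can be modelled in plain C as shown below. The MODEL_* macros and model_* type names are hypothetical stand-ins for the kernel's CONFIG_* options and types, debug-only fields are left out, and the value 1 for a free lock is an assumption that follows the pre-ticket convention described later in this article.

```c
#include <assert.h>

/* Hypothetical stand-ins for CONFIG_SMP / CONFIG_PREEMPT; set to 0 to model UP. */
#define MODEL_CONFIG_SMP     1
#define MODEL_CONFIG_PREEMPT 1

#if MODEL_CONFIG_SMP
typedef struct {
    volatile unsigned int slock;   /* lock state; 1 means free in this model */
} model_raw_spinlock_t;
#define MODEL_RAW_UNLOCKED { 1 }
#else
typedef struct {
    char unused;   /* ISO C has no empty structs; the kernel uses a GNU extension */
} model_raw_spinlock_t;
#define MODEL_RAW_UNLOCKED { 0 }
#endif

typedef struct {
    model_raw_spinlock_t raw_lock;
#if MODEL_CONFIG_SMP && MODEL_CONFIG_PREEMPT
    unsigned int break_lock;       /* 1 while some other path is busy-waiting */
#endif
} model_spinlock_t;

/* Returns 1 if the model lock is in the free state. */
static int model_lock_is_free(const model_spinlock_t *l)
{
    assert(l != 0);
#if MODEL_CONFIG_SMP
    return l->raw_lock.slock == 1;
#else
    return 1;   /* UP: there is no lock word to inspect */
#endif
}
```

Setting MODEL_CONFIG_SMP to 0 makes the lock word disappear from the model, which is the point of the UP configuration: the type exists only so that code using it still compiles.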
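Function | Description |
spin_lock_init(lock) | Initializes a spinlock; the lock starts out in the unlocked state |
spin_lock(lock) | Locks the spinlock; returns on success, otherwise loops until the spinlock becomes free |
spin_unlock(lock) | Releases the spinlock lock, putting it back in the unlocked state |
spin_is_locked(lock) | Tests whether the spinlock is currently locked |
spin_unlock_wait(lock) | Loops until the spinlock lock becomes available |
spin_trylock(lock) | Tries to lock the spinlock lock; returns 0 if it cannot, otherwise locks it and returns 1 |
spin_can_lock(lock) | Tests whether the spinlock lock is free |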
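To make the return-value conventions in the table concrete, here is a small user-space model built on C11 atomics. It is only a sketch: the model_* names are invented for illustration and this is not the kernel implementation, just a test-and-set lock that follows the same conventions (init leaves the lock free, trylock returns 1 on success and 0 on failure).

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} model_spinlock_t;

static void model_spin_lock_init(model_spinlock_t *l)
{
    assert(l != 0);
    atomic_init(&l->locked, 0);          /* a freshly initialised lock is free */
}

static int model_spin_is_locked(model_spinlock_t *l)
{
    return atomic_load(&l->locked) != 0;
}

static int model_spin_can_lock(model_spinlock_t *l)
{
    return !model_spin_is_locked(l);
}

static int model_spin_trylock(model_spinlock_t *l)
{
    int expected = 0;
    /* Succeeds only if the lock was free; never waits. */
    return atomic_compare_exchange_strong(&l->locked, &expected, 1) ? 1 : 0;
}

static void model_spin_lock(model_spinlock_t *l)
{
    while (!model_spin_trylock(l))
        ;                                 /* busy-wait until the lock is free */
}

static void model_spin_unlock(model_spinlock_t *l)
{
    atomic_store(&l->locked, 0);          /* release: back to the free state */
}
```

Calling model_spin_trylock twice in a row on the same lock without an unlock in between shows the difference from model_spin_lock: the second call returns 0 immediately instead of spinning.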
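The relationship between spin_lock and spin_unlock is as follows.
In the UP architecture there is no need for a real lock to keep other CPUs out, so the spin operations merely disable and re-enable kernel preemption.
In Linux 2.6.35 the spin lock is implemented as a ticket lock. Apart from members used for kernel debugging, the spinlock data structure has a single field: raw_spinlock rlock.
The ticket spinlock splits the rlock field into two parts, Next and Owner.
Next is the next ticket number to hand out, and Owner is the ticket number currently allowed to hold the spinlock. To lock, a CPU takes the current value of Next as its own ticket and increments rlock.Next. It then compares its ticket with Owner: if they are equal the lock is acquired, otherwise it loops until rlock.Owner equals its ticket. Unlocking simply increments Owner.
The call chains of spin_lock and spin_unlock are traced in the source analysis that follows.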
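Source code analysis of ordinary spinlocks
Relationship between the source files
/include/linux/spinlock.h selects which spinlock definitions and operations to pull in according to whether CONFIG_SMP is configured: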
/*
 * include/linux/spinlock.h - generic spinlock/rwlock declarations
 *
 * here's the role of the various spinlock/rwlock related include files:
 *
 * on SMP builds:
 *
 *  asm/spinlock_types.h: contains the arch_spinlock_t/arch_rwlock_t and the
 *                        initializers
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  asm/spinlock.h:       contains the arch_spin_*()/etc. lowlevel
 *                        implementations, mostly inline assembly code
 *
 *  linux/spinlock_api_smp.h:
 *                        contains the prototypes for the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 *
 * on UP builds:
 *
 *  linux/spinlock_type_up.h:
 *                        contains the generic, simplified UP spinlock type.
 *                        (which is an empty structure on non-debug builds)
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  linux/spinlock_up.h:
 *                        contains the arch_spin_*()/etc. version of UP
 *                        builds. (which are NOPs on non-debug, non-preempt
 *                        builds)
 *
 *   (included on UP-non-debug builds:)
 *
 *  linux/spinlock_api_up.h:
 *                        builds the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 */

/*
 * Pull the arch_spin*() functions/declarations (UP-nondebug doesnt need them):
 */
#ifdef CONFIG_SMP
# include <asm/spinlock.h>
#else
# include <linux/spinlock_up.h>
#endif

typedef struct spinlock {
        union {
                struct raw_spinlock rlock;
                /* debug-only members omitted */
        };
} spinlock_t;

static inline void spin_lock(spinlock_t *lock)
{
        raw_spin_lock(&lock->rlock);
}

#define raw_spin_lock(lock)     _raw_spin_lock(lock)

static inline void spin_unlock(spinlock_t *lock)
{
        raw_spin_unlock(&lock->rlock);
}

#define raw_spin_unlock(lock)   _raw_spin_unlock(lock)

UP architecture

On UP, spin_lock ultimately boils down to:
/include/linux/spinlock_api_up.h
#define _raw_spin_lock(lock)                    __LOCK(lock)

/*
 * In the UP-nondebug case there's no real locking going on, so the
 * only thing we have to do is to keep the preempt counts and irq
 * flags straight, to suppress compiler warnings of unused lock
 * variables, and to add the proper checker annotations:
 */
#define __LOCK(lock) \
  do { preempt_disable(); __acquire(lock); (void)(lock); } while (0)

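preempt_disable is an empty function when CONFIG_PREEMPT is not configured; otherwise it disables kernel preemption. __acquire() exists for static checking when the kernel is built (it is an annotation for the sparse checker). (void)(lock) keeps the compiler from warning that lock is unused.

On UP, spin_unlock ultimately boils down to: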
#define __UNLOCK(lock) \
  do { preempt_enable(); __release(lock); (void)(lock); } while (0)
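The following user-space sketch models what these two macros amount to on a preemptible UP kernel. model_preempt_count and the MODEL_* macros are hypothetical names, and the real preempt_disable() and preempt_enable() do more than adjust a counter, so treat this purely as an illustration that no lock word is ever touched.

```c
#include <assert.h>

static int model_preempt_count;   /* > 0 means preemption is disabled */

static void model_preempt_disable(void)
{
    model_preempt_count++;
}

static void model_preempt_enable(void)
{
    assert(model_preempt_count > 0);   /* unlock without a matching lock is a bug */
    model_preempt_count--;
}

/* Same shape as __LOCK/__UNLOCK: adjust the count, then mention the lock
 * only to silence "unused variable" warnings. */
#define MODEL_LOCK(lock) \
    do { model_preempt_disable(); (void)(lock); } while (0)
#define MODEL_UNLOCK(lock) \
    do { model_preempt_enable(); (void)(lock); } while (0)

typedef struct { char unused; } model_up_spinlock_t;   /* carries no state */
```

Because the lock argument is only cast to void, nested MODEL_LOCK calls on different locks simply stack up in the counter, which mirrors how preemption stays disabled until the outermost unlock.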
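
SMP architecture: how the ticket spin lock is implemented
In Linux 2.6.24 a spinlock is represented by an integer. The value 1 means the lock is free and each spin_lock() decrements it by 1, so a value <= 0 means the lock is held, possibly with CPUs busy-waiting for it. Nothing orders those waiters, which makes the lock unfair. Starting with Linux 2.6.25 the integer is treated as a 16-bit value split into two bytes, with Next in the high byte and Owner in the low byte.
This mechanism is called "ticket spinlocks". The Next byte is the ticket number that will be handed to the next requester, and Owner is the ticket number that may currently take the lock. Both are initialized to 0. When lock.Next = lock.Owner the lock is free. spin_lock performs the following steps:
1. my_ticket = slock.next
2. slock.next++
3. wait until my_ticket = slock.owner
spin_unlock performs the following step:
1. slock.owner++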
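The steps above can be traced with a tiny single-threaded model. It is a sketch for following the ticket arithmetic only: the names are made up for illustration and nothing here is atomic, so it must not be used as a real lock.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t next;    /* ticket handed to the next requester */
    uint8_t owner;   /* ticket currently allowed to hold the lock */
} model_ticket_lock_t;

/* Steps 1 and 2 of spin_lock: take a ticket and advance Next. */
static uint8_t model_take_ticket(model_ticket_lock_t *l)
{
    assert(l != 0);
    return l->next++;
}

/* Step 3 of spin_lock is "wait until this returns 1". */
static int model_my_turn(const model_ticket_lock_t *l, uint8_t my_ticket)
{
    return l->owner == my_ticket;
}

/* spin_unlock: pass the lock to the holder of the next ticket. */
static void model_unlock(model_ticket_lock_t *l)
{
    l->owner++;
}

/* The lock is free when every ticket handed out has been served. */
static int model_is_free(const model_ticket_lock_t *l)
{
    return l->next == l->owner;
}
```

Two requesters that take tickets one after the other get 0 and 1, and the second one passes model_my_turn only after the first has called model_unlock. That first-come, first-served order is the fairness the older integer lock lacked.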
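This layout has a limitation: with 8 bits per field, at most 255 CPUs can contend for the lock. The kernel therefore uses conditional compilation to provide two versions of ticket_spin_lock and ticket_spin_unlock: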
#if (NR_CPUS < 256)
#define TICKET_SHIFT 8
/* ... */
#else
#define TICKET_SHIFT 16
/* ... */
#endif
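One way to see where the 255 limit comes from: with one holder and 255 waiters (256 CPUs in total), Next has advanced 256 times past Owner, wraps around in 8 bits, and equals Owner again, so a held lock would look free. The helper below is a hypothetical illustration of that arithmetic, not kernel code.

```c
#include <assert.h>

/* Returns 1 if the lock looks free (Next == Owner) after `outstanding`
 * tickets have been taken and none released, for tickets of the given width. */
static int model_looks_free(unsigned ticket_bits, unsigned outstanding)
{
    assert(ticket_bits == 8 || ticket_bits == 16);   /* the two TICKET_SHIFT cases */
    unsigned mask = (1u << ticket_bits) - 1u;
    unsigned owner = 0;
    unsigned next = (owner + outstanding) & mask;    /* counter wraps at 2^ticket_bits */
    return next == owner;
}
```

With 8-bit tickets, 255 outstanding tickets are still told apart from a free lock but 256 are not, while 16-bit tickets handle 256 without trouble.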

SMP architecture: SPIN LOCK (TICKET_SHIFT 8)
/include/linux/spinlock_api_smp.h:
#ifdef CONFIG_INLINE_SPIN_LOCK
#define _raw_spin_lock(lock) __raw_spin_lock(lock)
#endif

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
        preempt_disable();
        spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
        LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}
__raw_spin_lock first disables kernel preemption and then invokes the LOCK_CONTENDED macro:
#define LOCK_CONTENDED(_lock, try, lock)                        \
do {                                                            \
        if (!try(_lock)) {                                      \
                lock_contended(&(_lock)->dep_map, _RET_IP_);    \
                lock(_lock);                                    \
        }                                                       \
        lock_acquired(&(_lock)->dep_map, _RET_IP_);             \
} while (0)
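In other words, _raw_spin_lock first calls do_raw_spin_trylock to try to take the lock and, if that fails, goes on to call do_raw_spin_lock. (The macro shown is the CONFIG_LOCK_STAT variant; without that option LOCK_CONTENDED simply expands to lock(_lock).) The actual implementation of do_raw_spin_xxx is platform dependent.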
/include/linux/spinlock.h
static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock)
{
        __acquire(lock);
        arch_spin_lock(&lock->raw_lock);
}

static inline int do_raw_spin_trylock(raw_spinlock_t *lock)
{
        return arch_spin_trylock(&(lock)->raw_lock);
}
On x86, the arch_spin_lock and arch_spin_trylock functions that do_raw_spin_lock and do_raw_spin_trylock call are implemented as follows:
/arch/x86/include/asm/spinlock.h
static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
{
        __ticket_spin_lock(lock);
}

static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
{
        return __ticket_spin_trylock(lock);
}
058 #if (NR_CPUS < 256)
059 #define TICKET_SHIFT 8
061 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
062 {
063         short inc = 0x0100;
064
065         asm volatile (
066                 LOCK_PREFIX "xaddw %w0, %1\n"
067                 "1:\t"
068                 "cmpb %h0, %b0\n\t"
069                 "je 2f\n\t"
070                 "rep ; nop\n\t"
071                 "movb %1, %b0\n\t"
072                 /* don't need lfence here, because loads are in-order */
073                 "jmp 1b\n"
074                 "2:"
075                 : "+Q" (inc), "+m" (lock->slock)
076                 :
077                 : "memory", "cc");
078 }
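Line 066: LOCK_PREFIX is defined as empty on UP and as the lock prefix on SMP. The prefix applies only to the xaddw instruction on line 066, making it an atomic read-modify-write; the spin loop on lines 067~074 is not atomic and does not need to be. While the locked instruction executes, the CPU has exclusive ownership of the cache line holding slock, so copies cached by other CPUs are invalidated. xaddw works as follows:
xaddw src, dst ==
tmp = dst
dst = dst + src
src = tmp
Experiment to verify the XADDW semantics:
xaddw exchanges and adds %0 and %1 over one word: %0 (inc) receives the old slock, and %1 (slock) becomes slock + 0x0100, so Next, the high byte of %1, is incremented by 1. In short, after xaddw: inc = old slock, slock = old slock + 0x0100.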
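The semantics are easy to check with a small C model. model_xaddw is a hypothetical helper that mimics the effect of xaddw on two 16-bit values without any atomicity; with a lock prefix the real instruction performs the same exchange-and-add atomically, which is what atomic_fetch_add provides in C11.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "xaddw src, dst": dst receives dst + src, src receives the old dst. */
static void model_xaddw(uint16_t *src, uint16_t *dst)
{
    assert(src != 0 && dst != 0);
    uint16_t tmp = *dst;
    *dst = (uint16_t)(*dst + *src);
    *src = tmp;
}
```

Starting from slock = 0x0201 (Next = 2, Owner = 1) and inc = 0x0100, the call leaves inc = 0x0201, that is the caller's ticket 2 together with the Owner it observed, and slock = 0x0301.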
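Line 068: compares the caller's own ticket, the Next value now held in the high byte of inc, with the Owner in the low byte. If they are equal, the caller owns the spinlock and the loop ends.
Lines 070~073: if Owner is not yet the caller's ticket, execute a pause (rep; nop), re-read Owner from slock, and jump back to line 068 to test again.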
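Read as C, the assembly above does roughly the following. This is a user-space sketch using C11 atomics, not the kernel source: model_arch_spinlock_t and model_ticket_spin_lock are hypothetical names, and the pause hint (rep; nop) is left out because portable C has no equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint16_t slock;   /* high byte: Next, low byte: Owner */
} model_arch_spinlock_t;

static void model_ticket_spin_lock(model_arch_spinlock_t *lock)
{
    assert(lock != 0);
    /* Like "lock xaddw": add 0x0100 to slock and get the old value back.
     * The old high byte is our ticket. */
    uint16_t inc = atomic_fetch_add(&lock->slock, 0x0100);
    uint8_t my_ticket = (uint8_t)(inc >> 8);
    uint8_t owner = (uint8_t)(inc & 0xff);

    /* Like the cmpb / movb / jmp loop: re-read Owner until it is our ticket. */
    while (owner != my_ticket)
        owner = (uint8_t)(atomic_load(&lock->slock) & 0xff);
}
```

On a free lock the loop body never runs, so the only cost is the single atomic add. The 16-bit addition wraps when Next is 0xff, which leaves Owner untouched just as the assembly does.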
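Why use the LOCK_PREFIX macro instead of writing the lock prefix directly? The aim is to avoid paying for the lock prefix when a kernel built with CONFIG_SMP actually runs on a UP system. The kernel builds an SMP alternatives table in the .smp_locks section that records the location of every lock prefix in the kernel. At run time, when switching from SMP to UP, the kernel can walk the .smp_locks table and hot-patch each lock prefix into a nop. Switching from UP back to SMP at run time works the same way. For a worked example see "The meaning of LOCK_PREFIX in the Linux kernel" in the references.
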
009 /*
010  * Alternative inline assembly for SMP.
011  *
012  * The LOCK_PREFIX macro defined here replaces the LOCK and
013  * LOCK_PREFIX macros used everywhere in the source tree.
014  *
015  * SMP alternatives use the same data structures as the other
016  * alternatives and the X86_FEATURE_UP flag to indicate the case of a
017  * UP system running a SMP kernel.  The existing apply_alternatives()
018  * works fine for patching a SMP kernel for UP.
019  *
020  * The SMP alternative tables can be kept after boot and contain both
021  * UP and SMP versions of the instructions to allow switching back to
022  * SMP at runtime, when hotplugging in a new CPU, which is especially
023  * useful in virtualized environments.
024  *
025  * The very common lock prefix is handled as special case in a
026  * separate table which is a pure address list without replacement ptr
027  * and size information.  That keeps the table sizes small.
028  */
029
030 #ifdef CONFIG_SMP
031 #define LOCK_PREFIX_HERE \
032                 ".section .smp_locks,\"a\"\n"   \
033                 ".balign 4\n"                   \
034                 ".long 671f - .\n" /* offset */ \
035                 ".previous\n"                   \
036                 "671:"
037
038 #define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
039
040 #else /* ! CONFIG_SMP */
041 #define LOCK_PREFIX_HERE ""
042 #define LOCK_PREFIX ""
043 #endif
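Line 032: the directive .section .smp_locks,"a" switches output to the .smp_locks section; "a" stands for allocatable.
Lines 033~034: ".balign 4" aligns to 4 bytes and ".long 671f - ." stores in the .smp_locks section the offset from the current table entry to label 671. Label 671 sits exactly where the lock prefix is emitted in the code section, so each entry is in effect a relative pointer to a lock prefix.
Line 035: the ".previous" directive switches back to the previous section, i.e. the code section, so line 038 emits the lock prefix in the code section.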
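To illustrate how a table of relative offsets like the one built by LOCK_PREFIX_HERE can be consumed, the sketch below patches lock prefix bytes in a flat byte array. Everything here is a hypothetical model: byte indexes stand in for addresses, the table lives in the same array as the code, and the replacement byte is a parameter because this article does not show the kernel's patching routine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_LOCK_PREFIX 0xF0   /* x86 lock prefix byte */
#define MODEL_NOP         0x90   /* x86 one-byte nop */

/* Each table entry stores (position of lock prefix) - (position of entry),
 * mirroring ".long 671f - .". Positions are byte indexes into one image. */
static size_t model_resolve(size_t entry_pos, int32_t rel)
{
    return (size_t)((int64_t)entry_pos + rel);
}

/* Walk `count` entries starting at byte index `table_pos` and overwrite each
 * referenced lock prefix with `replacement`. Returns the number patched. */
static int model_patch_locks(unsigned char *image, size_t image_len,
                             size_t table_pos, int count,
                             unsigned char replacement)
{
    int patched = 0;
    for (int i = 0; i < count; i++) {
        size_t entry_pos = table_pos + (size_t)i * sizeof(int32_t);
        int32_t rel;
        memcpy(&rel, image + entry_pos, sizeof rel);   /* avoid unaligned reads */
        size_t target = model_resolve(entry_pos, rel);
        assert(target < image_len);
        if (image[target] == MODEL_LOCK_PREFIX) {
            image[target] = replacement;
            patched++;
        }
    }
    return patched;
}
```

Storing offsets relative to the entry itself keeps the table valid wherever the image is loaded, since only the distance between an entry and its target matters.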
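LOCK_CONTENDED first tries to take the lock with __ticket_spin_trylock and, if that fails, falls back to __ticket_spin_lock. The reason for trying __ticket_spin_trylock first instead of calling __ticket_spin_lock directly is as follows.
trylock does not start by modifying the ticket in lock.slock. It checks twice: 1) it reads slock and tests whether the lock is free; 2) it executes a LOCK-prefixed atomic operation that tests whether slock has been changed by another CPU in the meantime and, if not, takes the lock by incrementing lock.slock.Next.
spin_lock, by contrast, always begins with a LOCK-prefixed atomic operation to claim a ticket. Because trylock first tests slock.Next == slock.Owner, it lowers the chance of having to execute the LOCK-prefixed second comparison.
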
080 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
081 {
082         int tmp, new;
083
084         asm volatile("movzwl %2, %0\n\t"
085                      "cmpb %h0,%b0\n\t"
086                      "leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
087                      "jne 1f\n\t"
088                      LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
089                      "1:"
090                      "sete %b1\n\t"
091                      "movzbl %b1,%0\n\t"
092                      : "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
093                      :
094                      : "memory", "cc");
095
096         return tmp;
097 }
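Line 084: loads the value of lock.slock into tmp.
Line 085: compares tmp.Next with tmp.Owner to test whether the spinlock is currently free.
Line 086: leal (load effective address) performs the same address computation as movl but, unlike movl, does not read memory; it just stores the computed value in the destination register. For example, "leal 0x10(%eax, %eax, 4), %edx" gives %edx = 0x10 + %eax + %eax * 4. Depending on REG_PTR_MODE, line 086 is "leal 0x100(%k0), %1" on 32-bit x86 and "leal 0x100(%q0), %1" on x86-64. Ignoring the operand-size modifier "k" or "q", the line is equivalent to:
"%1 = %0 + 0x100", so that new = { tmp.Next + 1, tmp.Owner }.
Line 087: if tmp.Next != tmp.Owner, i.e. the spinlock is not free, jump to label 1 on line 089; lines 090~091 then set tmp to 0, which is returned.
Line 088: atomically executes cmpxchgw, which detects whether another CPU has modified lock.slock since it was read. If there is a competitor the operation fails; otherwise the lock is taken and Next is incremented, all as one atomic step. cmpxchgw is described as follows:
Compares the accumulator (8-32 bits) with "dest". If equal the "dest" is loaded with "src", otherwise the accumulator is loaded with "dest". (On IA-32, %EAX is the accumulator.)
So "cmpxchgw %w1, %2" is equivalent to:
"tmp == lock.slock ? lock.slock = new : tmp = lock.slock"
If slock has not changed, lock.slock is updated to new, which in effect increments the Next of slock.
Line 090: executes sete. If the last comparison (cmpxchgw, or cmpb when the jump on line 087 was taken) found equality, the lowest byte of new, %b1, is set to 1; otherwise it is set to 0. sete is described as:
Sets the byte in the operand to 1 if the Zero Flag is set, otherwise sets the operand to 0.
Line 091: movzbl (movz from byte to long) copies %b1 into tmp with zero extension, so tmp ends up as 0 or 1.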
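The same logic can be written with a C11 compare-and-swap. This is a user-space sketch under the assumption that slock packs Next in the high byte and Owner in the low byte; the model_* names are hypothetical and this is not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint16_t slock;   /* high byte: Next, low byte: Owner */
} model_arch_spinlock_t;

/* Returns 1 if the lock was taken, 0 otherwise; never spins. */
static int model_ticket_spin_trylock(model_arch_spinlock_t *lock)
{
    assert(lock != 0);
    uint16_t tmp = atomic_load(&lock->slock);           /* movzwl */
    if ((uint8_t)(tmp >> 8) != (uint8_t)(tmp & 0xff))   /* cmpb: Next != Owner */
        return 0;                                       /* held: skip the locked op */
    uint16_t newval = (uint16_t)(tmp + 0x0100);         /* leal: Next + 1 */
    /* lock cmpxchgw: install newval only if slock still equals tmp. */
    return atomic_compare_exchange_strong(&lock->slock, &tmp, newval) ? 1 : 0;
}
```

A second call on a lock that is already held returns 0 from the plain comparison alone and leaves slock unchanged, which is the cheap early exit discussed above.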

SMP architecture: SPIN UNLOCK (TICKET_SHIFT 8)
/include/linux/spinlock_api_smp.h
#ifdef CONFIG_INLINE_SPIN_UNLOCK
#define _raw_spin_unlock(lock) __raw_spin_unlock(lock)
#endif

static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
        spin_release(&lock->dep_map, 1, _RET_IP_);
        do_raw_spin_unlock(lock);
        preempt_enable();
}
spin_unlock thus ends up calling do_raw_spin_unlock to release the spinlock.
/include/linux/spinlock.h
static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
{
        arch_spin_unlock(&lock->raw_lock);
        __release(lock);
}
On the IA-32 x86 platform, arch_spin_unlock is implemented as follows:
/arch/x86/include/asm/spinlock.h
static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
        __ticket_spin_unlock(lock);
}
058 #if (NR_CPUS < 256)
059 #define TICKET_SHIFT 8
099 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
100 {
101         asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
102                      : "+m" (lock->slock)
103                      :
104                      : "memory", "cc");
105 }
Line 101: increments the Owner in lock->slock, which lets the CPU holding the next ticket take the lock.
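One detail worth spelling out is that incb operates on the low byte only, so when Owner wraps from 0xff to 0 nothing carries into Next. The pure function below is a hypothetical model of the effect on the 16-bit lock word; it is not atomic and only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Value of slock after "incb" on its low byte (Owner); Next is untouched. */
static uint16_t model_slock_after_unlock(uint16_t slock)
{
    uint8_t owner = (uint8_t)(slock & 0xff);
    uint8_t next = (uint8_t)(slock >> 8);
    assert(next != owner);                    /* unlocking a free lock is a bug */
    owner++;                                  /* wraps 0xff -> 0x00, no carry */
    return (uint16_t)((next << 8) | owner);
}
```

For example, 0x01ff (holder has ticket 0xff, one waiter has ticket 0x00) becomes 0x0100, which hands the lock to the waiter. A 16-bit increment of the whole word would instead give 0x0200 and corrupt Next.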
#if defined(CONFIG_X86_32) && \
        (defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE))
/*
 * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
 * (PPro errata 66, 92)
 */
# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
#else
# define UNLOCK_LOCK_PREFIX
#endif

References
Spinlocks
"spinlocks.txt", /Documentation/spinlocks.txt
"Ticket spinlocks", http://lwn.net/Articles/267968/
"Analysis of the Linux x86 spinlock implementation", http://blog.csdn.net/david_henry/article/details/5405093
"The meaning of LOCK_PREFIX in the Linux kernel", http://blog.csdn.net/ture010love/article/details/7663008
"The Intel 8086 / 8088/ 80186 / 80286 / 80386 / 80486 Instruction Set", http://zsmith.co/intel.html