SQL Server允許并發(fā)操作,BLOCKING是指在某一操作沒有完成之前,其他操作必須等待,以便于保證數(shù)據(jù)的完整性。BLOCKING的解決方法要查看BLOCKING的頭是什么,為什么BLOCKING頭上的語句執(zhí)行的很慢。通常來講只要我們能找到BLOCKING頭上的語句,我們總能夠想出各種各種的辦法,來提升性能,緩解或解決BLOCKING的問題。
?
但是問題的關(guān)鍵是,我們不知道BLOCKING什么時候會發(fā)生。用戶跟我們抱怨數(shù)據(jù)庫性能很差,等我們連上數(shù)據(jù)庫去查看的時候,那時候有可能BLOCKING可能就已經(jīng)過去了。性能又變好了。或者由于問題的緊急性,我們直接重新啟動服務(wù)器以恢復(fù)運營。但是問題并沒有最終解決,我們不知道下次問題會在什么時候發(fā)生。
?
BLOCKING問題的后果比較嚴重。因為終端用戶能直接體驗到。他們提交一個訂單的時候,無論如何提交不上去,通常幾秒之內(nèi)能完成的一個訂單提交,甚至要等待十幾分鐘,才能提交完成。更有甚者,極嚴重的BLOCKING能導(dǎo)致SQL Server停止工作。如下面的SQL ERRORLOG所表示,?在短短的幾分鐘之內(nèi),SPID數(shù)據(jù)從158增長到694,?并馬上導(dǎo)致SQL Server打了一個dump,?停止工作。我們很容易推斷出問題的原因是由于BLOCKING導(dǎo)致的,但是我們無法得知BLOCKING HEADER是什么,我們必須要等下次問題重現(xiàn)時,輔之以工具,才能得知BLOCKING HEADER在做什么事情。如果信息抓取時機不對,我們可能要等問題發(fā)生好幾次,才能抓到。這時候,客戶和經(jīng)理就會有抱怨了。因為我們的系統(tǒng)是生產(chǎn)系統(tǒng),問題每發(fā)生一次,都會對客戶帶來損失。
?
2011-06-01 16:22:30.98 spid1931??? Alert There are 158 Active database sessions which is too high.
2011-06-01 16:23:31.16 spid3248??? Alert There are 342 Active database sessions which is too high.
2011-06-01 16:24:31.06 spid3884??? Alert There are 517 Active database sessions which is too high.
2011-06-01 16:25:31.08 spid3688??? Alert There are 694 Active database sessions which is too high.
2011-06-01 16:26:50.93 Server????? Using 'dbghelp.dll' version '4.0.5'
2011-06-01 16:26:50.97 Server????? **Dump thread -?spid ?= 0, EC = 0x0000000000000000
2011-06-01 16:26:50.97 Server????? ***Stack Dump being sent to D:\MSSQL10.INSTANCE\MSSQL\LOG\SQLDump0004.txt
2011-06-01 16:26:50.97 Server????? * *******************************************************************************
2011-06-01 16:26:50.97 Server????? *
2011-06-01 16:26:50.97?Server ????? * BEGIN STACK DUMP:
2011-06-01 16:26:50.97 Server????? *?? 06/01/11 16:26:50?spid ?4124
2011-06-01 16:26:50.97 Server????? *
2011-06-01 16:26:50.97 Server????? * Deadlocked Schedulers
2011-06-01 16:26:50.97 Server????? *
2011-06-01 16:26:50.97 Server????? * *******************************************************************************
2011-06-01 16:26:50.97 Server????? * -------------------------------------------------------------------------------
2011-06-01 16:26:50.97 Server????? * Short Stack Dump
2011-06-01 16:26:51.01 Server????? Stack Signature for the dump is 0x0000000000000258
?
BLOCKING的信息抓取有很多種方法。這里羅列了幾種。并且對每種分析它的優(yōu)缺點。以便我們選擇。在枚舉方法之前,我們先簡單演示一下BLOCKING.
?
我們首先創(chuàng)建一個測試表:
DROP ? TABLE ?[TESTTABLE]
GO
?
CREATE ? TABLE ?[dbo] . [TESTTABLE] (
??????[ID] [int]? NULL,
??????[NAME] [nvarchar] ( 50 ) ? NULL
)
GO
?
INSERT ? INTO ?TESTTABLE? VALUES? ( 1 , ? 'aaaa' )
GO
?
然后打開一個查詢窗口,執(zhí)行下面的語句,?該語句修改一行數(shù)據(jù),并等待3分鐘,然后在結(jié)束transaction
BEGIN ? TRANSACTION
UPDATE ?TESTTABLE? SET ?[NAME]? = ? 'bbbb' ? WHERE ?[ID]? = ?1
WAITFOR ?? DELAY ? '00:03:00'
COMMIT ? TRANSACTION
?
這時候,如果打開另外一個查詢窗口,執(zhí)行下面的語句,下面的語句就會被BLOCK住。
UPDATE ?TESTTABLE? SET ?[NAME]? = ? 'cccc' ? WHERE ?[ID]? = ?1
?
?
方法一,?抓取SQL Profiler
======================
SQL Profiler里面包含大量的信息。其中有一個事件在Errors and Warnings->Blocked Process Report專門用來獲得blocking的情況。但是因為信息量比較大,而且我們并不能很好的估算在什么時候會產(chǎn)生blocking,另外在生產(chǎn)環(huán)境使用Profiler,?對性能可能會有影響,所以SQL Profiler并不是最合適的工具。我們在這里并不對它贅述。
?
方法二,?執(zhí)行查詢
================
如果我們檢查問題的時候,blocking還存在,那么,我們可以直接可以運行幾個查詢,得知BLOCKING HEADER的信息
?
SELECT ? * ? FROM ? sys . sysprocesses ? where ?spid > 50
?
上述查詢只是告訴我們,BLOCKING HEADER的頭是SPID=53,?但是并沒有告訴我們SPID=53在做什么事情。我們可以用下面的查詢,得到SPID=53的信息
DBCC ?INPUTBUFFER ( 53 )
?
我們可以把上述的兩個查詢合并起來,用下面的查詢:
?
SELECT ?SPID = p . spid ,
???????DBName? = ? convert ( CHAR ( 20 ), d . name ),
???????ProgramName? = ? program_name ,
???????LoginName? = ? convert ( CHAR ( 20 ), l . name ),
???????HostName? = ? convert ( CHAR ( 20 ), hostname ),
??????? Status ? = ?p . status ,
???????BlockedBy? = ?p . blocked ,
???????LoginTime? = ?login_time ,
???????QUERY? = ? CAST ( TEXT ? AS ? VARCHAR ( MAX ))
FROM ??? MASTER . dbo . sysprocesses ?p
??????? INNER ? JOIN ? MASTER . dbo . sysdatabases ?d
????????? ON ?p . dbid ? = ?d . dbid
??????? INNER ? JOIN ? MASTER . dbo . syslogins ?l
????????? ON ?p . sid ? = ?l . sid
??????? CROSS ? APPLY ? sys . dm_exec_sql_text ( sql_handle )
WHERE ??p . blocked? = ?0
??????? AND ? EXISTS ? ( SELECT ?1
??????????????????? FROM ??? MASTER .. sysprocesses ?p1
??????????????????? WHERE ??p1 . blocked? = ?p . spid )
?
?
這樣,一次執(zhí)行,就能告訴我們BLOCKING header的SPID信息,以及該SPID在做的語句。我們可以進一步研究該語句,以理解為什么該語句執(zhí)行很慢。
?
用這個方法有一個缺點,就是使用的時候,要求BLOCKING是存在的。如果BLOCKING已經(jīng)消失了,那么我們的方法就不管用了。
?
?
方法三,長期執(zhí)行一個BLOCKING SCRIPT
==================================
因為我們通常無法知道BLOCKING什么時候會產(chǎn)生,所以通常的辦法是我們長期運行一個BLOCKING SCRIPT,?這樣,等下次發(fā)生的時候,我們就會有足夠的信息。長期運行BLOCKING SCRIPT對性能基本上是沒有影響的。因為我們每隔10秒鐘抓取一次信息。缺點是,如果問題一個月才發(fā)生一次,那么,我們的BLOCKING日志信息會很大。所以這種方法適用于幾天之內(nèi)能重現(xiàn)問題。
?
運行方法如下:
?
如果要停止運行,我們按ctrl+c就可以了。
BLOCKING的信息存在log.out這個文件中
?
我們可以打開log.out這個文件,?會發(fā)現(xiàn)SPID 54被?SPID 53給Block住了。
?
?
而隨后,我們可以看到SPID=53在做什么事情:
?
?
下面是BLOCKING SCRIPT的腳本,?我們可以把它存為blocking.sql
?
use ? master
go
while ?1? = 1
begin
print ? 'Start time: ' ? + ? convert ( varchar ( 26 ), ? getdate (), ?121 )
Print ? 'Running processes'
select ?spid , ?blocked , ?waittype , ?waittime , ?lastwaittype , ?waitresource , ? dbid , uid , ?cpu , ?physical_io , ?memusage , ?login_time , ?last_batch ,
open_tran , ? status , ?hostname , ? program_name , ?cmd , ?net_library , ?loginame
from ? sysprocesses
--where (kpid <> 0 ) or (spid < 51)
-- Change it if you only want to see the working processes
print ? '*********lockinfor***********'
select ? convert ? ( smallint , ?req_spid ) ? As ?spid ,
rsc_dbid? As ? dbid ,
rsc_objid? As ? ObjId ,
rsc_indid? As ?IndId ,
substring ? ( v . name , ?1 , ?4 ) ? As ? Type ,
substring ? ( rsc_text , ?1 , ?16 ) ? as ? Resource ,
substring ? ( u . name , ?1 , ?8 ) ? As ?Mode ,
substring ? ( x . name , ?1 , ?5 ) ? As ? Status
from ? master . dbo . syslockinfo ,
master . dbo . spt_values v ,
master . dbo . spt_values x ,
master . dbo . spt_values u
where ? master . dbo . syslockinfo . rsc_type? = ?v . number
and ?v . type ? = ? 'LR'
and ? master . dbo . syslockinfo . req_status? = ?x . number
and ?x . type ? = ? 'LS'
and ? master . dbo . syslockinfo . req_mode? + ?1? = ?u . number
and ?u . type ? = ? 'L'
order ? by ?spid
print ? 'inputbuffer for running processes'
declare ?@spid? varchar ( 6 )
declare ?ibuffer? cursor ? fast_forward ? for
select ? cast ? ( spid? as ? varchar ( 6 )) ? as ?spid? from ? sysprocesses ? where ?spid? > 50
open ?ibuffer
fetch ? next ? from ?ibuffer? into ?@spid
while? ( @@fetch_status ? != ? - 1 )
begin
print ? ''
print ? 'DBCC INPUTBUFFER FOR SPID ' ? + ?@spid
exec? ( 'dbcc inputbuffer (' ? + ?@spid? + ? ')' )
fetch ? next ? from ?ibuffer? into ?@spid
end
deallocate ?ibuffer
waitfor ? delay ? '0:0:10'
End
?
這種方法的缺陷就是,log.out會比較巨大,會占用很大的空間,如果blocking一個月甚至更長時間才發(fā)生一次,那我們的這個方法就不太適宜。
?
?
方法四,我們用Agent Job來檢查BLOCKING
=====================================
長期運行一個BLOCKING SCRIPT的缺點是我們每隔一段時間,去查詢信息,但是大多數(shù)收集的信息是無用的。所以會導(dǎo)致日志文件巨大,對于一個生產(chǎn)系統(tǒng)來講,磁盤空間滿可不是個好事情,另外,有一些客戶對于用命令行來長期運行TSQL腳本有所顧忌,所以我們做了一個改進。這次,我們只收集有用的信息。對于無用的信息我們不關(guān)注。這樣能極大減少日志大小。
?
我們首先創(chuàng)建一個觀察數(shù)據(jù)庫,然后建立兩張表格?Blocking_sysprocesses和Blocking_SQLText,?建立一個存儲過程和一個Job,?該Job每隔一段時間去調(diào)用存儲過程。只有發(fā)現(xiàn)有blocking的,我們才記錄到表格Blocking_sysprocesses和Blocking_SQLText這兩個表格中。如果跟blocking無關(guān),我們就不對它進行記錄。下面是TSQL語句:
?
?
CREATE ? DATABASE ?[MonitorBlocking]
GO
?
USE ?[MonitorBlocking]
GO
?
CREATE ? TABLE ?Blocking_sysprocesses (
??????[spid]? smallint ,
??????[kpid]? smallint ,
??????[blocked]? smallint ,
??????[waitType]? binary ( 2 ),
??????[waitTime]? bigInt ,
??????[lastWaitType]? nchar ( 32 ),
??????[waitResource]? nchar ( 256 ),
??????[dbID]? smallint ,
??????[uid]? smallint ,
??????[cpu]? int ,
??????[physical_IO]? int ,
??????[memusage]? int ,
??????[login_Time]? datetime ,
??????[last_Batch]? datetime ,
??????[open_Tran]? smallint ,
??????[status]? nchar ( 30 ),
??????[sid]? binary ( 86 ),
??????[hostName]? nchar ( 128 ),
??????[program_Name]? nchar ( 128 ),
??????[hostProcess]? nchar ( 10 ),
??????[cmd]? nchar ( 16 ),
??????[nt_Domain]? nchar ( 128 ),
??????[nt_UserName]? nchar ( 128 ),
??????[net_Library]? nchar ( 12 ),
??????[loginName]? nchar ( 128 ),
??????[context_Info]? binary ( 128 ),
??????[sqlHandle]? binary ( 20 ),
??????[CapturedTimeStamp]? datetime
)
GO
CREATE ? TABLE ?[dbo] . [Blocking_SqlText] (
??????[spid] [smallint] ,
??????[sql_text] [nvarchar] ( 2000 ),
??????[Capture_Timestamp] [datetime]
)
GO
?
CREATE ? PROCEDURE ?[dbo] . [checkBlocking]
AS
BEGIN
?
SET ? NOCOUNT ? ON ;
SET ? TRANSACTION ? ISOLATION ? LEVEL ? READ ? UNCOMMITTED
?
declare ?@Duration??? int ? -- in milliseconds, 1000 = 1 sec
declare ?@now???????????? datetime
declare ?@Processes?? int
?
select ??@Duration? = ?100?? -- in milliseconds, 1000 = 1 sec
select ??@Processes? = ?0
?
select ?@now? = ? getdate ()
?
CREATE ? TABLE ?#Blocks_rg (
??????[spid]? smallint ,
??????[kpid]? smallint ,
??????[blocked]? smallint ,
??????[waitType]? binary ( 2 ),
??????[waitTime]? bigInt ,
??????[lastWaitType]? nchar ( 32 ),
??????[waitResource]? nchar ( 256 ),
??????[dbID]? smallint ,
??????[uid]? smallint ,
??????[cpu]? int ,
??????[physical_IO]? int ,
??????[memusage]? int ,
??????[login_Time]? datetime ,
??????[last_Batch]? datetime ,
??????[open_Tran]? smallint ,
??????[status]? nchar ( 30 ),
??????[sid]? binary ( 86 ),
??????[hostName]? nchar ( 128 ),
??????[program_Name]? nchar ( 128 ),
??????[hostProcess]? nchar ( 10 ),
??????[cmd]? nchar ( 16 ),
??????[nt_Domain]? nchar ( 128 ),
??????[nt_UserName]? nchar ( 128 ),
??????[net_Library]? nchar ( 12 ),
??????[loginName]? nchar ( 128 ),
??????[context_Info]? binary ( 128 ),
??????[sqlHandle]? binary ( 20 ),
??????[CapturedTimeStamp]? datetime
) ????
?????
INSERT ? INTO ?#Blocks_rg?
SELECT
??????[spid] ,
??????[kpid] ,
??????[blocked] ,
??????[waitType] ,
??????[waitTime] ,
??????[lastWaitType] ,
??????[waitResource] ,
??????[dbID] ,
??????[uid] ,
??????[cpu] ,
??????[physical_IO] ,
??????[memusage] ,
??????[login_Time] ,
??????[last_Batch] ,
??????[open_Tran] ,
??????[status] ,
??????[sid] ,
??????[hostName] ,
??????[program_name] ,
??????[hostProcess] ,
??????[cmd] ,
??????[nt_Domain] ,
??????[nt_UserName] ,
??????[net_Library] ,
??????[loginame] ,
??????[context_Info] ,
??????[sql_Handle] ,
??????@now? as ?[Capture_Timestamp]
FROM ? master. . sysprocesses ? where ?blocked? <> ?0
AND ?waitTime? > ?@Duration?????
?????
SET ?@Processes? = ? @@rowcount
?
INSERT ? into ?#Blocks_rg
SELECT
?
??????src . [spid] ,
??????src . [kpid] ,
??????src . [blocked] ,
??????src . [waitType] ,
??????src . [waitTime] ,
??????src . [lastWaitType] ,
??????src . [waitResource] ,
??????src . [dbID] ,
??????src . [uid] ,
??????src . [cpu] ,
??????src . [physical_IO] ,
??????src . [memusage] ,
??????src . [login_Time] ,
??????src . [last_Batch] ,
??????src . [open_Tran] ,
??????src . [status] ,
??????src . [sid] ,
??????src . [hostName] ,
??????src . [program_name] ,
??????src . [hostProcess] ,
??????src . [cmd] ,
??????src . [nt_Domain] ,
??????src . [nt_UserName] ,
??????src . [net_Library] ,
??????src . [loginame] ,
??????src . [context_Info] ,
??????src . [sql_Handle]
?????? , @now? as ?[Capture_Timestamp]
FROM ?? master. . sysprocesses ?src? inner ? join ?#Blocks_rg trgt
??????? on ?trgt . blocked? = ?src . [spid]
?
if ?@Processes? > ?0
BEGIN
?????? INSERT ?[dbo] . [Blocking_sysprocesses]
?????? SELECT ? * ? from ?#Blocks_rg
?????
DECLARE ?@SQL_Handle? binary ( 20 ), ?@SPID? smallInt ;
DECLARE ?cur_handle? CURSOR ? FOR ? SELECT ?sqlHandle , ?spid? FROM ?#Blocks_rg ;
OPEN ?cur_Handle
FETCH ? NEXT ? FROM ?cur_handle? INTO ?@SQL_Handle , ?@SPID
WHILE? ( @@FETCH_STATUS ? = ?0 )
BEGIN
?
INSERT ?[dbo] . [Blocking_SqlText]
SELECT ??????@SPID , ? CONVERT ( nvarchar ( 4000 ), ?[text] ) ? , @now? as ?[Capture_Timestamp] from ? :: fn_get_sql ( @SQL_Handle )
?
FETCH ? NEXT ? FROM ?cur_handle? INTO ?@SQL_Handle , ?@SPID
END
CLOSE ?cur_Handle
DEALLOCATE ?cur_Handle
?
END
?
DROP ? table ?#Blocks_rg
?
END
?
GO
?
?
?
?
USE ?msdb ;
GO
?
EXEC ?dbo . sp_add_job
?????? @job_name? = ? N'MonitorBlocking' ;
GO
?
EXEC ? sp_add_jobstep
?????? @job_name? = ? N'MonitorBlocking' ,
??????@step_name? = ? N'execute blocking script' , ?
??????@subsystem? = ? N'TSQL' ,
??????@command? = ? N'exec checkBlocking' ,
@database_name = N'MonitorBlocking';
GO ???
?
EXEC ? sp_add_jobSchedule
?????? @name? = ? N'ScheduleBlockingCheck' ,
??????@job_name? = ? N'MonitorBlocking' ,
??????@freq_type? = ?4 , ? -- daily
??????@freq_interval? = ?1 ,
??????@freq_subday_type? = ?4 ,
??????@freq_subday_interval? = ?1
?
EXEC ? sp_add_jobserver ? @job_name? = ? N'MonitorBlocking' , ?@server_name? = ? N'(local)'
?
當Blocking發(fā)生一段時間后,我們可以查詢下面的兩個表格,以得知當時問題發(fā)生時的blocking信息:
?
?
use ?MonitorBlocking
GO ???
SELECT ? * ? from ?Blocking_sqlText
SELECT ? * ? FROM ?Blocking_sysprocesses
更多文章、技術(shù)交流、商務(wù)合作、聯(lián)系博主
微信掃碼或搜索:z360901061

微信掃一掃加我為好友
QQ號聯(lián)系: 360901061
您的支持是博主寫作最大的動力,如果您喜歡我的文章,感覺我的文章對您有幫助,請用微信掃描下面二維碼支持博主2元、5元、10元、20元等您想捐的金額吧,狠狠點擊下面給點支持吧,站長非常感激您!手機微信長按不能支付解決辦法:請將微信支付二維碼保存到相冊,切換到微信,然后點擊微信右上角掃一掃功能,選擇支付二維碼完成支付。
【本文對您有幫助就好】元
