ORACLE在數(shù)據(jù)倉(cāng)庫(kù)的應(yīng)用上,有很多針對(duì)分析數(shù)據(jù)和報(bào)表查詢的函數(shù)。分類如下:
?
?
Type Used For
Ranking |
Calculating ranks, percentiles, and n-tiles of the values in a result set. |
Windowing |
Calculating cumulative and moving aggregates.
Works with these functions:
and new statistical functions. Note that the
not supported in windowing functions except for
|
Reporting |
Calculating shares, for example, market share.
Works with these functions:
Note that the
functions that support
|
|
Finding a value in a row a specified number of rows from a current row. |
|
First or last value in an ordered group. |
Linear Regression |
Calculating linear regression and other statistics (slope, intercept, and so on). |
Inverse Percentile |
The value in a data set that corresponds to a specified percentile. |
Hypothetical Rank and Distribution |
The rank or percentile that a row would have if inserted into a specified data set. |
??
????其中 Windowing analytic function 的主要功能是 Calculating cumulative and moving aggregates,即累計(jì)計(jì)算求和。
Reporting 只是省略了WIDNOW ,即對(duì)分組后的全部數(shù)據(jù)求和,沒有累加效果,很適合對(duì)某個(gè)地區(qū)銷售百分額的查詢。
?
分析函數(shù)的語(yǔ)法格式如下:
?
analytic_function ::=
?
?
analytic_clause ::=
?
?
?
?
Processing order:
?
?
?
?
首先,SQL語(yǔ)句中JOINS,WHERE,GROUP BY HAVING Clause先執(zhí)行,然后為分析函數(shù)生成一個(gè)結(jié)果集,分析函數(shù)在此基礎(chǔ)上進(jìn)行計(jì)算,
如果有最后ORDER BY Clause,再進(jìn)行排序(分析函數(shù)內(nèi)部如有ORDER BY子句是在計(jì)算之前執(zhí)行的)。
?
Result set partitions:
?
對(duì)JOINS,WHERE,GROUP BY HAVING 字句后生成的結(jié)果集分組,如省略次子句,默認(rèn)對(duì)partition by 所有列
?
Window(重點(diǎn)):
?partition by后的每一行,都擁有一個(gè)視窗。可以把每一行想象成一個(gè)天真活潑的小男孩,視窗就是他們手中拿著的可以自動(dòng)伸長(zhǎng)縮短的魔法棍。
一個(gè)結(jié)果集就如同所有的小男孩排成一個(gè)長(zhǎng)隊(duì)。然后每個(gè)小男孩拿著魔法棍,往前或者往后或者前后同時(shí),想打誰(shuí)就打誰(shuí),只要輸入口令。
如
UNBOUNDED
PRECEDING,就是從最前面到自己,UNBOUNDED
FOLLOWING,就是從自己到最后面。
BETWEEN '起點(diǎn)' and '終點(diǎn)',就是以自己為中心從起點(diǎn)到終點(diǎn)的范圍。
但是魔法棒有個(gè)缺點(diǎn),只能范圍攻擊,不能單獨(dú)擊中某一個(gè)自己相鄰的除外。
打比方打到這里,
其實(shí)魔法棒不是最佳比喻,還是WINDOW視窗比較合適,正好是范圍的滑動(dòng)。
?
Current row:
就是當(dāng)前行啦,它作為視窗的參考點(diǎn),并包括在整個(gè)視窗的計(jì)算范圍內(nèi)。
?
詳細(xì)語(yǔ)法如下:
analytic_function([ arguments ])
?? OVER (analytic_clause)
where analytic_clause =
???? [ query_partition_clause ]
???? [ order_by_clause [ windowing_clause ] ]
and query_partition_clause =
??? PARTITION BY
????? { value_expr[, value_expr ]...
????? | ( value_expr[, value_expr ]... )
????? }
and windowing_clause =
???? { ROWS | RANGE }
???? { BETWEEN
?????? { UNBOUNDED PRECEDING
???? | CURRENT ROW
???? | value_expr { PRECEDING | FOLLOWING }
???? }
???? AND
???? { UNBOUNDED FOLLOWING
???? | CURRENT ROW
???? | value_expr { PRECEDING | FOLLOWING }
???? }
?? | { UNBOUNDED PRECEDING
???? | CURRENT ROW
???? | value_expr PRECEDING
???? }
?? }
RANGE:
定義為L(zhǎng)ogical Offset,即前后相隔可以通過(guò)具體的邏輯表達(dá)式表示,
如 range INTERVAL '1' DAY AND INTERVAL '2' DAY?就是當(dāng)前
行的ORDER BY 字段的時(shí)間前一天和后兩天的范圍。
用RANG的時(shí)候,RANG 后面可是NUMERIC INTERVAL和表達(dá)式 ORDER BY字段的類型必須和RANG后的類型一樣。
并且用RANG的時(shí)候,只能ORDER BY一個(gè)字段。
ROWS:
定義為Physical Offsets,即前后相隔只能是具體的行,如rows between 2 and 2就是當(dāng)前行的前兩行和后兩行。可以O(shè)RDER BY多個(gè)字段。
?
?? WINDOWS 子句只能在ORDER BY子句后出現(xiàn),如果不寫則默認(rèn)為 RANGE
BETWEEN
UNBOUNDED
PRECEDING
AND
CURRENT
ROW.
即WINDOWS都是基于分析子句中的OREDR BY,無(wú)ORDER BY的情況下,功能相當(dāng)于REPORTING FUNCTION。
分析函數(shù)只能出現(xiàn)在SELECT子句和ORDER BY子句中。
(這里插一句,跟本主題不相關(guān),只是突然想把在想的寫下來(lái),不管對(duì)不對(duì)。ORACLE在SQL執(zhí)行計(jì)劃中SELECT都放在最后,
在解析語(yǔ)句的時(shí)候,就解析到了分析函數(shù),當(dāng)執(zhí)行GROUP BY等后,
就會(huì)在RESULT SETS中根據(jù)解析到的分析函數(shù)語(yǔ)法,
對(duì)排序后的每一行數(shù)據(jù)進(jìn)行計(jì)算,最后在FETCH所有的行顯示出來(lái)。GROUP BY 后的統(tǒng)計(jì)函數(shù)也是,如解析出SUM后,
對(duì)分組后的結(jié)果集中每個(gè)組中的每一行數(shù)據(jù)依次FENTCH相加,如果在SUM中寫CASE WHEN就
能根據(jù)條件選擇每一行的加減乘除。
但GROUP 帶不出所有的行,而分析函數(shù)是基于每一行的。
哎,有些只能意會(huì)不能言傳,主要還是沒有實(shí)際的搞懂,要是能看到源代碼就好了:)
)
?
下面舉具體的例子說(shuō)明。
Windowing Functions with Logical Offset:
1.
?
SELECT t.time_id, TO_CHAR (SUM(amount_sold), '9,999,999,999') AS SALES, TO_CHAR(AVG(SUM(amount_sold)) OVER (ORDER BY t.time_id RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY FOLLOWING), '9,999,999,999') AS CENTERED_3_DAY_AVG FROM sales s, times t WHERE s.time_id=t.time_id AND t. calendar_week_number IN (51) AND calendar_year=1999 GROUP BY t.time_id ORDER BY t.time_id;
?
TIME_ID????????? SALES????????? CENTERED_3_DAY
---------?????????? -------------?????????--------------
20-DEC-99??????? 134,337??????? 106,676
21-DEC-99???????? 79,015??????? 102,539
22-DEC-99???????? 94,264???????? 85,342
23-DEC-99???????? 82,746???????? 93,322
24-DEC-99??????? 102,957???????? 82,937
25-DEC-99???????? 63,107???????? 87,062
26-DEC-99???????? 95,123???????? 79,115
每一行結(jié)果中帶出 當(dāng)前行的TIME_ID前后相隔一天的的SALES的值的平均值。第一行前面沒有行,平均值為后一行加當(dāng)前行除以2.當(dāng)找不到行的時(shí)候不會(huì)把那一行算入到計(jì)算內(nèi)。RANG后為INTERVAL,GROUP BY的字段必須為DATE類型。
2.
SELECT t_timekey, AVG(stock_price)
OVER (ORDER BY t_timekey RANGE
fn(t_timekey)
PRECEDING) av_price
FROM stock, time WHERE st_timekey = t_timekey
ORDER BY t_timekey;
??? RANG后可跟表達(dá)式,表達(dá)式可以為函數(shù)。
3.
SELECT time_id, daily_sum, SUM(daily_sum) OVER (ORDER BY time_id RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW) AS current_group_sum FROM (SELECT time_id, channel_id, SUM(s.quantity_sold) AS daily_sum FROM customers c, sales s, countries WHERE c.cust_id=s.cust_id AND c.country_id = countries.country_id AND s.cust_id IN (638, 634, 753, 440 ) AND s.time_id BETWEEN '01-MAY-00' AND '13-MAY-00' GROUP BY time_id, channel_id);
?
TIME_ID??? DAILY_SUM CURRENT_GROUP_SUM
---------????????? ----------?????????? -----------------
06-MAY-00????????? 7???????????????? 7?????????????????????????????/* 7 */
10-MAY-00????????? 1???????????????? 9????????????????????????????? /* 7 + (1+1) */
10-MAY-00????????? 1???????????????? 9???????????????????????????????* 7 + (1+1) */
11-MAY-00????????? 2??????????????? 15???????????????????????????? /* 7 + (1+1) + (2+4) */
11-MAY-00????????? 4??????????????? 15???????????????????????????? /* 7 + (1+1) + (2+4) */
12-MAY-00????????? 1??????????????? 16?? ????????????????????????? /* 7 + (1+1) + (2+4) + 1 */
13-MAY-00????????? 2??????????????? 23???????????????????????????? /* 7 + (1+1) + (2+4) + 1 + (5+2) */
13-MAY-00????????? 5??????????????? 23???????????????????????????? /* 7 + (1+1) + (2+4) + 1 + (5+2) */
?
?
因?yàn)槭荓OGIC OFFCET,當(dāng)ORDER BY 字段中有相同值時(shí),會(huì)把相同字段的分析函數(shù)中要分析的值相加后在根據(jù)WINDOW值累計(jì)。
注意這個(gè)特性。要想不要這種效果,也就是無(wú)論ORDER BY字段是否有相同的值,都按照WINDOW值相加就要用ROWS了。
?
Windowing Aggregate Functions With Physical Offsets
1.
SELECT t.time_id, TO_CHAR(amount_sold, '9,999,999,999') AS INDIV_SALE, TO_CHAR(SUM(amount_sold) OVER (PARTITION BY t.time_id ORDER BY t.time_id ROWS UNBOUNDED PRECEDING), '9,999,999,999') AS CUM_SALES FROM sales s, times t, customers c WHERE s.time_id=t.time_id AND s.cust_id=c.cust_id AND t.time_id IN (TO_DATE('11-DEC-1999'), TO_DATE('12-DEC-1999')) AND c.cust_id BETWEEN 6500 AND 6600 ORDER BY t.time_id;
?
TIME_ID??????? INDIV_SALE??? CUM_SALES
---------???????????? ----------??? ---------
12-DEC-99???????? 23?????????? 23
12-DEC-99????????? 9?????????? 32
12-DEC-99???????? 14?????????? 46
12-DEC-99???????? 24?????????? 70
12-DEC-99???????? 19?????????? 89
?
ROWS UNBOUNDED PRECEDING 表示從第一行到當(dāng)前行的視窗范圍,即使ORDER BY字段的值一樣,也實(shí)現(xiàn)了第一行到當(dāng)前行的求和。
?
2.
SELECT c.cust_id, t.calendar_month_desc, TO_CHAR (SUM(amount_sold), '9,999,999,999') AS SALES, TO_CHAR(AVG(SUM(amount_sold)) OVER (ORDER BY c.cust_id, t.calendar_month_desc ROWS 2 PRECEDING), '9,999,999,999') AS MOVING_3_MONTH_AVG FROM sales s, times t, customers c WHERE s.time_id=t.time_id AND s.cust_id=c.cust_id AND t.calendar_year=1999 AND c.cust_id IN (6510) GROUP BY c.cust_id, t.calendar_month_desc ORDER BY c.cust_id, t.calendar_month_desc;
?
CUST_ID CALENDAR?????? SALES????????? MOVING_3_MONTH
----------??? --------????????????-------------- --------------
????? 6510 1999-04???????????? 125??????????? 125
????? 6510 1999-05?????????? 3,395????????? 1,760
????? 6510 1999-06?????????? 4,080????????? 2,533
????? 6510 1999-07?????????? 6,435????????? 4,637
????? 6510 1999-08?????????? 5,105????????? 5,207
????? 6510 1999-09?????????? 4,676????????? 5,405
????? 6510 1999-10?????????? 5,109????????? 4,963
????? 6510 1999-11???????????? 802????????? 3,529
前兩行到當(dāng)前行求AVG,注意,如果行沒有找到,將不計(jì)在內(nèi)。如第二行的AVG為(第一行+第二行的SALES值)/2,
第三行則為(第一行+第二行+第三行的SALES值)/3
?
如果不寫ORDER BY子句,便沒有視窗概念,功能為REPORTING FUNCTION
?
SELECT prod_category, country_region, sales FROM (SELECT SUBSTR(p.prod_category,1,8) AS prod_category, co.country_region, SUM(amount_sold) AS sales, MAX(SUM(amount_sold)) OVER (PARTITION BY prod_category) AS MAX_REG_SALES FROM sales s, customers c, countries co, products p WHERE s.cust_id=c.cust_id AND c.country_id=co.country_id AND s.prod_id =p.prod_id AND s.time_id = TO_DATE('11-OCT-2001') GROUP BY prod_category, country_region) WHERE sales = MAX_REG_SALES;
?
???查詢結(jié)果
PROD_CAT COUNTRY_REGION SALES MAX_REG_SALES -------- -------------------- ---------- ------------- Electron Americas 581.92 581.92 Hardware Americas 925.93 925.93 Peripher Americas 3084.48 4290.38 Peripher Asia 2616.51 4290.38 Peripher Europe 4290.38 4290.38 Peripher Oceania 940.43 4290.38 Software Americas 4445.7 4445.7 Software Asia 1408.19 4445.7 Software Europe 3288.83 4445.7 Software Oceania 890.25 4445.7
??
?
參考資料ORACLE document《Data Warehousing Guide 》,《SQL Language Reference 》
更多文章、技術(shù)交流、商務(wù)合作、聯(lián)系博主
微信掃碼或搜索:z360901061

微信掃一掃加我為好友
QQ號(hào)聯(lián)系: 360901061
您的支持是博主寫作最大的動(dòng)力,如果您喜歡我的文章,感覺我的文章對(duì)您有幫助,請(qǐng)用微信掃描下面二維碼支持博主2元、5元、10元、20元等您想捐的金額吧,狠狠點(diǎn)擊下面給點(diǎn)支持吧,站長(zhǎng)非常感激您!手機(jī)微信長(zhǎng)按不能支付解決辦法:請(qǐng)將微信支付二維碼保存到相冊(cè),切換到微信,然后點(diǎn)擊微信右上角掃一掃功能,選擇支付二維碼完成支付。
【本文對(duì)您有幫助就好】元
