References:
http://blog.csdn.net/qiaochao911/article/details/8613988
http://xm-king.iteye.com/blog/1088422
A Hive partition is actually identified by a path, not by anything in the physical data. For daily data, for example, the partition might be pt=20121023, which appears in the path as /hdfs/path/pt=20121023/data_files. The advantage of identifying partitions by path is that when we need the data of a specific partition, we only have to read the files under that path instead of scanning the whole data set.

Hive partitions are static by default, but sometimes you need to create different partitions dynamically. Take product data: I may want to split it into two partitions according to whether the product is online, so that later queries for online products only need to read the online partition. Dynamic partitioning can be switched on with the following settings:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

The query can then be written as:

insert overwrite table tbl_name partition(pt, if_online)
select field1, field2, ..., pt, if_online
from tbl
where xxx;

Note that the dynamic partition columns must come last in the select list.

Compare this with the static partition syntax:

insert overwrite table tbl_name partition(pt=20121023, if_online=1)
select field1, field2, ..., fieldn
from tbl
where xxx;

There is also a subtle behavioral difference between the two: a static partition is always created, whether or not the SELECT returns any rows, whereas a dynamic partition is created only when the SELECT result contains at least one row. Different business scenarios may therefore call for different approaches.

Another important caveat with dynamic partitioning is that the configured reduce count applies to every partition. Say your script contains:

set mapred.reduce.tasks=100;

and there are two partition columns, pt and if_online. If the result set contains pt=20121023 with if_online=0/1, then 100 reduce tasks are assigned to each of pt=20121023/if_online=0 and pt=20121023/if_online=1, which means the namenode is handling writes to 200 files at the same time. With many partition values this becomes a disaster and can easily bring the namenode down, which is very dangerous. So when using dynamic partitions, make sure you know exactly which partition values will be produced, and set the reduce count sensibly.
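One common way to keep that fan-out under control (a sketch that is not in the original post; tbl_name, tbl, and the field names reuse the placeholder example above) is to DISTRIBUTE BY the partition columns, so that all rows belonging to one partition are routed to a single reducer and each partition is written by exactly one task:

-- Sketch: send every (pt, if_online) combination to one reducer,
-- so each dynamic partition is written by a single task rather than all 100.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tbl_name partition(pt, if_online)
select field1, field2, pt, if_online
from tbl
where xxx
distribute by pt, if_online;

The trade-off is that a very large partition is then written by a single reducer, so this works best with many modestly sized partitions rather than a few huge ones.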
Hive can use partitions to coarsely divide a table into different directories and thereby speed up queries. Take log files that contain timestamps: if we place the logs in different directories by time, records with the same date end up in the same partition directory, and we can query the records of a particular date much more efficiently.

hive (economy)> describe stocks;
OK
exchange        string
symbol          string
ymd             string
price_open      float
price_high      float
price_low       float
price_close     float
volumn          int
price_adj_close float
Time taken: 0.108 seconds
hive (economy)> alter table partition_stocks add if not exists
              > partition(exchange='NASDAQ', symbol='ABXA');
OK
Time taken: 1.834 seconds
hive (economy)> show partitions partition_stocks;
OK
exchange=NASDAQ/symbol=ABXA
Time taken: 0.155 seconds
hive (economy)> insert overwrite table partition_stocks partition(exchange='NASDAQ',
              > symbol='ABXA')
              > select s.ymd, s.price_open, s.price_high, s.price_low, s.price_close, s.volumn,
              > s.price_adj_close from stocks s
              > where s.exchange='NASDAQ' and s.symbol='ABXA';
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201303271617_0021, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201303271617_0021
Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job -kill job_201303271617_0021
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2013-03-31 20:50:01,631 Stage-1 map = 0%, reduce = 0%
2013-03-31 20:50:10,656 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:11,659 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:12,662 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:13,665 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:14,681 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:15,684 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:16,687 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:17,720 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:18,723 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:19,787 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:20,791 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:21,797 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:22,800 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:23,803 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:24,807 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 8.34 sec
2013-03-31 20:50:25,837 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 21.0 sec
2013-03-31 20:50:26,841 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 21.0 sec
2013-03-31 20:50:27,844 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 21.0 sec
2013-03-31 20:50:28,848 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 21.0 sec
2013-03-31 20:50:29,852 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 21.0 sec
2013-03-31 20:50:30,884 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 21.0 sec
2013-03-31 20:50:31,888 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 21.0 sec
MapReduce Total cumulative CPU time: 21 seconds 0 msec
Ended Job = job_201303271617_0021
Ended Job = 61034062, job is filtered out (removed at runtime).
Ended Job = 914476855, job is filtered out (removed at runtime).
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201303271617_0022, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201303271617_0022
Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job -kill job_201303271617_0022
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2013-03-31 20:50:40,727 Stage-3 map = 0%, reduce = 0%
2013-03-31 20:50:46,740 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
2013-03-31 20:50:47,743 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
2013-03-31 20:50:48,746 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
2013-03-31 20:50:49,749 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
2013-03-31 20:50:50,753 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
2013-03-31 20:50:51,760 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
2013-03-31 20:50:52,763 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 1.05 sec
MapReduce Total cumulative CPU time: 1 seconds 50 msec
Ended Job = job_201303271617_0022
Loading data to table economy.partition_stocks partition (exchange=NASDAQ, symbol=ABXA)
Deleted hdfs://localhost:9000/home/landen/UntarFile/hive-0.10.0/user/hive/warehouse/economy.db/partition_stocks/exchange=NASDAQ/symbol=ABXA
Partition economy.partition_stocks{exchange=NASDAQ, symbol=ABXA} stats: [num_files: 1, num_rows: 0, total_size: 67679, raw_data_size: 0]
Table economy.partition_stocks stats: [num_partitions: 1, num_files: 1, num_rows: 0, total_size: 67679, raw_data_size: 0]
1608 Rows loaded to partition_stocks
MapReduce Jobs Launched:
Job 0: Map: 2   Cumulative CPU: 21.0 sec   HDFS Read: 481098497   HDFS Write: 67679   SUCCESS
Job 1: Map: 1   Cumulative CPU: 1.05 sec   HDFS Read: 67989   HDFS Write: 67679   SUCCESS
Total MapReduce CPU Time Spent: 22 seconds 50 msec
OK
Time taken: 66.143 seconds
hive (economy)> select * from partition_stocks
              > where exchange = 'NASDAQ' and symbol = 'ABXA';
2005-03-16  7.97  7.97  7.91  7.92  300200   7.92  NASDAQ  ABXA
2005-03-15  7.92  7.97  7.92  7.97  42500    7.97  NASDAQ  ABXA
2005-03-14  7.93  7.99  7.92  7.92  109700   7.92  NASDAQ  ABXA
2005-03-11  7.96  7.96  7.91  7.94  188200   7.94  NASDAQ  ABXA
2005-03-10  7.99  8.0   7.93  7.96  1275900  7.96  NASDAQ  ABXA
2005-03-09  8.01  8.02  7.78  8.0   198600   8.0   NASDAQ  ABXA
2005-03-08  8.1   8.1   8.0   8.01  81600    8.01  NASDAQ  ABXA
2005-03-07  8.05  8.11  8.05  8.11  70300    8.11  NASDAQ  ABXA
2005-03-04  8.19  8.19  8.05  8.12  116300   8.12  NASDAQ  ABXA
2005-03-03  8.09  8.2   8.09  8.2   64200    8.2   NASDAQ  ABXA
2005-03-02  8.29  8.45  8.18  8.2   197600   8.2   NASDAQ  ABXA
2005-03-01  8.07  8.35  8.06  8.25  332000   8.25  NASDAQ  ABXA
2005-02-28  8.0   8.09  7.81  8.05  241000   8.05  NASDAQ  ABXA
2005-02-25  7.55  8.3   7.55  8.1   1114100  8.1   NASDAQ  ABXA
2005-02-24  7.01  7.3   7.0   7.3   174100   7.3   NASDAQ  ABXA
2005-02-23  7.22  7.22  6.9   7.06  542100   7.06  NASDAQ  ABXA
2005-02-22  7.56  7.56  7.25  7.26  209000   7.26  NASDAQ  ABXA
2005-02-18  7.7   7.8   7.58  7.58  1525500  7.58  NASDAQ  ABXA
2005-02-17  7.41  7.7   7.41  7.65  75200    7.65  NASDAQ  ABXA
2005-02-16  7.51  7.51  7.38  7.45  1297600  7.45  NASDAQ  ABXA
2005-02-15  7.58  7.61  7.41  7.56  248400   7.56  NASDAQ  ABXA
2005-02-14  7.78  7.81  7.53  7.6   1487200  7.6   NASDAQ  ABXA
2005-02-11  7.92  7.92  7.73  7.88  265400   7.88  NASDAQ  ABXA
2005-02-10  8.03  8.03  7.91  7.93  205100   7.93  NASDAQ  ABXA
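For reference, the transcript presupposes that partition_stocks already exists and is partitioned by exchange and symbol; the original post never shows its DDL. A plausible sketch (hypothetical, with column names taken from describe stocks, including the volumn spelling) would be:

create table if not exists partition_stocks (
    ymd string,
    price_open float,
    price_high float,
    price_low float,
    price_close float,
    volumn int,
    price_adj_close float
)
partitioned by (exchange string, symbol string);

The partition columns exist only in the directory path (exchange=NASDAQ/symbol=ABXA), which is why the INSERT's select list omits them while the final query can still filter on them. Note that exchange is accepted as a column name in Hive 0.10, as in the transcript, but later Hive versions reserve the keyword, so it may need backticks there.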
Now let's look at the dynamic partition properties:
1. hive.exec.dynamic.partition = false by default. Set it to true to enable dynamic partitioning.
2. hive.exec.dynamic.partition.mode = strict by default. Set it to nonstrict to allow all partition columns to be determined dynamically.
3. hive.exec.max.dynamic.partitions.pernode = 100 by default. The maximum number of dynamic partitions that a single mapper or reducer may create; a fatal error is raised if one mapper or reducer attempts to create more than this threshold.
4. hive.exec.max.dynamic.partitions = 1000 by default. The total number of dynamic partitions that one statement with dynamic partitioning may create; a fatal error is raised if the limit is exceeded.
5. hive.exec.max.created.files = 100000 by default. The maximum total number of files that may be created globally; a Hadoop counter tracks the number of files created, and a fatal error is raised if the limit is exceeded.
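Putting these together, a typical session preamble before a dynamic-partition insert might look like the following sketch (the values shown are just the defaults listed above; raise them only as far as your expected partition counts require):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- limits that guard against runaway partition and file creation
set hive.exec.max.dynamic.partitions.pernode=100;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.created.files=100000;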