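(1) Create the tables student and student1 (Hive-managed, i.e. internal tables), plus studentrc and studentlzo:
create table student(id INT, age INT, name STRING)
partitioned by(stat_date STRING)
clustered by(id) sorted by(age) into 4 buckets
row format delimited fields terminated by ',';

-- assumed: student1 has the same definition as student; LIKE copies the definition without data
create table student1 like student;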

create table studentrc(id INT, age INT, name STRING)
partitioned by(stat_date STRING)
clustered by(id) sorted by(age) into 4 buckets
row format delimited fields terminated by ',' stored as rcfile;

create table studentlzo(id INT, age INT, name STRING)
partitioned by(stat_date STRING)
clustered by(id) sorted by(age) into 4 buckets
row format delimited fields terminated by ',' stored as rcfile;

File formats: textfile (the default when STORED AS is omitted, as for student), sequencefile, rcfile
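(2) Set the Hive configuration property that enforces bucketing on insert (needed in Hive 1.x and earlier; Hive 2.0 removed it and always enforces bucketing):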
set hive.enforce.bucketing = true;
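(3) Insert the data. In classic Hive, LOAD DATA only moves the file into the partition directory and does not split it into buckets; the INSERT ... SELECT that follows is what writes bucketed, sorted files:
LOAD DATA local INPATH '/home/hadoop/hivetest1.txt' OVERWRITE INTO TABLE student partition(stat_date="20120802");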

Copy the partition into the bucketed table student1 (this step uses a lot of CPU):
from student
insert overwrite table student1 partition(stat_date="20120802")
select id,age,name where stat_date="20120802" sort by age;

View the data:
-- DISTRIBUTE BY acts like the key in MapReduce: rows with the same id go to the same reducer
select id, age, name from student distribute by id;
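Both the bucketed INSERT in step (3) and DISTRIBUTE BY come down to hash partitioning: each row is sent to partition (hash(key) & Integer.MAX_VALUE) % n, where n is the number of buckets (or reducers). The Python below is a minimal sketch of that rule for the INT column id, assuming classic Hive (bucketing version 1), where an INT hashes to its own value; Hive 3's default bucketing version 2 hashes with Murmur3, so real bucket numbers there will differ. The function names are hypothetical, not part of any Hive API.

```python
# Minimal sketch of classic (bucketing version 1) Hive hash partitioning on an INT key.
# Assumption: an INT hashes to its own value; Hive 3 bucketing v2 uses Murmur3 instead.
from typing import Dict, List, Tuple

JAVA_INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE, used to clear the sign bit

Row = Tuple[int, int, str]  # (id, age, name), matching the student table columns


def bucket_for_int(key: int, num_buckets: int) -> int:
    """Return the 0-based bucket (or reducer) index for a 32-bit INT key."""
    if not -2**31 <= key < 2**31:
        raise ValueError("key is outside the 32-bit INT range")
    return (key & JAVA_INT_MAX) % num_buckets


def cluster_rows(rows: List[Row], num_buckets: int) -> Dict[int, List[Row]]:
    """Mimic CLUSTERED BY(id) SORTED BY(age): group rows by bucket, sort each by age."""
    buckets: Dict[int, List[Row]] = {b: [] for b in range(num_buckets)}
    for row in rows:
        buckets[bucket_for_int(row[0], num_buckets)].append(row)
    for members in buckets.values():
        members.sort(key=lambda r: r[1])  # SORTED BY(age) within each bucket
    return buckets


if __name__ == "__main__":
    sample = [(1, 20, "a"), (2, 18, "b"), (5, 19, "c"), (-1, 21, "d")]
    for bucket, members in cluster_rows(sample, 4).items():
        print(bucket, members)
```

With 4 buckets this puts ids 5 and 1 in bucket 1 (ordered by age 19, then 20), id 2 in bucket 2, id -1 in bucket 3, and leaves bucket 0 empty. The indices here are 0-based, while TABLESAMPLE numbers buckets from 1, so index b corresponds to TABLESAMPLE bucket b + 1.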

Sample the data (generally used for testing):
select * from student tablesample(bucket 1 out of 2 on id);
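TABLESAMPLE(BUCKET x OUT OF y)
Here x must be no greater than y, and y must be a factor or a multiple of the number of buckets declared with CLUSTERED BY when the table was created. Hive uses y to decide how much to sample. For example, if the table was created with 32 buckets and y = 16, Hive reads 32/16 = 2 buckets, so TABLESAMPLE(BUCKET 3 OUT OF 16) reads bucket 3 and bucket 16 + 3 = 19. If y = 64, Hive reads 32/64 = 1/2 of a bucket, so TABLESAMPLE(BUCKET 3 OUT OF 64) reads half of bucket 3. For the student table (4 buckets), BUCKET 1 OUT OF 2 therefore reads 4/2 = 2 buckets: bucket 1 and bucket 3. Because the ON column (id) matches the CLUSTERED BY column, Hive can read just those bucket files instead of scanning the whole table.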
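The bucket arithmetic for TABLESAMPLE can be checked with a short Python sketch. It assumes buckets are numbered from 1, as in the TABLESAMPLE clause, and returns which buckets are read and what fraction of each; the function name tablesample_buckets is hypothetical, not a Hive API.

```python
# Sketch of which buckets TABLESAMPLE(BUCKET x OUT OF y) reads from a table with n buckets.
from fractions import Fraction
from typing import Dict


def tablesample_buckets(x: int, y: int, num_buckets: int) -> Dict[int, Fraction]:
    """Map each 1-based bucket number that is read to the fraction of it that is read."""
    if not 1 <= x <= y:
        raise ValueError("x must satisfy 1 <= x <= y")
    if num_buckets % y == 0:
        # y is a factor of n: read n / y whole buckets, starting at x and stepping by y.
        return {b: Fraction(1) for b in range(x, num_buckets + 1, y)}
    if y % num_buckets == 0:
        # y is a multiple of n: each bucket splits into y / n parts and one part is read.
        bucket = (x - 1) % num_buckets + 1
        return {bucket: Fraction(num_buckets, y)}
    raise ValueError("y must be a factor or a multiple of the bucket count")


if __name__ == "__main__":
    print(tablesample_buckets(3, 16, 32))  # 32 buckets, y = 16
    print(tablesample_buckets(3, 64, 32))  # 32 buckets, y = 64
    print(tablesample_buckets(1, 2, 4))    # the student table: 4 buckets
```

For the three cases above this yields buckets 3 and 19 read in full, half of bucket 3, and buckets 1 and 3 of student read in full.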

RCFile operations

-- Load with gzip compression
set hive.enforce.bucketing=true;
set hive.exec.compress.output=true;
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
from student
insert overwrite table studentrc partition(stat_date="20120802")
select id,age,name where stat_date="20120802" sort by age;

-- LZO compression (LzoCodec comes from the separately installed hadoop-lzo library, not from Hadoop itself)
-- RCFile record buffer: 16 * 1024 * 1024 = 16777216 bytes (16 MB)
set hive.io.rcfile.record.buffer.size = 16777216;
-- I/O buffer size: 128 * 1024 = 131072 bytes (128 KB)
set io.file.buffer.size = 131072;

set hive.enforce.bucketing=true;
set hive.exec.compress.output=true;
set mapred.output.compress=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
set io.compression.codecs=com.hadoop.compression.lzo.LzoCodec;
from student
insert overwrite table studentlzo partition(stat_date="20120802")
select id,age,name where stat_date="20120802" sort by age;

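-- Load into a SequenceFile table
-- studentseq is not defined in the steps above; this definition is an assumption that matches
-- SELECT * from student, which returns the partition column stat_date as the last column
create table studentseq(id INT, age INT, name STRING, stat_date STRING)
row format delimited fields terminated by ',' stored as sequencefile;
set hive.exec.compress.output=true;
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
insert overwrite table studentseq select * from student;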

Using RCFile in Hive
