
Hive Installation and Usage Guide


[Figure: hadoop-hive]

Preface


Hive is a programming interface on top of Hadoop. Its SQL-like syntax lets data analysts get productive quickly, spares them the weight of the Java world, and brings Hadoop to people beyond programmers.

Starting with Hive, analysts too can play with big data.


1. Introduction to Hive

Hive is a data warehouse infrastructure built on top of Hadoop. It provides a set of tools for data extract-transform-load (ETL) and a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language, called HQL, which lets users familiar with SQL query the data. At the same time, the language lets developers familiar with MapReduce plug in custom mappers and reducers to handle complex analysis that the built-in mappers and reducers cannot.

Hive imposes no dedicated data format. It works well on top of Thrift, supports configurable field delimiters, and also lets users specify their own data formats.

The text above is excerpted from Baidu Baike (http://baike.baidu.com/view/699292.htm).

Differences between Hive and relational databases:

  • Data storage: Hive stores data on Hadoop's HDFS, while relational databases use the local file system.
  • Compute model: Hive computes with Hadoop's MapReduce, while relational databases use index-based in-memory computation.
  • Use case: Hive is an OLAP data warehouse system for querying massive data sets, with poor real-time latency; relational databases are OLTP transaction systems serving real-time queries.
  • Scalability: built on Hadoop, Hive easily scales storage and compute out across machines; relational databases are hard to scale horizontally and instead keep growing single-machine capacity.
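
As a small taste of that SQL-like syntax, a typical analyst query reads like plain SQL but is compiled into MapReduce jobs behind the scenes (a sketch; the employees table and its columns are hypothetical):

hive> SELECT dept, count(*), avg(salary) FROM employees GROUP BY dept;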

2. Hive Installation

Hive is a data warehouse product built on Hadoop, so first we need a working Hadoop environment.

[Figure: rhive]

Contents

  1. Installing Hive
  2. Basic Hive usage: CRUD
  3. Hive interactive mode
  4. Importing data
  5. Exporting data
  6. Querying with HiveQL
  7. Hive views
  8. Hive partitioned tables


1. Installing Hive

System environment
With Hadoop already installed, we can put Hive on the namenode machine (c1).

Download: hive-0.9.0.tar.gz
Extract to: /home/cos/toolkit/hive-0.9.0

Hive configuration

~ cd /home/cos/toolkit/hive-0.9.0
~ cp conf/hive-default.xml.template conf/hive-site.xml
~ cp conf/hive-log4j.properties.template conf/hive-log4j.properties

Edit the hive-site.xml configuration file
Store Hive's metadata in MySQL:

~ vi conf/hive-site.xml

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://c1:3306/hive_metadata?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>

<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>

      
    

Edit hive-log4j.properties
Comment out the old EventCounter entry and point it at the relocated class:

#log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

Set environment variables

      
        
~ sudo vi /etc/environment

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/cos/toolkit/ant184/bin:/home/cos/toolkit/jdk16/bin:/home/cos/toolkit/maven3/bin:/home/cos/toolkit/hadoop-1.0.3/bin:/home/cos/toolkit/hive-0.9.0/bin"

JAVA_HOME=/home/cos/toolkit/jdk16
ANT_HOME=/home/cos/toolkit/ant184
MAVEN_HOME=/home/cos/toolkit/maven3

HADOOP_HOME=/home/cos/toolkit/hadoop-1.0.3
HIVE_HOME=/home/cos/toolkit/hive-0.9.0

CLASSPATH=/home/cos/toolkit/jdk16/lib/dt.jar:/home/cos/toolkit/jdk16/lib/tools.jar
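
After editing /etc/environment, log out and back in so the new variables take effect, then run a quick sanity check:

~ echo $HIVE_HOME
~ which hive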

      
    

Create directories on HDFS

      
        
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
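
A quick listing confirms the directories exist and carry the group-write permission:

$HADOOP_HOME/bin/hadoop fs -ls /
$HADOOP_HOME/bin/hadoop fs -ls /user/hive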

      
    

Create the database in MySQL (the latin1 character set sidesteps the index key length errors the metastore schema can hit under utf8):

      
        
create database hive_metadata;
grant all on hive_metadata.* to hive@'%' identified by 'hive';
grant all on hive_metadata.* to hive@localhost identified by 'hive';
ALTER DATABASE hive_metadata CHARACTER SET latin1;
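
A quick way to confirm the grants work is to connect as the hive user from the shell, using the host and credentials configured in hive-site.xml above:

~ mysql -h c1 -u hive -phive -e 'SELECT 1;'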

      
    

Manually copy the MySQL JDBC driver into hive/lib

      
        
~ ls /home/cos/toolkit/hive-0.9.0/lib
mysql-connector-java-5.1.22-bin.jar

      
    

Start Hive

      
        
# Start the metastore service
~ bin/hive --service metastore &
Starting Hive Metastore Server

# Start the hiveserver service
~ bin/hive --service hiveserver &
Starting Hive Thrift Server

# Start the Hive client
~ bin/hive shell
Logging initialized using configuration in file:/root/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201211141845_1864939641.txt

hive> show tables;
OK
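
If the two background services came up, they should be listening on their default ports (9083 for the metastore, 10000 for hiveserver, unless overridden in the configuration):

~ netstat -nlt | grep -E '9083|10000'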

      
    

Inspect the metadata in MySQL

      
        
~ mysql -uroot -p
mysql> use hive_metadata;
Database changed

mysql> show tables;
+-------------------------+
| Tables_in_hive_metadata |
+-------------------------+
| BUCKETING_COLS          |
| CDS                     |
| COLUMNS_V2              |
| DATABASE_PARAMS         |
| DBS                     |
| IDXS                    |
| INDEX_PARAMS            |
| PARTITIONS              |
| PARTITION_KEYS          |
| PARTITION_KEY_VALS      |
| PARTITION_PARAMS        |
| PART_COL_PRIVS          |
| PART_PRIVS              |
| SDS                     |
| SD_PARAMS               |
| SEQUENCE_TABLE          |
| SERDES                  |
| SERDE_PARAMS            |
| SORT_COLS               |
| TABLE_PARAMS            |
| TBLS                    |
| TBL_COL_PRIVS           |
| TBL_PRIVS               |
+-------------------------+
23 rows in set (0.00 sec)

      
    

Hive is now installed; the rest of this guide covers how to use it.

2. Basic Hive Usage

1. Enter the Hive console

      
        
~ cd /home/cos/toolkit/hive-0.9.0

~ bin/hive shell
Logging initialized using configuration in file:/home/cos/toolkit/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/cos/hive_job_log_cos_201307160003_95040367.txt
hive>
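
Besides the interactive shell, statements can also run non-interactively, which is handy for scripting (the .hql path below is hypothetical):

~ bin/hive -e 'show tables;'
~ bin/hive -f /home/cos/demo/script.hql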

      
    

Create a table

      
        
# Create sample data (tab-separated text)
~ vi /home/cos/demo/t_hive.txt

16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34

# Create a new table
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.489 seconds

# Load t_hive.txt into table t_hive
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive ;
Copying data from file:/home/cos/demo/t_hive.txt
Copying file: file:/home/cos/demo/t_hive.txt
Loading data to table default.t_hive
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
OK
Time taken: 0.397 seconds

      
    

View tables and data

      
        
# List tables
hive> show tables;
OK
t_hive
Time taken: 0.099 seconds

# Match table names with a pattern
hive> show tables '*t*';
OK
t_hive
Time taken: 0.065 seconds

# View table data
hive> select * from t_hive;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.264 seconds

# View the table schema
hive> desc t_hive;
OK
a       int
b       int
c       int
Time taken: 0.1 seconds

      
    

Alter a table

      
        
# Add a column
hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);
OK
Time taken: 0.186 seconds
hive> desc t_hive;
OK
a       int
b       int
c       int
new_col string
Time taken: 0.086 seconds

# Rename the table
hive> ALTER TABLE t_hive RENAME TO t_hadoop;
OK
Time taken: 0.45 seconds
hive> show tables;
OK
t_hadoop
Time taken: 0.07 seconds

      
    

Drop a table

      
        
hive> DROP TABLE t_hadoop;
OK
Time taken: 0.767 seconds

hive> show tables;
OK
Time taken: 0.064 seconds

      
    

3. Hive Interactive Mode

  • quit, exit: leave the interactive shell
  • reset: reset the configuration to the default values
  • set <key>=<value>: set the value of a particular variable (note: a misspelled variable name raises no error)
  • set: print the Hive configuration variables overridden by the user
  • set -v: print all Hadoop and Hive configuration variables
  • add FILE[S] *, add JAR[S] *, add ARCHIVE[S] *: add one or more files, jars, or archives to the distributed cache
  • list FILE[S], list JAR[S], list ARCHIVE[S]: list the resources already added to the distributed cache
  • list FILE[S] *, list JAR[S] *, list ARCHIVE[S] *: check whether the given resources are in the distributed cache
  • delete FILE[S] *, delete JAR[S] *, delete ARCHIVE[S] *: remove the given resources from the distributed cache
  • ! <command>: run a shell command from the Hive shell
  • dfs <dfs command>: run a dfs command from the Hive shell
  • <query string>: execute a Hive query and print the results to standard output
  • source FILE <filepath>: execute a Hive script file inside the CLI
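
A short sample session exercising a few of these commands (a minimal sketch; the script path is hypothetical):

# set one variable, then print it back
hive> set hive.cli.print.header=true;
hive> set hive.cli.print.header;

# run an HDFS command and a shell command without leaving the CLI
hive> dfs -ls /user/hive/warehouse;
hive> ! date;

# run a script file
hive> source /home/cos/demo/init.hql;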

4. Importing Data

We continue with the t_hive table from before.

      
# Create the table structure
hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

      
    

Load data from the local file system (LOCAL)

      
        
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive ;
Copying data from file:/home/cos/demo/t_hive.txt
Copying file: file:/home/cos/demo/t_hive.txt
Loading data to table default.t_hive
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
OK
Time taken: 0.612 seconds

# Find the freshly loaded data on HDFS
~ hadoop fs -cat /user/hive/warehouse/t_hive/t_hive.txt

16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34

      
    

Load data from HDFS (note: unlike LOCAL, LOAD DATA INPATH moves the source file into the table's directory)

      
        
# Create table t_hive2
hive> CREATE TABLE t_hive2 (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

# Load data from HDFS
hive> LOAD DATA INPATH '/user/hive/warehouse/t_hive/t_hive.txt' OVERWRITE INTO TABLE t_hive2;
Loading data to table default.t_hive2
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2
OK
Time taken: 0.325 seconds

# View the data
hive> select * from t_hive2;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.287 seconds

      
    

Load data from another table

      
        
hive> INSERT OVERWRITE TABLE t_hive2 SELECT * FROM t_hive ;

Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0002, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0002
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job  -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:32:41,979 Stage-1 map = 0%,  reduce = 0%
2013-07-16 10:32:48,034 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:49,050 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:50,068 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:51,082 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:52,093 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:53,102 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.03 sec
2013-07-16 10:32:54,112 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.03 sec
MapReduce Total cumulative CPU time: 1 seconds 30 msec
Ended Job = job_201307131407_0002
Ended Job = -314818888, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-32-31_323_5732404975764014154/-ext-10000
Loading data to table default.t_hive2
Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2
Table default.t_hive2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to t_hive2
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.03 sec   HDFS Read: 273 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 30 msec
OK
Time taken: 23.227 seconds

hive> select * from t_hive2;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.134 seconds

      
    

Create a table and populate it from another table (CTAS)

      
        
# Drop the table
hive> DROP TABLE t_hive;

# Create the table from a SELECT over another table
hive> CREATE TABLE t_hive AS SELECT * FROM t_hive2 ;

Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0003, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0003
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job  -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:36:48,612 Stage-1 map = 0%,  reduce = 0%
2013-07-16 10:36:54,648 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:55,657 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:56,666 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:57,673 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:58,683 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-07-16 10:36:59,691 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.13 sec
MapReduce Total cumulative CPU time: 1 seconds 130 msec
Ended Job = job_201307131407_0003
Ended Job = -670956236, job is filtered out (removed at runtime).
Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10001
Moving data to: hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive
Table default.t_hive stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]
7 Rows loaded to hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10000
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.13 sec   HDFS Read: 272 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 130 msec
OK
Time taken: 20.13 seconds

hive> select * from t_hive;
OK
16      2       3
61      12      13
41      2       31
17      21      3
71      2       31
1       12      34
11      2       34
Time taken: 0.109 seconds

      
    

Copy only the table structure, without the data

      
        
hive> CREATE TABLE t_hive3 LIKE t_hive;
hive> select * from t_hive3;
OK
Time taken: 0.077 seconds

      
    

Importing data from MySQL
We will cover this when we introduce Sqoop.

5. Exporting Data

Copy from HDFS to another HDFS location

      
        
~ hadoop fs -cp /user/hive/warehouse/t_hive /

~ hadoop fs -ls /t_hive
Found 1 items
-rw-r--r--   1 cos supergroup         56 2013-07-16 10:41 /t_hive/000000_0

~ hadoop fs -cat /t_hive/000000_0
1623
611213
41231
17213
71231
11234
11234
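
The columns above appear to run together because t_hive was rebuilt by a CREATE TABLE AS SELECT job, whose output files use Hive's default field delimiter \001 (Ctrl-A), an invisible character; the same applies to the local export below. One way to view such a file with visible tabs:

~ hadoop fs -cat /t_hive/000000_0 | tr '\001' '\t'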

      
    

Export to the local file system through Hive

      
        
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t_hive' SELECT * FROM t_hive;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201307131407_0005, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0005
Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job  -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-07-16 10:46:24,774 Stage-1 map = 0%,  reduce = 0%
2013-07-16 10:46:30,823 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:31,833 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:32,844 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:33,856 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:34,865 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:35,873 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec
2013-07-16 10:46:36,884 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.87 sec
MapReduce Total cumulative CPU time: 870 msec
Ended Job = job_201307131407_0005
Copying data to local directory /tmp/t_hive
Copying data to local directory /tmp/t_hive
7 Rows loaded to /tmp/t_hive
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.87 sec   HDFS Read: 271 HDFS Write: 56 SUCCESS
Total MapReduce CPU Time Spent: 870 msec
OK
Time taken: 23.369 seconds

# View the file on the local file system
hive> ! cat /tmp/t_hive/000000_0;
hive> 1623
611213
41231
17213
71231
11234
11234


      
    

6. Querying with HiveQL

Note: the MapReduce log output is omitted from the listings below.

Basic queries: ordering, column aliases, nested subqueries

      
        
hive> FROM (
    >   SELECT b,c as c2 FROM t_hive
    > ) t
    > SELECT t.b, t.c2
    > WHERE b>2
    > LIMIT 2;
12      13
21      3

      
    

Join queries: JOIN

      
        
hive> SELECT t1.a,t1.b,t2.a,t2.b
    > FROM t_hive t1 JOIN t_hive2 t2 on t1.a=t2.a
    > WHERE t1.c>10;

1       12      1       12
11      2       11      2
41      2       41      2
61      12      61      12
71      2       71      2

      
    

Aggregation 1: count, avg

      
        
hive> SELECT count(*), avg(a) FROM t_hive;
7       31.142857142857142

      
    

Aggregation 2: count distinct

      
        
hive> SELECT count(DISTINCT b) FROM t_hive;
3

      
    

Aggregation 3: GROUP BY, HAVING

      
        
#GROUP BY
hive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c;
16.0    2       3
56.0    2       62
11.0    2       34
61.0    12      13
1.0     12      34
17.0    21      3

#HAVING
hive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c HAVING sum(c)>30;
56.0    2       62
11.0    2       34
1.0     12      34

      
    

7. Hive Views

Views in Hive follow the same concept as views in other databases; we again use t_hive as the example.

      
        
hive> CREATE VIEW v_hive AS SELECT a,b FROM t_hive where c>30;
hive> select * from v_hive;
41      2
71      2
1       12
11      2
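
A Hive view is purely logical: nothing is materialized, and the underlying query runs each time the view is queried. Its columns can be inspected just like a table's:

hive> desc v_hive;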

      
    

Drop a view

      
        
hive> DROP VIEW IF EXISTS v_hive;
OK
Time taken: 0.495 seconds

      
    

8. Hive Partitioned Tables

Partitioned tables are a basic database concept, and with modest data volumes we can often do without them. But Hive is OLAP data warehouse software and deals with very large volumes of data, so in this setting partitioned tables become very important!

Below we define a new table structure: t_hft

Create the data

      
        
~ vi /home/cos/demo/t_hft_20130627.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45

~ vi /home/cos/demo/t_hft_20130628.csv
000001,092023,9.76
000002,091947,8.99
000004,092002,9.79
000005,091514,2.2
000001,092008,9.70
000001,092059,9.45

      
    

Create the table

      
        
DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft(
SecurityID STRING,
tradeTime STRING,
PreClosePx DOUBLE
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

      
    

Create the partitioned table
Per the business requirement, we partition by trading day (the DDL below partitions on tradeDate; the stock ID remains a regular column).

      
        
DROP TABLE IF EXISTS t_hft;
CREATE TABLE t_hft(
SecurityID STRING,
tradeTime STRING,
PreClosePx DOUBLE
) PARTITIONED BY (tradeDate INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

      
    

Load the data

      
        
#20130627
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130627.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130627);
Copying data from file:/home/cos/demo/t_hft_20130627.csv
Copying file: file:/home/cos/demo/t_hft_20130627.csv
Loading data to table default.t_hft partition (tradedate=20130627)

#20130628
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130628.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130628);
Copying data from file:/home/cos/demo/t_hft_20130628.csv
Copying file: file:/home/cos/demo/t_hft_20130628.csv
Loading data to table default.t_hft partition (tradedate=20130628)


      
    

View the partitions

      
        
hive> SHOW PARTITIONS t_hft;
tradedate=20130627
tradedate=20130628
Time taken: 0.082 seconds

      
    

Query the data

      
        
hive> select * from t_hft where securityid='000001';
000001  092023  9.76    20130627
000001  092008  9.7     20130627
000001  092059  9.45    20130627
000001  092023  9.76    20130628
000001  092008  9.7     20130628
000001  092059  9.45    20130628

hive> select * from t_hft where tradedate=20130627 and PreClosePx<9;
000002  091947  8.99    20130627
000005  091514  2.2     20130627
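
Because tradedate is a partition column, the second query scans only the 20130627 partition directory instead of the whole table. Partitions can also be added or dropped directly, without reloading data (a sketch; the 20130629 partition is hypothetical):

hive> ALTER TABLE t_hft ADD PARTITION (tradeDate=20130629);
hive> ALTER TABLE t_hft DROP PARTITION (tradeDate=20130629);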

      
    

That wraps up the basics of using Hive; these are the everyday operations. Later I will go on to cover HiveQL optimization and Hive operations and maintenance.
