1 Installing Hive
1.1 Download and unpack
wget http://mirrors.cnnic.cn/apache/hive/stable/hive-0.10.0.tar.gz
tar xzvf hive-0.10.0.tar.gz
1.2 Set environment variables
export HIVE_HOME=/usr/local/src/hive-0.10.0
export PATH=$HIVE_HOME/bin:$PATH
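These export lines only affect the current shell session. A minimal sketch for making them permanent, assuming a Bash login shell and the install path used above:

# append the Hive variables to ~/.bashrc, then reload it
echo 'export HIVE_HOME=/usr/local/src/hive-0.10.0' >> ~/.bashrc
echo 'export PATH=$HIVE_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
which hive    # should resolve to $HIVE_HOME/bin/hive if PATH is set correctly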
1.3 Create the Hive warehouse directories
hadoop fs -mkdir /tmp
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
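A quick check that the directories exist and are group-writable (a sketch; the exact owner and permission strings depend on your Hadoop user setup):

hadoop fs -ls /tmp
hadoop fs -ls /user/hive
# the permission column should show the group write bit, e.g. drwxrwxr-x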
1.4 Start the command line
Enter the command line with the hive command; it works much like the MySQL command line:
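For example, a first session might look like this (a sketch; right after installation there are no user tables yet):

hive
hive> SHOW DATABASES;
hive> SHOW TABLES;
hive> quit;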
2 Installing Sqoop
2.1 Download and unpack
Download the Sqoop build that matches Hadoop 0.20:
wget http://mirrors.cnnic.cn/apache/sqoop/1.4.3/sqoop-1.4.3.bin__hadoop-0.20.tar.gz
tar -xvf sqoop-1.4.3.bin__hadoop-0.20.tar.gz
2.2 Set environment variables
export SQOOP_HOME=/usr/local/src/sqoop-1.4.3.bin__hadoop-0.20
export PATH=$SQOOP_HOME/bin:$PATH
export HADOOP_COMMON_HOME=/home/admin/hadoop-0.20.2
export HADOOP_MAPRED_HOME=/home/admin/hadoop-0.20.2
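To verify the setup, Sqoop can report its version and list the options of the import tool (the HBase warning that shows up in the import output below is harmless here, since $HBASE_HOME is simply not set):

sqoop version        # should report Sqoop 1.4.3
sqoop help import    # lists the arguments accepted by the import tool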
3 Importing data into Hive with Sqoop
3.1 Import into HDFS
To test whether Sqoop is configured correctly, we import one table from a MySQL database. First upload mysql-connector-java-5.1.23.jar into Sqoop's lib folder, then run the following command under sqoop/bin:
sqoop import --connect jdbc:mysql://ip/database --table tb1 --username user -P
===============================================================================
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Enter password:
13/06/07 16:51:46 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/06/07 16:51:46 INFO tool.CodeGenTool: Beginning code generation
13/06/07 16:51:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `tb1` AS t LIMIT 1
13/06/07 16:51:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `tb1` AS t LIMIT 1
13/06/07 16:51:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/admin/hadoop-0.20.2
13/06/07 16:51:48 INFO orm.CompilationManager: Found hadoop core jar at: /home/admin/hadoop-0.20.2/hadoop-0.20.2-core.jar
Note: /tmp/sqoop-root/compile/44c4b6c5ac57de04b487eb90633ac33e/tb1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/07 16:51:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/44c4b6c5ac57de04b487eb90633ac33e/tb1.jar
13/06/07 16:51:54 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/06/07 16:51:54 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/06/07 16:51:54 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/06/07 16:51:54 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/06/07 16:51:54 INFO mapreduce.ImportJobBase: Beginning import of tb1
13/06/07 16:51:57 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `tb1`
13/06/07 16:51:59 INFO mapred.JobClient: Running job: job_201306071651_0001
13/06/07 16:52:00 INFO mapred.JobClient:  map 0% reduce 0%
13/06/07 16:52:38 INFO mapred.JobClient:  map 50% reduce 0%
13/06/07 16:52:44 INFO mapred.JobClient:  map 100% reduce 0%
13/06/07 16:52:46 INFO mapred.JobClient: Job complete: job_201306071651_0001
13/06/07 16:52:46 INFO mapred.JobClient: Counters: 5
13/06/07 16:52:46 INFO mapred.JobClient:   Job Counters
13/06/07 16:52:46 INFO mapred.JobClient:     Launched map tasks=2
13/06/07 16:52:46 INFO mapred.JobClient:   FileSystemCounters
13/06/07 16:52:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=212
13/06/07 16:52:46 INFO mapred.JobClient:   Map-Reduce Framework
13/06/07 16:52:46 INFO mapred.JobClient:     Map input records=2
13/06/07 16:52:46 INFO mapred.JobClient:     Spilled Records=0
13/06/07 16:52:46 INFO mapred.JobClient:     Map output records=2
13/06/07 16:52:46 INFO mapreduce.ImportJobBase: Transferred 212 bytes in 51.383 seconds (4.1259 bytes/sec)
13/06/07 16:52:46 INFO mapreduce.ImportJobBase: Retrieved 2 records.
===============================================================================
By default the data files are imported into a folder named after the table under the current user's HDFS home directory:
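To inspect that directory (a sketch, assuming the job ran as root as in the log above):

hadoop fs -ls /user/root/tb1
# typically one part-m-NNNNN file per map task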
Sqoop launches four map tasks in parallel by default to speed up the import; the -m 1 option forces a single map task, so that only one data file is generated in HDFS. Because the tb1 table currently holds just two rows, two files were produced in total. Have a look at the content of the generated files, as in the sketch below:
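A sketch of re-running the import with a single map task and then viewing the result; the connection values are the same placeholders as above, and the old output directory is removed first because the import would otherwise fail since it already exists:

hadoop fs -rmr /user/root/tb1
sqoop import --connect jdbc:mysql://ip/database --table tb1 --username user -P -m 1
hadoop fs -cat /user/root/tb1/part-m-00000    # one comma-separated line per row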
3.2 Create the Hive table
First create the tb1 table in the hive command line. Note that Hive supports only a limited set of data types, and the field delimiter must be set to a comma (Sqoop writes comma-separated files by default); otherwise Hive falls back to its default delimiter, Ctrl+A.
CREATE TABLE tb1(
  id int,
  ......
) row format delimited fields terminated by ',';
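To double-check the delimiter after creating the table, the definition can be inspected from the shell (a sketch; hive -e runs a single statement non-interactively):

hive -e "DESCRIBE FORMATTED tb1;"    # field.delim should show up as ',' in the storage section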
Alternatively, the following command lets Sqoop create the Hive table automatically from the MySQL table structure:
sqoop create-hive-table --connect jdbc:mysql://ip/database --table tb1 --hive-table tb1 --username user -P
3.3 Load into Hive
Now load the files from HDFS into Hive. Note that when Hive loads data from HDFS, it moves the files from /user/root/tb1 into the table's warehouse directory, /user/hive/warehouse/tb1:
LOAD DATA INPATH '/user/root/tb1/part-m-*' OVERWRITE INTO TABLE tb1;
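A quick sanity check after the load (a sketch):

hive -e "SELECT * FROM tb1;"    # should print the two rows imported from MySQL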
3.4 One powerful command
The three steps above, exporting data from MySQL to HDFS, creating the Hive table, and loading the data into Hive, can be done with a single Sqoop command:
sqoop import --connect jdbc:mysql://ip/database --table tb1 --username user -P --hive-import
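A slightly fuller sketch of the same command; --hive-table and -m are optional and shown only to illustrate commonly tuned options (connection and table values are placeholders):

sqoop import --connect jdbc:mysql://ip/database --table tb1 --username user -P \
  --hive-import --hive-table tb1 -m 1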
4 Analysis with HiveQL
To be continued......
References
Hive installation:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Sqoop installation:
http://sqoop.apache.org/docs/1.99.1/Installation.html