
Version:

Python: works with both 3.6.4 and 2.7.3

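1. HBase table overview

Table name: people
Column families: basic_info, other_info
rowkey: a random two-digit number followed by the current timestamp; each rowkey must be unique within the table data.
Columns: name, age, sex, edu, tel, email, country.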
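The article names two column families and seven columns but does not say which column lives in which family. The sketch below shows one possible assignment and how it would expand into `family:qualifier` names. The split itself, the `COLUMN_FAMILIES` dict, and the `qualified_columns` helper are assumptions made for illustration only.

```python
# Hypothetical layout: the article does not specify which column
# belongs to which family, so this assignment is only an assumption.
COLUMN_FAMILIES = {
    "basic_info": ["name", "age", "sex", "edu"],
    "other_info": ["tel", "email", "country"],
}


def qualified_columns(families):
    """Return 'family:qualifier' names, families in sorted order."""
    return [
        "{0}:{1}".format(family, column)
        for family in sorted(families)
        for column in families[family]
    ]


print(qualified_columns(COLUMN_FAMILIES))
```

Keeping the mapping in one place would make it easy to change the layout later without touching the generator.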

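2. Implementation

rowkey:
- Random two-digit number: call random.randint(0, 99), then pad the result with zfill(2); for example, "1" becomes "01".
- Current timestamp: int(time.time()), which yields a 10-digit Unix timestamp in seconds.
- The rowkey is the two-digit number concatenated with the timestamp. Generated keys are tracked so that every rowkey is unique.
name:
- random.sample() picks the requested number of distinct characters to form the name.
- string.capwords() capitalizes the first letter and lowercases the rest.
age:
- random.randint(18, 60): ages 18 to 60, inclusive.
sex:
- random.choice()
edu:
- random.choice()
tel:
- random.choice() combined with random.sample()
email:
- random.sample() combined with random.choice()
country:
- random.choice()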
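Because int(time.time()) has one-second resolution, a two-digit prefix leaves at most 100 distinct rowkeys per second, so generating 100000 unique rows takes at least about 1000 seconds. The sketch below is a variant, not the article's code: it assumes a 13-digit millisecond timestamp and tracks used keys in a set. The names `make_rowkey` and `seen` are hypothetical.

```python
import random
import time


def make_rowkey(seen):
    """Return a rowkey not yet in `seen`: 2 random digits + millisecond timestamp."""
    while True:
        prefix = str(random.randint(0, 99)).zfill(2)  # "00" .. "99"
        millis = int(time.time() * 1000)              # 13 digits at present
        rowkey = prefix + str(millis)
        if rowkey not in seen:                        # set lookup, O(1) on average
            seen.add(rowkey)
            return rowkey


seen = set()
keys = [make_rowkey(seen) for _ in range(1000)]
print(len(set(keys)))
```

A set is used here because membership tests on a list grow slower as the number of generated keys increases.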

3. Code

The complete Python code for generating the HBase test data follows. These are the contents of generatedata.py:

            
              
    # -*- coding: utf-8 -*-
    ###########################################
    # rowkey: a random two-digit number + the current timestamp;
    #         each rowkey must be unique within the table data.
    # Columns: name, age, sex, edu, tel, email, country.
    # 0001,tom,17,man,,176xxxxxxxx,,China
    # 0002,mary,23,woman,college,,cdsvo@163.com,Japan
    # 0003,sam,18,man,middle,132xxxxxxxx,,America
    # 0004,Sariel,26,,college,178xxxxxxxx,12345@126.com,China
    ###########################################
    import random
    import string
    import sys
    import time

    # Uppercase and lowercase letters
    alphabet_upper_list = string.ascii_uppercase
    alphabet_lower_list = string.ascii_lowercase


    # Randomly generate a string of the given length
    def get_random(instr, length):
        # Pick `length` distinct elements at random from the sequence,
        # for example ['a', 't', 'f', 'v', 'y']
        res = random.sample(instr, length)
        # Join the list elements into a string
        result = ''.join(res)
        return result


    # Build a name
    def get_random_name(length):
        name = string.capwords(get_random(alphabet_lower_list, length))
        return name


    # Get an age
    def get_random_age():
        return str(random.randint(18, 60))


    # Get a sex
    def get_random_sex():
        return random.choice(["woman", "man"])


    # Get an education level
    def get_random_edu():
        edu_list = ["primary", "middle", "college", "master", "court academician"]
        return random.choice(edu_list)


    # Get a phone number: a 3-digit prefix plus 8 distinct random digits
    def get_random_tel():
        pre_list = ["130", "131", "132", "133", "134", "135", "136", "137",
                    "138", "139", "147", "150", "151", "152", "153", "155",
                    "156", "157", "158", "159", "186", "187", "188"]
        return random.choice(pre_list) + ''.join(random.sample('0123456789', 8))


    # Get an email address
    def get_random_email(length):
        alphabet_list = alphabet_lower_list + alphabet_upper_list
        email_list = ["163.com", "126.com", "qq.com", "gmail.com"]
        return get_random(alphabet_list, length) + "@" + random.choice(email_list)


    # Get a country
    def get_random_country():
        country_list = ["Afghanistan", "Anguilla", "Australie", "Barbados",
                        "China", "Brisil", "Colombie", "France", "Irlande",
                        "Russie", "Suisse", "America", "Zaire", "Vanuatu",
                        "Turquie", "Togo", "Suisse", "Sri Lanka", "Porto Rico",
                        "Pirou"]
        return random.choice(country_list)


    # Holds the rowkeys generated so far, used for the uniqueness check
    rowkey_tmp_list = []


    # Build a rowkey
    def get_random_rowkey():
        pre_rowkey = ""
        # With a seconds-resolution timestamp there are at most 100 distinct
        # rowkeys per second, so the loop retries until an unused key appears.
        while True:
            # A number from 0 to 99, both ends included
            num = random.randint(0, 99)
            # The current 10-digit timestamp (seconds)
            timestamp = int(time.time())
            # str(num).zfill(2) left-pads the string with "0" to 2 characters
            pre_rowkey = str(num).zfill(2) + str(timestamp)
            if pre_rowkey not in rowkey_tmp_list:
                rowkey_tmp_list.append(pre_rowkey)
                break
        return pre_rowkey


    # Generate one record
    def get_random_record():
        return get_random_rowkey() + "," + get_random_name(5) + "," + \
            get_random_age() + "," + get_random_sex() + "," + \
            get_random_edu() + "," + get_random_tel() + "," + \
            get_random_email(10) + "," + get_random_country()


    # Write the records to a text file
    def write_record_to_file():
        # Mode 'w' overwrites any existing file content
        with open(sys.argv[1], 'w') as f:
            i = 0
            while i < int(sys.argv[2]):
                record = get_random_record()
                f.write(record)
                # End the record with a newline
                f.write('\n')
                i += 1
                print("Stored {0} records".format(i))


    if __name__ == "__main__":
        write_record_to_file()
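Before loading the generated file into HBase, it can help to confirm that every line has the eight comma-separated fields the script writes and that no rowkey repeats. The following is a minimal sketch of such a check; the `check_records` helper and the three sample lines are made up for illustration and are not part of the original script.

```python
def check_records(lines, expected_fields=8):
    """Return (line_number, problem) tuples for malformed or duplicate records."""
    problems = []
    seen_rowkeys = set()
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(",")
        if len(fields) != expected_fields:
            problems.append((number, "expected {0} fields, got {1}".format(
                expected_fields, len(fields))))
            continue
        rowkey = fields[0]
        if rowkey in seen_rowkeys:
            problems.append((number, "duplicate rowkey " + rowkey))
        seen_rowkeys.add(rowkey)
    return problems


# Made-up sample lines: line 2 repeats a rowkey, line 3 is truncated.
sample = [
    "421530000000,Abcde,30,man,college,13012345678,abcdefghij@qq.com,China\n",
    "421530000000,Fghij,41,woman,master,13187654321,klmnopqrst@163.com,France\n",
    "071530000001,Klmno,25,man,middle\n",
]
print(check_records(sample))
```

For a real run, the same function could be given an open file object, since iterating over a file yields its lines.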

To write 100000 records to /tmp/hbase_data.txt, run the following command:

            
              python generatedata.py /tmp/hbase_data.txt 100000

Arguments:

generatedata.py: the Python file to run
/tmp/hbase_data.txt: the output file path
100000: the total number of records to generate
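The script reads both arguments straight from sys.argv, so a missing or non-numeric argument surfaces as an IndexError or ValueError partway through. A small validation step could fail earlier with a clearer message. This is only a sketch; `parse_args` is a hypothetical helper, not part of the original script.

```python
def parse_args(argv):
    """Return (output_path, count) from an argv-style list, or raise ValueError."""
    if len(argv) != 3:
        raise ValueError("usage: python generatedata.py <output_path> <count>")
    output_path = argv[1]
    count = int(argv[2])  # raises ValueError for non-numeric input
    if count < 0:
        raise ValueError("count must not be negative")
    return output_path, count


print(parse_args(["generatedata.py", "/tmp/hbase_data.txt", "100000"]))
```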

To avoid hotspotting and data skew as the data volume grows, pre-split the HBase table into 10 regions. The corresponding table creation command is:

            
              create 'default:people', {NAME=>'basic_info'}, {NAME=>'other_info'}, SPLITS=>['10|','20|','30|','40|','50|','60|','70|','80|','90|']
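HBase orders row keys lexicographically by byte, and each split key becomes the start key of the next region. Since '|' sorts after every ASCII digit, a rowkey such as "10" followed by a timestamp still sorts before '10|'. The sketch below mirrors that ordering with Python string comparison, which matches byte order for ASCII-only keys. The `region_index` helper and the sample timestamps are illustrative assumptions.

```python
import bisect

# Split keys from the create command above
SPLITS = ['10|', '20|', '30|', '40|', '50|', '60|', '70|', '80|', '90|']


def region_index(rowkey, splits=SPLITS):
    """0-based index of the pre-split region whose key range contains rowkey."""
    # Region i covers [splits[i-1], splits[i]); bisect_right counts
    # how many split keys are less than or equal to the rowkey.
    return bisect.bisect_right(splits, rowkey)


for key in ["051530000000", "101530000000", "111530000000", "991530000000"]:
    print("{0} -> {1}".format(key, region_index(key)))
```

Under this ordering, prefixes 00 to 10 fall into the first region, 11 to 20 into the second, and so on, with 91 to 99 in the last. The buckets are therefore close to even but not exactly equal.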

This test data can now be used to test and practice HBase features.

