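Scraping Douban Movies with Python: The Simplest, Most Brute-Force Way Is to Call the API Directly
First, the API address (it is easy to find by browsing the official site for a bit):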
requests.get('https://movie.douban.com/j/search_subjects?type=movie&tag={}&sort=recommend&page_limit={}&page_start=0'.format(tag, page))
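The tag values are Chinese strings, so they end up percent-encoded in the final URL. To make that visible, here is a minimal sketch that builds the same URL with the standard library only. The helper name `build_search_url` is hypothetical; the parameter names are taken from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/j/search_subjects"


def build_search_url(tag, page_limit, page_start=0):
    """Build the search URL with percent-encoded query parameters."""
    params = {
        "type": "movie",
        "tag": tag,
        "sort": "recommend",
        "page_limit": page_limit,
        "page_start": page_start,
    }
    # urlencode encodes non-ASCII values as UTF-8 percent escapes.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("最新", 100)
print(url)
```

Passing the same dictionary as `params=` to `requests.get` should produce an equivalent request, which avoids formatting the URL by hand.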
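Send a GET request with requests to get the JSON data (a single request can fetch many entries, so there is no need to loop, and one User-Agent is enough). Then import the json package and parse the JSON. The response has to be decoded as UTF-8 here, otherwise the text comes out garbled.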
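The decoding step can be tried without touching the network. The sketch below parses a hard-coded byte string that stands in for `result.content`; the sample bytes and the variable names are assumptions made for illustration, not a live response.

```python
import json

# Stand-in for result.content: UTF-8 encoded JSON bytes (sample data only).
raw = (
    '{"subjects": [{"rate": "8.7", "title": "寄生虫", '
    '"url": "https:\\/\\/movie.douban.com\\/subject\\/27010768\\/"}]}'
).encode("utf-8")

text = raw.decode("utf-8")   # bytes -> str; the wrong codec here garbles the titles
data = json.loads(text)      # str -> dict; "\/" in the JSON becomes "/"

first = data["subjects"][0]
print(first["title"], first["rate"], first["url"])

# Decoding the same bytes with a mismatched codec shows the garbling.
garbled = raw.decode("latin-1")
print(json.loads(garbled)["subjects"][0]["title"])
```

requests also provides `result.json()`, which may save the explicit decode when the response encoding is detected correctly.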
{
  "subjects": [
    {"rate": "8.7", "cover_x": 1500, "title": "寄生虫", "url": "https:\/\/movie.douban.com\/subject\/27010768\/", "playable": false, "cover": "https://img3.doubanio.com\/view\/photo\/s_ratio_poster\/public\/p2561439800.webp", "id": "27010768", "cover_y": 2138, "is_new": false},
    {"rate": "7.7", "cover_x": 1000, "title": "极限逃生", "url": "https:\/\/movie.douban.com\/subject\/30210691\/", "playable": false, "cover": "https://img3.doubanio.com\/view\/photo\/s_ratio_poster\/public\/p2563546656.webp", "id": "30210691", "cover_y": 1425, "is_new": false},
    {"rate": "7.5", "cover_x": 1080, "title": "爱哭鬼上学记", "url": "https:\/\/movie.douban.com\/subject\/34781114\/", "playable": false, "cover": "https://img1.doubanio.com\/view\/photo\/s_ratio_poster\/public\/p2564498289.webp", "id": "34781114", "cover_y": 1599, "is_new": true},
    {"rate": "6.2", "cover_x": 2000, "title": "大地震", "url": "https:\/\/movie.douban.com\/subject\/34800551\/", "playable": true, "cover": "https://img3.doubanio.com\/view\/photo\/s_ratio_poster\/public\/p2568281066.webp", "id": "34800551", "cover_y": 2667, "is_new": true},
    {"rate": "7.9", "cover_x": 3043, "title": "骡子", "url": "https:\/\/movie.douban.com\/subject\/30135113\/", "playable": false, "cover": "https://img1.doubanio.com\/view\/photo\/s_ratio_poster\/public\/p2563626309.webp", "id": "30135113", "cover_y": 4500, "is_new": false},
    {"rate": "5.9", "cover_x": 4000, "title": "X战警:黑凤凰", "url": "https:\/\/movie.douban.com\/subject\/26667010\/", "playable": false, "cover": "https://img3.doubanio.com\/view\/photo\/s_ratio_poster\/public\/p2555886490.webp", "id": "26667010", "cover_y": 5915, "is_new": false},
    {"rate": "7.9", "cover_x": 3600, "title": "疾速备战", "url": "https:\/\/movie.douban.com\/subject\/26909790\/", "playable": false, "cover": "https://img3.doubanio.com\/view\/photo\/s_ratio_poster\/public\/p2551393832.webp", "id": "26909790", "cover_y": 5550, "is_new": false},
    {"rate": "7.5", "cover_x": 1872, "title": "安娜", "url": "https:\/\/movie.douban.com\/subject\/27166976\/", "playable": false, "cover": "https://img3.doubanio.com\/view\/photo\/s_ratio_poster\/public\/p2560205995.webp", "id": "27166976", "cover_y": 2808, "is_new": false},
    {"rate": "7.7", "cover_x": 1500, "title": "恶人传", "url": "https:\/\/movie.douban.com\/subject\/30211551\/", "playable": false, "cover": "https://img3.doubanio.com\/view\/photo\/s_ratio_poster\/public\/p2555084871.webp", "id": "30211551", "cover_y": 2145, "is_new": false},
    {"rate": "6.0", "cover_x": 1020, "title": "扫毒2天地对决", "url": "https:\/\/movie.douban.com\/subject\/30171425\/", "playable": true, "cover": "https://img3.doubanio.com\/view\/photo\/s_ratio_poster\/public\/p2561172733.webp", "id": "30171425", "cover_y": 1428, "is_new": false}
  ]
}
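Note that the `rate` values arrive as strings, so anything that needs numbers (sorting, averaging) has to convert them first. A small sketch, under the assumption that an unrated film might come back with an empty `rate` (the sample above does not show such a case); `to_pairs` is a hypothetical helper.

```python
def to_pairs(subjects):
    """Return (title, rating) tuples sorted by rating, highest first.

    Entries whose rate cannot be parsed as a number are skipped
    (assumption: unrated films may carry an empty string).
    """
    pairs = []
    for item in subjects:
        try:
            rating = float(item.get("rate", ""))
        except (TypeError, ValueError):
            continue
        pairs.append((item.get("title"), rating))
    return sorted(pairs, key=lambda p: p[1], reverse=True)


sample = [
    {"rate": "8.7", "title": "寄生虫"},
    {"rate": "6.2", "title": "大地震"},
    {"rate": "", "title": "unrated placeholder"},  # hypothetical entry
    {"rate": "7.9", "title": "骡子"},
]
pairs = to_pairs(sample)
print(pairs)
```

Splitting the sorted pairs back into two lists gives titles and ratings in matching order, ready for a chart's two axes.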
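Finally, put the data into lists and use pyecharts to visualize it, which generates an HTML file. It may not look great as generated; you can adjust things such as centering yourself (I edited part of the generated HTML by hand). The result is shown in the figure.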
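The manual HTML tweak can also be scripted. Below is a rough sketch that injects a CSS rule into the generated page. It assumes the chart `<div>` carries the class `chart-container`, which is what pyecharts templates appear to emit, so check your own `douban.html` first; `center_chart` is a hypothetical helper.

```python
def center_chart(html, css_class="chart-container"):
    """Insert a <style> rule before </head> that centers the chart div.

    css_class is an assumption about the generated markup; adjust as needed.
    The html is returned unchanged if it has no </head> tag.
    """
    style = "<style>.{} {{ margin: 0 auto; }}</style>".format(css_class)
    if "</head>" not in html:
        return html
    return html.replace("</head>", style + "\n</head>", 1)


# Tiny stand-in for the file written by bar.render('douban.html').
page = (
    "<html><head><title>chart</title></head>"
    '<body><div id="c1" class="chart-container" '
    'style="width:900px; height:500px;"></div></body></html>'
)
centered = center_chart(page)
print(centered)
```

To apply it to the real file, read `douban.html` as UTF-8 text, pass the contents through `center_chart`, and write the result back.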
The complete code follows:
import json

import requests
from pyecharts import options as opts
from pyecharts.charts import Bar
from pyecharts.faker import Faker  # the original imported this from example.commons, which only exists inside the pyecharts repo


def conn(page, tag):
    # Fetch up to `page` entries for `tag` in a single request, starting at offset 0.
    result = requests.get(
        'https://movie.douban.com/j/search_subjects?type=movie&tag={}&sort=recommend&page_limit={}&page_start=0'.format(tag, page),
        headers={
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
        },
    )
    # Decode as UTF-8 explicitly; otherwise the Chinese titles come out garbled.
    print(result.content.decode('utf-8'))
    x = json.loads(result.content.decode('utf-8'))
    subjects = x.get('subjects')  # renamed from `list` so the built-in is not shadowed
    fenshu = []  # ratings
    name = []  # titles
    for i in subjects:
        print(i.get('rate'))
        fenshu.append(i.get('rate'))
        print(i.get('title'))
        name.append(i.get('title'))
    bar = Bar()
    bar.add_xaxis(name)
    bar.add_yaxis('Rating', fenshu, stack="stack1", color=Faker.rand_color())
    bar.reversal_axis()  # horizontal bars: titles on the vertical axis
    bar.set_global_opts(
        title_opts=opts.TitleOpts(title="Movie Ratings"),
        datazoom_opts=opts.DataZoomOpts(orient="vertical"),
    )
    bar.set_series_opts(
        label_opts=opts.LabelOpts(is_show=False),
        markpoint_opts=opts.MarkPointOpts(
            data=[
                opts.MarkPointItem(type_="max", name="Max"),
                opts.MarkPointItem(type_="min", name="Min"),
                opts.MarkPointItem(type_="average", name="Average"),
            ]
        ),
    )
    bar.render('douban.html')


if __name__ == '__main__':
    # The first argument is how many entries to fetch in one request (it can be large; I have tried
    # several thousand), starting from offset 0. The second argument is the tag to fetch. The names
    # list below holds the available tags; swap in whichever you want.
    conn(100, '最新')
    # Tag values are kept in Simplified Chinese, as Douban uses them.
    names = ['热门', '最新', '经典', '可播放', '豆瓣高分', '冷门佳片', '华语', '欧美', '日本', '动作', '喜剧', '爱情', '科幻', '悬疑', '恐怖', '成长']