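What is PyQuery?
A powerful and flexible web-page parsing library. If you find regular expressions too tedious to write (I can't write them myself), if you find BeautifulSoup's syntax hard to remember, and if you already know jQuery's syntax, then PyQuery is your best choice.
Installing PyQuery: just run pip3 install pyquery.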
Basic usage of PyQuery:
Initialization:
Initializing from a string:
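```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)    # build a PyQuery object from the string
print(doc('a'))   # select every <a> tag, jQuery-style
```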
Initializing from a URL:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Initializing from a URL
from pyquery import PyQuery as pq

doc = pq(url='http://www.baidu.com')  # download the page and parse it
print(doc('input'))                   # every <input> element on the page
```
Initializing from a file:
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Initializing from a file
from pyquery import PyQuery as pq

doc = pq(filename='baidu.html')  # parse a local HTML file
print(doc('title'))
```
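Selection works the same way as in jQuery: ids, names and classes are selected exactly as they are there, and many other jQuery features carry over too.
Basic CSS selectors: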
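```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# CSS selectors
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
print(doc('.title'))  # every element whose class is "title"
```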
Finding elements:
Child elements:
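```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Child elements: find()
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('.title')  # every element whose class is "title"
print(type(items))
print(items)
b = items.find('b')    # <b> tags anywhere inside the matched elements
print(type(b))
print(b)
```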
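This code selects the elements whose class is title. There are two of them, a p tag and an a tag; we then call find() to get the b tag inside them.
Note that find() searches inside every matched element, through all of its descendants, not only its direct children.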
children:
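```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Child elements: children()
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('.title')
print(items.children())  # direct child elements of each matched element
```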
You can also pass a selector to children() to filter the result:
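```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Child elements: children() with a selector
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('.title')
print(items.children('b'))  # only the direct children that match 'b'
```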
The output is the same as above, because the only direct child element of the matched tags is a b tag.
Parent elements:
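```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Parent element
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('#link1')
print(items)
print(items.parent())  # the direct parent only
```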
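parent() returns only the direct parent. To get all ancestors (parent, grandparent and so on), use parents(); here we pass 'body' so that only the body ancestor is returned: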
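```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Ancestor elements
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('#link1')
print(items)
print(items.parents('body'))  # all ancestors, filtered to those matching 'body'
```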
Sibling elements:
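```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Sibling elements
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('#link1')
print(items)
print(items.siblings('#link2'))  # siblings of #link1 that match '#link2'
```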
That covers the methods for finding elements. Next, let's look at how to iterate over them.
Traversal:
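```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Traversal
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('a')
for k, v in enumerate(items.items()):  # items() yields one PyQuery object per element
    print(k, v)
```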
Extracting information:
Getting attributes:
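```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Getting attributes
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('a')
print(items)
print(items.attr('href'))  # href of the FIRST matched element only
print(items.attr.href)     # attribute-style shorthand for the same call
```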
Getting text:
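```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Getting text
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('a')
print(items)
print(items.text())        # text of all matched elements, joined with spaces
print(type(items.text()))  # a plain str
```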
Getting HTML:
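```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Getting HTML
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('a')
print(items.html())  # inner HTML of the FIRST matched element only
```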
DOM manipulation:
addClass, removeClass:
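```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# DOM manipulation: addClass, removeClass
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('#link2')
print(items)
items.addClass('addStyle')     # alias: add_class
print(items)
items.remove_class('sister')   # alias: removeClass
print(items)
```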
attr, css:
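```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# DOM manipulation: attr, css
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
items = doc('#link2')
items.attr('name', 'addname')  # set (or overwrite) the name attribute
print(items)
items.css('width', '100px')    # set an inline style
print(items)
```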
attr() can add a new attribute; if the element already has that attribute, its old value is overwritten.
remove:
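```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# DOM manipulation: remove
from pyquery import PyQuery as pq

# Reconstructed from the surviving text; the original markup was lost in extraction.
html = """<div class="wrap">Hello World<p>This is a paragraph.</p></div>"""

doc = pq(html)
wrap = doc('.wrap')
print(wrap.text())
wrap.find('p').remove()   # delete the <p> from the tree
print('After remove():')
print(wrap)
```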
There are many other DOM methods; if you want to learn more, read the official API documentation: https://pyquery.readthedocs.io/en/latest/api.html
Pseudo-class selectors:
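```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Pseudo-class selectors
from pyquery import PyQuery as pq

# Hypothetical reconstruction: the original HTML markup was lost in extraction.
html = """
<html><head><title>The Dormouse's story</title></head><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/title" class="title" id="link3">Title</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

doc = pq(html)
wrap = doc('a:first-child')      # <a> tags that are the first child of their parent
print(wrap)
wrap = doc('a:last-child')       # <a> tags that are the last child of their parent
print(wrap)
wrap = doc('a:nth-child(2)')     # <a> tags that are the second child of their parent
print(wrap)
wrap = doc('a:gt(2)')            # matched <a> tags with index > 2 (indices start at 0, not 1)
print(wrap)
wrap = doc('a:nth-child(2n)')    # <a> tags at an even child position (2, 4, ...)
print(wrap)
wrap = doc('a:contains(Lacie)')  # <a> tags whose text contains "Lacie"
print(wrap)
```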
I won't list every selector here. To learn more about CSS selectors, see the W3School CSS tutorial: http://www.w3school.com.cn/css/index.asp
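That roughly covers how to use PyQuery. For more detail, read the official documentation: https://pyquery.readthedocs.io/en/latest/
Source code for the examples above: https://gitee.com/dwyui/pyQuery.git
Thanks for reading. If anything here is wrong, corrections are very welcome.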