Traversing HTML content with the bs4 library

Basic HTML structure (a tree of tags):

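Traversal directions: downward (from the root toward the leaves), upward (from the leaves toward the root), and parallel (among siblings).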
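To make the tree idea concrete, here is a minimal sketch that prints the tag hierarchy of a small page by walking downward recursively. It assumes bs4 is installed; the markup and the helper name tag_tree are hypothetical and are not the demo page used in the examples below.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in page, used only to show the tree shape
html = "<html><head><title>t</title></head><body><p><b>x</b></p><p><a>y</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

def tag_tree(tag, depth=0):
    """Return the subtree's tag names as indented lines, depth-first."""
    lines = ["  " * depth + tag.name]
    for child in tag.children:
        if isinstance(child, Tag):  # skip text nodes, keep only tags
            lines.extend(tag_tree(child, depth + 1))
    return lines

print("\n".join(tag_tree(soup.html)))
```

The html tag is the root, head and body are its children, and the innermost tags are the leaves.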

Downward traversal of the tag tree:

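Attribute      Description
.contents      List of child nodes: all direct children of the tag are stored in a list
.children      Iterator over the child nodes; similar to .contents, used to loop over the direct children
.descendants   Iterator over all descendant nodes (children, grandchildren, and so on), used for looping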
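A minimal sketch of how the three attributes differ, assuming bs4 is installed; the markup is hypothetical and chosen only so the counts are easy to check by hand.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: body has two <p> children, the first contains a <b>
html = "<body><p>Hi <b>there</b></p><p>Bye</p></body>"
soup = BeautifulSoup(html, "html.parser")

# .contents: a real list of the direct children (the two <p> tags)
contents = soup.body.contents
print(len(contents))  # 2

# .children: the same direct children, produced by an iterator
children = list(soup.body.children)
print(children == contents)  # True

# .descendants: every node in the subtree, depth-first, text nodes included
descendants = list(soup.body.descendants)
print(len(descendants))  # 6: <p>, 'Hi ', <b>, 'there', <p>, 'Bye'
tag_names = [d.name for d in descendants if isinstance(d, Tag)]
print(tag_names)  # ['p', 'b', 'p']
```

.contents and .children only go one level down, while .descendants reaches the <b> tag and every text node inside it.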

Downward traversal examples:

soup.head  # get the head node; returns <head><title>This is a python demo page</title></head>

soup.head.contents  # get the children of head; returns [<title>This is a python demo page</title>]

soup.body.contents  # the contents (list of children) of the body tag

len(soup.body.contents)  # number of children: there are 5, so this returns 5

soup.body.contents[1]  # use list indexing to get a specific child
# returns the first <p> tag, whose text is "The demo python introduces several python courses."
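Why does body report 5 children, and why is the first paragraph at index 1 rather than 0? The line breaks between tags are kept as text nodes. A minimal sketch, assuming bs4 is installed and using hypothetical markup laid out like a hand-written page:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup with line breaks between the tags
html = "<body>\n<p>one</p>\n<p>two</p>\n</body>"
soup = BeautifulSoup(html, "html.parser")

kids = soup.body.contents
print(len(kids))       # 5: '\n', <p>, '\n', <p>, '\n'
print(kids[1])         # <p>one</p>, so index 1 is the first real tag
tags_only = [k for k in kids if isinstance(k, Tag)]
print(len(tags_only))  # 2
```

Filtering with isinstance(..., Tag) is one simple way to ignore the whitespace text nodes when only tags matter.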

# Two other ways to traverse: loop over the children, or over all descendants
for child in soup.body.children:
    print(child)

for child in soup.body.descendants:
    print(child)

Upward traversal of the tag tree:

Two attributes:

.parent    the node's parent tag
.parents   iterator over the node's ancestor tags, used to loop over the ancestors

Upward traversal examples:

soup.title.parent  # the parent of title is head; returns the <head> tag

soup.html.parent  # the parent of html is the BeautifulSoup object itself (the whole document)
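A minimal sketch of the upward attributes, assuming bs4 is installed; the markup is hypothetical. It also shows where the chain ends: html's parent is the soup object, whose name is '[document]', and the soup itself has no parent.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with an <a> nested inside <p>
html = "<html><head><title>t</title></head><body><p><a>x</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.title.parent.name)    # head
print(soup.html.parent is soup)  # True: html's parent is the BeautifulSoup object
print(soup.parent)               # None: the top of the tree has no parent

# .parents walks upward from the nearest ancestor to the document
print([p.name for p in soup.a.parents])  # ['p', 'body', 'html', '[document]']
```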

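# Upward traversal of the tag tree: summary
from bs4 import BeautifulSoup

# demo is assumed to hold the HTML text of the demo page
soup = BeautifulSoup(demo, 'html.parser')
for parent in soup.a.parents:
    if parent is None:  # guard in case an ancestor is missing
        print(parent)
    else:
        print(parent.name)
# output:
# p
# body
# html
# [document]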

Parallel (sibling) traversal of the tag tree

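Attribute           Description
.next_sibling       Returns the next sibling node, in HTML text order
.previous_sibling   Returns the previous sibling node, in HTML text order
.next_siblings      Iterator over all following sibling nodes, in HTML text order
.previous_siblings  Iterator over all preceding sibling nodes, in reverse HTML text order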

All parallel traversal happens among nodes that share the same parent.
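A minimal sketch of the single-step sibling attributes, assuming bs4 is installed; the markup is hypothetical but mimics two links joined by " and ". Note that a sibling can be a text node, and that siblings never cross the parent boundary.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links separated by the text " and "
html = '<p>Intro: <a id="x">First</a> and <a id="y">Second</a>.</p>'
soup = BeautifulSoup(html, "html.parser")
a = soup.a  # the first <a> tag

print(a.next_sibling)                       # the text node " and "
print(a.next_sibling.next_sibling["id"])    # y: the second <a> tag
print(a.previous_sibling)                   # the text node "Intro: "
print(a.previous_sibling.previous_sibling)  # None: nothing before the first child

# <p> is the only top-level node here, so it has no siblings
print(soup.p.next_sibling)                  # None
```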

soup.a.next_sibling  # returns ' and ' (a text node, not a tag)

soup.a.next_sibling.next_sibling  # returns the second <a> tag, Advanced Python

soup.a.previous_sibling

soup.a.previous_sibling.previous_sibling

Parallel traversal of the tag tree with loops

# Loop over the following siblings
for sibling in soup.a.next_siblings:
    print(sibling)

# Loop over the preceding siblings
for sibling in soup.a.previous_siblings:
    print(sibling)
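Because the sibling iterators yield text nodes as well as tags, a loop often needs a filter. A minimal sketch, assuming bs4 is installed and reusing hypothetical markup with two links:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: two links separated by the text " and "
html = '<p>Intro: <a id="x">First</a> and <a id="y">Second</a>.</p>'
soup = BeautifulSoup(html, "html.parser")

# next_siblings yields text nodes and tags; keep only the tags
following_tags = [s["id"] for s in soup.a.next_siblings if isinstance(s, Tag)]
print(following_tags)  # ['y']

# previous_siblings walks backward, starting with the nearest sibling
last = soup.find_all("a")[-1]
before = [str(s) for s in last.previous_siblings]
print(before)  # [' and ', '<a id="x">First</a>', 'Intro: ']
```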

HTML formatting and encoding with bs4

Formatting:

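When we call soup.prettify(), it returns the HTML as a string with line breaks and indentation added, so the document prints in a readable, properly nested layout.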

We can also call prettify() on a single tag, for example soup.a.prettify().
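A minimal sketch of prettify() on the whole document and on a single tag, assuming bs4 is installed; the markup is hypothetical. Each tag and each piece of text lands on its own line, indented by nesting depth.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line markup
html = "<p><a href='#'>link</a></p>"
soup = BeautifulSoup(html, "html.parser")

pretty = soup.prettify()      # the whole document as an indented string
print(pretty)

a_pretty = soup.a.prettify()  # only the <a> tag and its contents
print(a_pretty)
```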

Encoding: bs4 uses UTF-8, and in Python 3 all strings are Unicode, so no manual conversion is needed.
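A minimal sketch of the encoding point, assuming bs4 is installed: UTF-8 bytes containing Chinese text (hypothetical input) are decoded into ordinary Python 3 strings, and a tag can be encoded back to the same bytes. Passing from_encoding explicitly is a choice made here to avoid relying on automatic detection.

```python
from bs4 import BeautifulSoup

# Hypothetical UTF-8 bytes with non-ASCII text
data = "<p>中文</p>".encode("utf-8")

# from_encoding tells bs4 how to decode the bytes
soup = BeautifulSoup(data, "html.parser", from_encoding="utf-8")
text = soup.p.string
print(text)                   # 中文
print(isinstance(text, str))  # True: text in the tree is a normal Python 3 str

# Encoding a tag back to UTF-8 reproduces the original bytes
print(soup.p.encode("utf-8") == data)  # True
```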
