Basic usage of BeautifulSoup4

Preface: BeautifulSoup is a very handy third-party Python library for parsing HTML!

pip install beautifulsoup4

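from bs4 import BeautifulSoup

# The sample HTML from the original post was not preserved; put the markup to parse here.
html_str = """
"""
soup = BeautifulSoup(html_str, 'html.parser')
# soup is the parsed document object; .text is its text with the tags stripped
# print(soup)
# print(soup.text)
# Type of the object
# print(type(soup))
# Find the first a tag; .text prints the contents of that a tag
# print(soup.find('a'))
# print(soup.find('a').text)
# Find the a tag with class=baidu (the keyword is class_, because class is reserved in Python)
# print(soup.find('a', class_='baidu'))
# Find the element with id=lagou
# print(soup.find(id='lagou'))
# Find by title='mmm'; a tag name can be given first to restrict the search to that tag
# print(soup.find(title='mmm'))
# find_all finds every match and returns a list (a ResultSet, which is a list subclass)
# print(soup.find_all('a'))
# print(soup.find_all('a')[0])  # the first one
all_a = soup.find_all('a')
for item in all_a:
    if item:
        # print(item.attrs)  # attrs is a dict
        print(item.attrs['href'])  # raises KeyError if the tag has no href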
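
Since the sample HTML did not survive in the listing above, here is a small self-contained sketch of the same calls. The markup in html_str is made up for illustration (the class, id and title values are borrowed from the comments above, the URLs are assumptions), so treat it as one possible way to try the lookups rather than the original example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the post's own HTML was not preserved.
html_str = """
<html>
  <body>
    <a class="baidu" href="https://www.baidu.com" title="mmm">Baidu</a>
    <a id="lagou" href="https://www.lagou.com">Lagou</a>
    <a>no link here</a>
  </body>
</html>
"""

soup = BeautifulSoup(html_str, "html.parser")

# find returns the first match, or None when nothing matches
first_a = soup.find("a")
baidu = soup.find("a", class_="baidu")   # class_ because class is a keyword
lagou = soup.find(id="lagou")            # any tag with this id
titled = soup.find("a", title="mmm")     # tag name plus attribute filter

# find_all returns every match; .get("href") gives None instead of
# raising KeyError when a tag has no href attribute
hrefs = [a.get("href") for a in soup.find_all("a") if a.get("href")]

print(first_a.text)   # Baidu
print(lagou.attrs)    # {'id': 'lagou', 'href': 'https://www.lagou.com'}
print(hrefs)          # ['https://www.baidu.com', 'https://www.lagou.com']
```

Using item.get('href') instead of item.attrs['href'] is a small design choice: the third a tag here has no href, and indexing attrs directly would stop the loop with a KeyError.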
