Scraping Specific Headings with Beautiful Soup

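# coding: utf-8
import requests
from bs4 import BeautifulSoup

base_lib = 'html5lib'
ua = ''                       # User-Agent string (left blank in the original; fill in before running)
headers = {'User-Agent': ua}  # assumption: the original value is missing; this is the usual way to pass a User-Agent
url = ''                      # target URL (left blank in the original; fill in before running)

# Send a request to the given URL and get the response object resp
resp = requests.get(url, headers=headers)
# resp.text gives the response text, but the character encoding has to be converted:
# the UTF-8 bytes were decoded as ISO-8859-1, so re-encode and decode them as UTF-8
text = resp.text.encode('iso-8859-1').decode('utf-8')
# Beautiful Soup is the HTML parsing library; base_lib names the underlying parser.
# If no parser is named, Beautiful Soup picks the best one installed
# (lxml first, then html5lib, then Python's built-in html.parser).
bs = BeautifulSoup(text, base_lib)
# Get the elements that match the CSS selector; the result is a list
divs = bs.select('div.col.middle-column-home > div')
for div in divs[:10]:  # take only the first 10 categories (desktop); the rest are for mobile
    h4s = div.select('h4')  # find the h4 headings inside each category
    for h4 in h4s:
        print(h4.text)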
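The encode/decode round trip in the script can be checked without any network access. The sketch below is only an illustration: the helper name `fix_mojibake` and the sample string are made up here, and it assumes the garbling came from UTF-8 bytes being decoded as ISO-8859-1, which is what Requests falls back to for `text/*` responses that declare no charset.

```python
# -*- coding: utf-8 -*-

def fix_mojibake(text):
    """Undo a wrong ISO-8859-1 decode of UTF-8 bytes (hypothetical helper)."""
    # Every ISO-8859-1 character maps back to exactly one byte,
    # so the original UTF-8 byte sequence is recovered losslessly.
    return text.encode('iso-8859-1').decode('utf-8')

original = '標題'                       # sample text, chosen for illustration
raw = original.encode('utf-8')          # bytes as a server would send them
garbled = raw.decode('iso-8859-1')      # what resp.text looks like after the wrong guess
fixed = fix_mojibake(garbled)

print(fixed == original)
```

With a real response object, setting `resp.encoding = 'utf-8'` before reading `resp.text` should give the same result without the round trip, provided the page really is UTF-8.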

Screenshot: headings scraped from 菜鳥教程 (Runoob).
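The selection logic can also be tried offline. The sketch below runs the same selector and the same "first 10 categories" slice against a small HTML fragment that is made up for illustration; the real page markup may differ. The function name `extract_titles`, its parameters, and the sample headings are assumptions, and it uses Python's built-in `html.parser` so that html5lib does not need to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the layout the selector expects.
SAMPLE_HTML = """
<div class="col middle-column-home">
  <div class="codelist">
    <h2>HTML / CSS</h2>
    <h4>Learn HTML</h4>
    <h4>Learn CSS</h4>
  </div>
  <div class="codelist">
    <h2>Python</h2>
    <h4>Learn Python</h4>
  </div>
</div>
<div class="col other">
  <div><h4>Not selected</h4></div>
</div>
"""

def extract_titles(html, limit=10, parser='html.parser'):
    """Return the h4 texts from the first `limit` category blocks."""
    soup = BeautifulSoup(html, parser)
    titles = []
    # Direct child divs of the element carrying both classes
    for div in soup.select('div.col.middle-column-home > div')[:limit]:
        for h4 in div.select('h4'):
            titles.append(h4.get_text(strip=True))
    return titles

titles = extract_titles(SAMPLE_HTML)
for title in titles:
    print(title)
```

Collecting the headings into a list instead of printing them directly makes the result easy to inspect or save; passing `limit=1` keeps only the first category block.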

