Web Crawling: Getting Started with BeautifulSoup (Part 4)

5. The find method with more arguments

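The official documentation gives the signature of the find method as find(name, attrs, recursive, string, **kwargs). Its arguments are the same as those of find_all, except that find has no limit argument. As before, common usage is shown by example.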
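As a minimal sketch of how each argument in that signature can be used: the HTML below is a hypothetical sample, not taken from the article, and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, not from the original article.
html = """
<html><head><title>Sample page</title></head>
<body>
<p class="intro" id="first">Hello <b>world</b></p>
<p class="intro">Second paragraph</p>
<a href="/about">About</a>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# name: match by tag name; find returns the first <p> in document order.
first_p = soup.find("p")

# attrs: match against a dictionary of attribute values.
by_attrs = soup.find("p", attrs={"class": "intro", "id": "first"})

# **kwargs: keyword filters; class_ is used because `class` is a Python keyword.
by_kwargs = soup.find(class_="intro")

# string: match on the text inside the tag.
by_string = soup.find("a", string="About")

# recursive=False: search only the direct children of the starting tag.
direct_title = soup.html.find("title", recursive=False)  # <title> sits inside <head>
nested_title = soup.html.find("title")                   # default search is recursive

print(first_p, by_attrs, by_kwargs, by_string, direct_title, nested_title, sep="\n")
```

Here direct_title comes back as None because title is a grandchild of html, while the default recursive search finds it.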

The two methods are used in much the same way. Both calls below locate the same tag, but find is clearly the more convenient of the two.

soup.find_all('title', limit=1)
# [<title>The Dormouse's story</title>]
soup.find('title')
# <title>The Dormouse's story</title>

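Note the difference in return values: find_all returns a list, while find returns the matching tag itself. When nothing matches, find_all returns an empty list and find returns None.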
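Because find can return None, chaining a call directly onto its result can raise AttributeError. The sketch below shows one way to guard against that, using a hypothetical one-line document.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document for illustration.
soup = BeautifulSoup("<html><head><title>Sample page</title></head></html>", "html.parser")

# find_all always returns a list-like result, so iterating over a miss is safe.
missing_all = soup.find_all("h1")
print(len(missing_all))

# find returns None on a miss, so check before using the result.
missing_one = soup.find("h1")
heading = missing_one.get_text() if missing_one is not None else ""

title_tag = soup.find("title")
title = title_tag.get_text() if title_tag is not None else ""
print(repr(heading), repr(title))
```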

6. Output format and encoding

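- The prettify method returns a BeautifulSoup object as a formatted string, with each tag and string on its own indented line, which makes large documents much easier to read. It can also be called on a single tag node, as follows: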

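from bs4 import BeautifulSoup

markup = '<a href="">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'html.parser')  # name the parser explicitly to avoid a warning
print(soup.a.prettify())
# <a href="">
#  I linked to
#  <i>
#   example.com
#  </i>
# </a>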

If you only need the resulting string and do not care about formatting, use the built-in str() function instead:

str(soup.a)
# '<a href="">I linked to <i>example.com</i></a>'
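The two output styles can be compared side by side with a short sketch. It reuses the markup from this section; the encode call is an addition here, included only to show that a tag can also be serialized to bytes.

```python
from bs4 import BeautifulSoup

markup = '<a href="">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

compact = str(soup.a)          # single-line string
pretty = soup.a.prettify()     # one tag or string per line, indented by depth
raw = soup.a.encode("utf-8")   # bytes instead of str

print(compact)
print(pretty)
print(type(raw).__name__)
```

str() is usually the better choice when the markup will be processed further, since prettify inserts whitespace that was not in the source.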

7. get_text()

To get only the text contained in a tag, including the text of all its descendants, use the get_text() method:

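markup = '<a href="">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'html.parser')
soup.get_text()
# '\nI linked to example.com\n'
soup.i.get_text()
# 'example.com'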
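get_text() also accepts a separator and a strip flag, which help when the raw text carries stray newlines as above. A small sketch, assuming the same markup with line breaks inside the link:

```python
from bs4 import BeautifulSoup

markup = '<a href="">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, "html.parser")

# Join the individual text pieces with a separator.
joined = soup.get_text("|")                 # '\nI linked to |example.com|\n'

# strip=True trims whitespace from each piece before joining.
stripped = soup.get_text("|", strip=True)   # 'I linked to|example.com'

# stripped_strings yields the trimmed pieces one by one.
pieces = list(soup.stripped_strings)        # ['I linked to', 'example.com']

print(repr(joined), repr(stripped), pieces)
```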
