Basic Usage of the BeautifulSoup Module

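You can call dir(BeautifulSoup.BeautifulSoup) to see which functions the class provides. To find out what a particular function does, call help on it, for example help(BeautifulSoup.BeautifulSoup.find), which shows its official documentation.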

You can use pprint to tidy up the output. Be sure to import BeautifulSoup before calling dir or help.
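As a minimal sketch of that inspection workflow, the snippet below lists the public names on the parser class and reads the docstring that help would display. It assumes Python 3 and the newer bs4 package (Beautiful Soup 4) rather than the BeautifulSoup 3 module imported in this post; the helper public_names is a hypothetical name introduced only for illustration.

```python
import inspect
from pprint import pprint

# Assumption: Beautiful Soup 4 (package "bs4"), not the BeautifulSoup 3 module.
from bs4 import BeautifulSoup


def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


names = public_names(BeautifulSoup)
pprint(names[:10])  # pretty-print the first few names

# inspect.getdoc returns the docstring text that help(BeautifulSoup.find) shows.
find_doc = inspect.getdoc(BeautifulSoup.find)
if find_doc:
    print(find_doc.splitlines()[0])
```

In an interactive session, calling help(BeautifulSoup.find) directly is usually more convenient; the docstring lookup here just keeps the output short.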

# -*- coding:utf8 -*-
import re
import urllib

import BeautifulSoup

# Download at most 200000 bytes of the page.
htmlsource = urllib.urlopen("").read(200000)
soup = BeautifulSoup.BeautifulSoup(htmlsource)
# Print the <head> element.
print soup.head
# Print the <title> element.
print soup.head.title
# findAll returns a list; each item in the list is an <a> tag.
tags = soup.findAll('a')
print tags
print '京東放養的爬蟲'
# For every <a> tag that has an href attribute, print the href.
for item in soup.fetch('a', href=True):
    print item['href']
# Find all <a> tags and print those whose href contains "taobao".
for a in soup.findAll('a', href=True):
    if re.search('taobao', a['href']):
        print "found the url:", a['href']
# Print only the first <div> whose class attribute equals "j_tanx mod".
print str(soup.find("div", {"class": "j_tanx mod"}))
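
The script above targets Python 2 and the BeautifulSoup 3 module. As a rough sketch of the same steps on Python 3 with bs4 (an assumption, not what this post uses), the example below parses an inline page instead of downloading one. SAMPLE_HTML and its URLs are made up for illustration, and find_all is the bs4 spelling of findAll.

```python
import re

# Assumption: Beautiful Soup 4 (package "bs4") on Python 3.
from bs4 import BeautifulSoup

# Hypothetical sample page so the example runs without network access.
SAMPLE_HTML = """
<html><head><title>Sample page</title></head>
<body>
<a href="http://www.taobao.com/item/1">item</a>
<a href="http://example.com/about">about</a>
<a name="top">anchor without href</a>
<div class="j_tanx mod">first ad</div>
<div class="j_tanx mod">second ad</div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Text of the <title> element inside <head>.
title_text = soup.head.title.string

# Every <a> tag, with or without an href attribute.
all_links = soup.find_all("a")

# href=True keeps only the <a> tags that actually have an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# Keep the hrefs that contain "taobao".
taobao_hrefs = [h for h in hrefs if re.search("taobao", h)]

# find returns only the first match; class_ matches the exact class string.
first_div = soup.find("div", class_="j_tanx mod")

print(title_text)
print(hrefs)
print(taobao_hrefs)
print(first_div)
```

Parsing an inline string keeps the example repeatable; to work on a live page, the HTML would come from an HTTP download instead, as in the original script.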
