Python 1: Scraping jokes from Qiushibaike (糗事百科)

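I have only just started learning Python, so everything is still trial and error. I am writing down the problems I ran into so I don't make the same mistakes again. With luck, it might even help someone else ( ⊙o⊙ )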

To get started with scraping in Python, there are two main sets of documentation to read: requests and bs4. Links:

requests

bs4

Before writing my own first crawler, I read through many tutorials by experienced people online. This is the first example I copied for practice:

An expert's tutorial

My code mostly follows the first ** example in that tutorial, with almost no changes.

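from bs4 import BeautifulSoup

def get_data(html):
    final = []
    bs = BeautifulSoup(html, "html.parser")
    body = bs.body
    content_left = body.find(id='content-left')  # the container holding every story on the page
    contents = content_left.find_all('div', class_='article block untagged mb15')  # every story box
    for content in contents:  # iterate over each story
        temp = {}
        author = content.find('div', class_='author clearfix')  # the author block
        user_name = author.find('h2').string  # the user name
        data = content.find(class_='content')
        story = data.find('span').get_text()  # the joke text
        # Assumption (not shown in the original): vote counts sit in <i class="number"> tags
        numbers = content.find_all('i', class_='number')
        good = numbers[0].string + '好笑'  # number of upvotes; '好笑' ("funny") is the site's label
        temp['user_name'] = user_name
        temp['story'] = story
        temp['good'] = good
        final.append(temp)  # collect one dict per story
    return final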
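get_data only parses HTML. The page source still has to be downloaded first, and that is the job of requests. Below is a minimal sketch of that step, not the tutorial's own code. The list-page URL pattern, the browser-like User-Agent header, and the helper names page_url and fetch_html are my own assumptions, because the post does not say which page it crawled.

```python
import requests

# Assumption: list-page URL pattern; the post does not give the URL it actually crawled.
BASE_URL = "https://www.qiushibaike.com/text/page/{}/"
# Assumption: a browser-like User-Agent, since some sites reject the default python-requests one.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def page_url(page: int) -> str:
    """Build the URL of list page number `page` (hypothetical helper)."""
    return BASE_URL.format(page)


def fetch_html(page: int, timeout: float = 10.0) -> str:
    """Download one list page and return its HTML as text (hypothetical helper)."""
    resp = requests.get(page_url(page), headers=HEADERS, timeout=timeout)
    resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return resp.text
```

The downloaded text can then be passed straight to the parser, for example get_data(fetch_html(1)).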

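The get_data code is mainly an exercise in bs4. One thing to watch is the story field. Inspecting it with Chrome's F12 developer tools shows that this part sometimes contains <br> tags, because some users put line breaks in their posts. In that case .string returns None instead of the text, so get_text() is used to get the whole content.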
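The difference is easy to see on a small scale. The snippet below uses made-up HTML, not taken from the site, in which the joke text contains a <br>. Passing a separator to get_text() is an optional extra that keeps the line break readable.

```python
from bs4 import BeautifulSoup

# Made-up HTML imitating a story whose text contains a user's line break.
html = '<div class="content"><span>First line<br/>Second line</span></div>'
span = BeautifulSoup(html, "html.parser").find("span")

# .string only works when a tag has exactly one child; this span has three
# (text, <br/>, text), so it returns None.
print(span.string)  # None

# get_text() joins every text fragment inside the tag.
print(span.get_text())  # First lineSecond line

# A separator keeps the line break; strip=True trims whitespace around each fragment.
print(span.get_text("\n", strip=True))
```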
