Python Crawler: Qiushibaike (糗事百科) Jokes

Basic crawler reference:

I recommend reading this one; it is clearly written and easy to follow.

Documentation:

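Running the script directly scrapes the joke text from pages 1 through 20 of Qiushibaike.

For example:

Qiushibaike has a great many jokes; I do no further processing of the scraped text here.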

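# -*- coding:utf-8 -*-
# Python 2 script (urllib2 and the print statement do not exist in Python 3).
from scrapy import Selector
import urllib2
import sys
from time import sleep

# Python 2 workaround so unicode text can be written to the file without explicit encoding.
reload(sys)
sys.setdefaultencoding('utf-8')

for page in range(1, 21):  # pages 1 through 20
    url = '' + str(page)  # base URL of the listing pages goes here; the page number is appended
    headers = {}  # request headers (for example a User-Agent) go here
    try:
        request = urllib2.Request(url, headers=headers)
        response = urllib2.urlopen(request)
    except urllib2.URLError, e:
        if hasattr(e, 'code'):
            print e.code
        if hasattr(e, 'reason'):
            print e.reason
        continue  # no response to parse, so skip this page
    # Selector usage reference:
    sel = Selector(text=response.read(), type="html")
    with open(r'c:\users\wang zuo\desktop\test.txt', 'a') as f:
        # Use XPath to select the text directly inside each div with class "content":
        for text in sel.xpath('//div[@class = "content"]/text()').extract():
            f.write(text)
    sleep(0.5)  # pause between pages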
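
The script above depends on Python 2 and scrapy. For readers on Python 3, here is a rough sketch of the same idea using only the standard library. It is not the author's method: it swaps scrapy's Selector for `html.parser`, and it strips whitespace from each extracted string, which the original script does not do. The names `ContentExtractor`, `fetch`, `extract_jokes`, and `crawl` are hypothetical, and decoding the pages as UTF-8 is an assumption.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from time import sleep

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ContentExtractor(HTMLParser):
    """Collects text that sits directly inside <div class="content"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside a content div; 0 means outside
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif tag == "div" and dict(attrs).get("class") == "content":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only direct text children, and drop whitespace-only strings.
        if self.depth == 1 and data.strip():
            self.texts.append(data.strip())


def extract_jokes(html):
    """Return the cleaned text found in every content div of one page."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.texts


def fetch(url, headers=None):
    """Return the decoded page body, or None if the request fails."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            # Assumption: the site serves UTF-8.
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.URLError as e:
        print(getattr(e, "code", ""), getattr(e, "reason", ""))
        return None


def crawl(base_url, out_path, first=1, last=20, delay=0.5):
    """Append the jokes from pages first..last (inclusive) to out_path."""
    with open(out_path, "a", encoding="utf-8") as f:
        for page in range(first, last + 1):
            html = fetch(base_url + str(page))
            if html is not None:
                for joke in extract_jokes(html):
                    f.write(joke + "\n")
            sleep(delay)
```

Calling `crawl(base_url, "jokes.txt")` with the listing URL prefix as `base_url` would walk pages 1 through 20. Because `extract_jokes` takes a plain string, the parsing step can be tried on a saved page without touching the network.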
