Web Scraping in Practice: Scraping the First 10 Pages of Qiushibaike (糗事百科)

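# -*- coding:utf-8 -*-
# Python 2 script (it uses urllib2 and the print statement)
import urllib2
import re
import lxml.html as html


def get_url(url):  # wrap a single URL request; returns 3 values
    # The header values are omitted in the source. Without request headers
    # the page data cannot be fetched, so fill them in here.
    header = {}
    try:
        req = urllib2.Request(url, headers=header)
        response = urllib2.urlopen(req)
        content = response.read().decode('utf-8')
        # print content
        tree = html.fromstring(content)  # parse the page source so that xpath can be used through tree
        aa = tree.xpath('//*[@id="content-left"]//div[@class="article block untagged mb15"]')
        wenben = tree.xpath('//text()')
        wenben = "".join(wenben).strip()  # all of the text on the current page
        meiyetiaoshu = len(aa)  # number of entries on the current page
        return tree, meiyetiaoshu, wenben
    except urllib2.URLError, e:
        if hasattr(e, 'code'):
            print e.code
        if hasattr(e, 'reason'):
            print e.reason
        return None  # explicit failure value so the caller can skip this page


# scrape every entry on one page
def page_one(tree, wenben, meiyetiaoshu):
    for i in range(1, meiyetiaoshu + 1):
        zuozhe = tree.xpath('//*[@id="content-left"]//div[{}]/div[1]/a[2]/@title'.format(i))
        if not zuozhe:  # handle anonymous users
            zuozhe = tree.xpath('//*[@id="content-left"]//div[{}]/div[1]/span[2]/h2/text()'.format(i))
        print i, zuozhe[0]  # the user name
        duanzi = tree.xpath('//*[@id="content-left"]//div[{}]/a/div/span/text()'.format(i))
        print duanzi[0].strip()  # the joke posted by the user
        sub_zuozhe = re.sub(u'(\*|\(|\)|\~|\^)', '', zuozhe[0])  # the name may contain special characters; strip them
        wenben = re.sub(u'(\*|\(|\)|\~|\^)', '', wenben)  # strip the same special characters from the page text
        haoxiao_group = re.search(u'%s[\d\D]+?(\d+ 好笑)' % sub_zuozhe, wenben)
        if haoxiao_group:
            print haoxiao_group.group(1)  # the "好笑" (funny) vote count, if found


# scrape the first 10 pages
for page in range(1, 11):
    # URL of pages 1 to 10. The URL template is elided in the source;
    # it must contain one %s placeholder for the page number.
    url = '' % str(page)
    result = get_url(url)
    if result is None:  # the request failed; skip this page
        continue
    tree, meiyetiaoshu, wenben = result
    page_one(tree=tree, wenben=wenben, meiyetiaoshu=meiyetiaoshu)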
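
The listing above is written for Python 2, where `urllib2` exists. Below is a minimal sketch of the same request step for Python 3, assuming only the standard `urllib.request` and `urllib.error` modules. The name `fetch_page` and its `headers` and `timeout` parameters are hypothetical and do not come from the original script. It returns `None` on failure, so the caller can skip a page instead of failing while unpacking the result.

```python
import urllib.error
import urllib.request


def fetch_page(url, headers=None, timeout=10):
    """Return the decoded body of url, or None if the request fails.

    fetch_page is a hypothetical helper, not part of the original script.
    """
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except urllib.error.URLError as e:
        # HTTPError is a subclass of URLError and also carries a status code
        if hasattr(e, 'code'):
            print(e.code)
        if hasattr(e, 'reason'):
            print(e.reason)
        return None
```

A caller would check the result before parsing it, for example `if content is None: continue` inside the page loop.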

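The vote-count lookup in `page_one` can also be tried in isolation. Below is a minimal Python 3 sketch, assuming the page text has the shape "author, joke text, N 好笑". The names `find_funny_count` and `SPECIAL_CHARS` and the sample text are made up for illustration. Unlike the listing, it passes the author name through `re.escape` so that any remaining regex metacharacters are matched literally.

```python
import re

# The same characters the script strips: * ( ) ~ ^
SPECIAL_CHARS = re.compile(r'[*()~^]')


def find_funny_count(author, page_text):
    """Return the number in front of the first '好笑' after author, or None."""
    clean_author = SPECIAL_CHARS.sub('', author)
    clean_text = SPECIAL_CHARS.sub('', page_text)
    # [\d\D]+? lazily skips any characters, including newlines,
    # up to the nearest "<digits> 好笑" that follows the author name
    pattern = re.escape(clean_author) + r'[\d\D]+?(\d+) 好笑'
    match = re.search(pattern, clean_text)
    return int(match.group(1)) if match else None


# Made-up sample text, not real page data
sample_text = '匿名(用戶)\n今天天氣真好\n123 好笑'
print(find_funny_count('匿名(用戶)', sample_text))  # prints 123
```

The lazy quantifier matters here: a greedy `[\d\D]+` would run on to the last "好笑" on the page and report another entry's count.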