Python WeChat crawler

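import re
import time
import urllib.error
import urllib.parse
import urllib.request


# Helper: fetch a URL through a proxy server and return the response body (bytes)
def use_proxy(proxy_addr, url):
    # Exception handling
    try:
        req = urllib.request.Request(url)  # Request object that imitates a browser
        req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/63.0')
        # Proxy server
        proxy = urllib.request.ProxyHandler({'http': proxy_addr, 'https': proxy_addr})
        # The lines between the proxy handler and the URLError handler are missing
        # from the source; the three below are the usual urllib pattern, reconstructed.
        opener = urllib.request.build_opener(proxy)
        data = opener.open(req).read()
        return data
    except urllib.error.URLError as e:
        if hasattr(e, 'code'):  # check for an HTTP status code
            print(e.code)
        if hasattr(e, 'reason'):  # check for a reason attribute
            print(e.reason)
        # On URLError, wait 10 seconds before continuing
        time.sleep(10)
    except Exception as e:
        print('exception:' + str(e))
        # On any other exception, wait 1 second before continuing
        time.sleep(1)
    # Falls through to an implicit None when the request failed


# Percent-encode the keyword once, before the loop, so it is never encoded twice
key = urllib.parse.quote('python')
# Proxy server address; this one may no longer work, so replace it with a live proxy
proxy = '127.0.0.1:8888'
# Number of result pages to crawl
for i in range(0, 10):
    # The base search URL is missing from the source; fill it in before running
    thispageurl = '' + key + '&type=2&page=' + str(i)
    thispagedata = use_proxy(proxy, thispageurl)
    # The regex is missing from the source; supply a pattern with one capture
    # group that matches the article link in the result page
    pat1 = ''
    rs1 = re.compile(pat1, re.S).findall(str(thispagedata))
    if len(rs1) == 0:
        print('Page ' + str(i) + ' failed')
        continue
    for j in range(0, len(rs1)):
        # Links in the HTML contain '&amp;'; strip 'amp;' to get a plain '&'
        thisurl = rs1[j].replace('amp;', '')
        file = 'e:/image/page_' + str(i) + '_article_' + str(j) + '.html'
        thisdata = use_proxy(proxy, thisurl)
        if thisdata is None:  # the request failed and was already reported
            continue
        print(len(thisdata))
        try:
            with open(file, 'wb') as fh:
                fh.write(thisdata)
            print('Page ' + str(i) + ', article ' + str(j) + ' saved')
        except Exception as e:
            print(e)
            print('Page ' + str(i) + ', article ' + str(j) + ' failed')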
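
The link-extraction step can be tried offline. Because the regex used in the post did not survive, the sketch below uses a hypothetical pattern and a made-up HTML fragment (`LINK_PATTERN`, `extract_article_links`, and the `example.com` URLs are all assumptions for illustration, not the real search-result markup). It also swaps the `replace('amp;', '')` trick for the standard library's `html.unescape`, which decodes `&amp;` and other entities as well.

```python
import html
import re

# Hypothetical pattern: assumes every article link is an <a> tag whose href
# contains "/s?". The pattern from the original post is not known.
LINK_PATTERN = re.compile(r'<a[^>]+href="([^"]*/s\?[^"]*)"', re.S)


def extract_article_links(page_html):
    """Return the article URLs found in page_html with HTML entities decoded."""
    return [html.unescape(link) for link in LINK_PATTERN.findall(page_html)]


# Made-up fragment standing in for one search-result page
sample = (
    '<div><a target="_blank" href="http://example.com/s?src=3&amp;ver=1">A</a>'
    '<a href="http://example.com/about">About</a></div>'
)
print(extract_article_links(sample))
```

Only the first link is returned, since the second href has no `/s?` part. For real pages an HTML parser is usually sturdier than a regex, but the regex keeps the sketch close to the approach in the script above.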
