Python3 Crawler: Scraping Baidu Tieba

Preface

天象獨行

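import os
import urllib.parse
import urllib.request

'''Requirements:
1. Take a forum (ba) name, a first page, and a last page as input, then crawl those pages.
2. Create a folder named after the forum; it holds the HTML of each page, with file names of the form: baname_page.html
'''
# The base URL was lost from the original listing. It must end with "?"
# so that the encoded query string can be appended to it.
url = ""
ba_name = input("Enter the forum name: ")
home_page = int(input("Enter the first page: "))
end_page = int(input("Enter the last page: "))
# Build the output path: a folder named after the forum (requirement 2)
path = os.path.join("c:\\users\\aaron\\documents\\python3-test", ba_name)
# exist_ok=True avoids an error when the folder already exists
os.makedirs(path, exist_ok=True)
'''pn = 0    page 1
pn = 50   page 2
pn = 100  page 3
...
pn = (n-1)*50  page n
'''
for page in range(home_page, end_page + 1):
    # Build the request-parameter dictionary. The original dictionary was lost;
    # only the pn offset is documented above, and the key that carries ba_name is not shown.
    data = {"pn": (page - 1) * 50}
    # Build the request headers (the original sets none)
    # Encode the request parameters
    url_get = urllib.parse.urlencode(data)
    # Build the request URL
    url_get = url + url_get
    # Send the request
    request = urllib.request.urlopen(url_get)
    # Build the file name
    filename = ba_name + '_' + str(page) + '.html'
    # Join the file path
    filepath = os.path.join(path, filename)
    print(filepath)
    # Write the content
    with open(filepath, 'wb') as fp:
        fp.write(request.read())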
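
The steps in the script (compute the pn offset, encode the query string, fetch the page, save it under a folder named after the forum) can be sketched as small functions. This is only a minimal sketch under stated assumptions: the base URL `https://tieba.baidu.com/f?`, the `kw` query key, and the `User-Agent` header do not appear in the listing above, and the names `build_page_url`, `fetch`, and `crawl` are hypothetical helpers introduced for illustration.

```python
import os
import urllib.parse
import urllib.request

# Assumption: the list-page endpoint and its "kw" key are not in the source.
BASE_URL = "https://tieba.baidu.com/f?"
PAGE_SIZE = 50  # from the source: pn = (n - 1) * 50


def build_page_url(ba_name, page):
    """Return the list-page URL for a 1-based page number."""
    query = urllib.parse.urlencode({"kw": ba_name, "pn": (page - 1) * PAGE_SIZE})
    return BASE_URL + query


def fetch(url):
    """Download a URL and return the raw response bytes."""
    # Assumption: a browser-like User-Agent; the source sets no headers.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def crawl(ba_name, first_page, last_page, base_dir=".", fetch_page=fetch):
    """Save each page as <base_dir>/<ba_name>/<ba_name>_<page>.html."""
    folder = os.path.join(base_dir, ba_name)
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    saved = []
    for page in range(first_page, last_page + 1):
        filepath = os.path.join(folder, f"{ba_name}_{page}.html")
        with open(filepath, "wb") as fp:
            fp.write(fetch_page(build_page_url(ba_name, page)))
        saved.append(filepath)
    return saved
```

Calling `crawl("python", 1, 3)` would write three HTML files into a `python` folder in the current directory. Passing the download step in as `fetch_page` is a design choice that lets the URL and file-name logic be checked without touching the network.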

