Scraping multiple pages with a Python crawler

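A few days ago the 零組 database announced it was shutting down. My first reaction was that it was a pity, and I wanted to save the data right away, only to find that I had forgotten most of what I knew about crawlers. Time for a quick refresher.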

Enough talk. PyCharm, launch!

I wasn't sure what to scrape, so let's just pick a web page at random.

URL:

First, fetch the HTML of the target page.

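Press F12 to inspect the request headers; here we only need the User-Agent (UA). Set the encoding according to the page's meta tag.

The code is as follows: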

import requests
from lxml import etree


def get_image():
    base_url = ""  # target page URL
    headers = {"User-Agent": ""}  # paste the User-Agent copied from the F12 panel
    # Fetch the response data
    response = requests.get(base_url, headers=headers)
    response_data = response.content.decode('gbk')
    # response_code = response.status_code
    # print(response_code)
    # Save the data
    with open('wall.html', 'w', encoding='gbk') as f:
        f.write(response_data)


get_image()
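The 'gbk' hardcoded above comes from reading the page's meta tag by hand. As a rough sketch of doing the same lookup in code, the snippet below searches the raw bytes for a charset declaration. The names META_CHARSET and detect_charset, the "utf-8" fallback, and the sample page are all assumptions made for illustration; none of them appear in the original post.

```python
import re

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def detect_charset(raw_html: bytes, default: str = "utf-8") -> str:
    """Return the charset declared in a <meta> tag, or `default` if none is found."""
    # The declaration normally sits in <head>, so only the first bytes are scanned.
    match = META_CHARSET.search(raw_html[:4096])
    return match.group(1).decode("ascii").lower() if match else default


# Made-up sample page, encoded the way a GBK site would serve it.
sample = '<html><head><meta charset="gbk"><title>壁纸</title></head></html>'.encode("gbk")
charset = detect_charset(sample)
print(charset)
print(sample.decode(charset))
```

With requests, the result could be passed to response.content.decode(...) in place of the literal 'gbk'.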

Open the saved wall.html locally to verify:

Everything looks fine.

Without further ado, here is the complete code:

import requests
from lxml import etree


def get_image(url):
    try:
        headers = {"User-Agent": ""}  # paste the User-Agent copied from the F12 panel
        # Directory where the images are saved
        path = "c://users/administrator/desktop/image/"
        # Fetch the response data
        response = requests.get(url, headers=headers)
        response_data = response.content.decode('gbk')
        # Check whether a response came back
        # response_code = response.status_code
        # print(response_code)
        # Save the data
        # with open('wall.html', 'w', encoding='gbk') as f:
        #     f.write(response_data)
        # Parse the data
        # 1. Parse the text into an HTML tree
        parse_data = etree.HTML(response_data)
        # 2. Collect the image addresses into item_list
        item_list = parse_data.xpath('//div/ul/li/a/img/@src')
        # Loop over the list and save each image
        for item in item_list:
            final_data = requests.get(item, headers=headers).content
            with open(path + item[-7:], 'wb') as f:
                f.write(final_data)
            # print(item)
    except Exception as e:
        print('error', e)


def get_page():
    # Take the first 10 pages; the string is the page URL template
    # with a {} placeholder for the page number
    urls = ["".format(str(i)) for i in range(1, 11)]
    # Print to verify
    # print(urls)
    return urls


if __name__ == '__main__':
    # Main entry point
    for url in get_page():
        get_image(url)
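Two details in the code above are easy to get wrong: the page URL template and the item[-7:] file name, which only works when every image name is exactly seven characters long. The sketch below shows one possible alternative using only the standard library. PAGE_TEMPLATE is a hypothetical pattern (the real one is not given in the post), and build_page_urls and resolve_image are illustrative names, not part of the original code.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# Hypothetical template: the real URL pattern is not given in the post.
PAGE_TEMPLATE = "https://example.com/wallpaper/index_{}.html"


def build_page_urls(first: int = 1, last: int = 10) -> list[str]:
    """Return the URLs for pages first..last, inclusive."""
    return [PAGE_TEMPLATE.format(page) for page in range(first, last + 1)]


def resolve_image(page_url: str, src: str) -> tuple[str, str]:
    """Return (absolute image URL, file name) for an <img src> found on page_url."""
    # urljoin turns relative src values into absolute URLs.
    absolute = urljoin(page_url, src)
    # Take the last path segment as the file name, ignoring any query string.
    filename = posixpath.basename(urlsplit(absolute).path)
    return absolute, filename


pages = build_page_urls()
print(len(pages), pages[0], pages[-1])
print(resolve_image(pages[0], "/uploads/small/abc123.jpg?v=2"))
```

Resolving src against the page URL also covers sites that serve relative image paths, which requests.get cannot fetch directly.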

Result:

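To summarize, the workflow has a few steps:

1. Get the target URL and fill in the request headers.

2. Fetch the data with urllib or requests.

3. Parse the data with regular expressions, BeautifulSoup, or XPath.

4. Save the data.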
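The four steps can be sketched with the standard library alone, which makes it possible to try the flow without installing anything. This is a swapped-in variant and not the code from this post: it uses urllib and html.parser instead of requests and lxml, and the names ImgSrcParser, fetch, extract_images, save, crawl and fake_fetch, the "Mozilla/5.0" User-Agent, and the sample HTML are all assumptions for illustration.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import Request, urlopen


class ImgSrcParser(HTMLParser):
    """Step 3: collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def fetch(url, user_agent="Mozilla/5.0"):
    """Steps 1-2: send the request with a User-Agent header and return the raw bytes."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read()


def extract_images(raw_html, encoding="gbk"):
    """Decode the page and return the list of image src values."""
    parser = ImgSrcParser()
    parser.feed(raw_html.decode(encoding, errors="replace"))
    return parser.sources


def save(data, directory, filename):
    """Step 4: write the bytes to directory/filename and return the path."""
    target = Path(directory) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


def crawl(urls, fetch_page=fetch):
    """Run steps 1-3 for each page and return {page_url: [image src, ...]}."""
    return {url: extract_images(fetch_page(url)) for url in urls}


# Offline demonstration: a stand-in fetcher replaces the real network request.
def fake_fetch(url):
    return '<ul><li><a href="#"><img src="/img/a.jpg"></a></li></ul>'.encode("gbk")


print(crawl(["page-1", "page-2"], fetch_page=fake_fetch))
with tempfile.TemporaryDirectory() as tmp:
    saved = save(b"fake image bytes", tmp, "a.jpg")
    print(saved.name, saved.stat().st_size)
```

Passing the fetcher in as a parameter keeps the parsing and saving logic testable without hitting a live site.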
