Python Web Scraping in Practice: An Introduction to the Built-in urllib Module


With that out of the way, let's get to the main topic.

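urllib.request is one of Python's built-in modules, so there is nothing extra to install before using it. The module contains many classes and functions; I will go through the main ones below: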

import urllib.request

url = ''  # target URL (left blank in the original)
# Send the request and get the response object
response = urllib.request.urlopen(url)

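urllib.request.Request(url, headers=<dict>): also used when sending a request to get a response object, but it differs from urllib.request.urlopen(). If the request needs no extra parameters, calling urllib.request.urlopen() directly is enough; otherwise build the request with urllib.request.Request(). Request() only describes the request, though, so to get a response object for further processing it still has to be passed to urllib.request.urlopen().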

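import urllib.request

url = ''      # target URL (left blank in the original)
headers = {}  # request headers as a dict (contents omitted in the original)
# Build the request object
request = urllib.request.Request(url, headers=headers)
# Pass the request to urlopen() to send it and get the response object
response = urllib.request.urlopen(request)
# Read the response body; read() consumes the stream, so call it only once
html = response.read()        # page source as raw bytes
html2 = html.decode('utf-8')  # the same content decoded as a UTF-8 string
print(html)
print(html2)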
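The listing above does not show what goes into headers. As a minimal sketch, assuming a made-up User-Agent string and the placeholder address https://example.com/ (neither comes from the original), the request can be built and inspected without sending anything:

```python
import urllib.request

# Hypothetical values for illustration only; not from the original article.
url = "https://example.com/"
headers = {"User-Agent": "Mozilla/5.0 (compatible; demo-spider)"}

# Request() only builds the request object; nothing is sent until urlopen() is called.
request = urllib.request.Request(url, headers=headers)

# urllib stores header names in capitalized form, so the key becomes "User-agent".
user_agent = request.get_header("User-agent")
method = request.get_method()  # "GET", because no data was attached

print(request.full_url)
print(user_agent)
print(method)
```

Inspecting the object this way is a cheap check that the headers are attached before any traffic reaches the server.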

Working with the response object: its methods are only available once the request has gone through urllib.request.urlopen().
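A few of the methods the object returned by urlopen() provides are read(), geturl() and info(). The sketch below is only an illustration: it assumes a data: URL so that it runs without network access, whereas a real crawler would pass an http(s) address. For HTTP responses, getcode() additionally returns the status code, such as 200.

```python
import urllib.request

# A data: URL keeps the example offline; this is an assumption for illustration,
# a real crawler would pass an http(s) address instead.
url = "data:text/plain;charset=utf-8,hello%20urllib"

with urllib.request.urlopen(url) as response:
    final_url = response.geturl()                        # URL actually retrieved
    content_type = response.info().get("Content-Type")  # response headers
    body = response.read()                               # raw bytes
    leftover = response.read()                           # stream is exhausted: b""

text = body.decode("utf-8")
print(final_url)
print(content_type)
print(text)
```

The second read() returns an empty bytes object, which is why the body should be read once and then decoded from the stored bytes.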

The urllib.parse module is mainly used for encoding. It usually comes into play when assembling the URL to be requested.

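quote(): also performs encoding. Unlike urlencode(), it does not need the search term stored in a dict; it encodes only the string it is given, so the URL is assembled a little differently: the key and the = sign have to be written by hand.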
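The difference between the two is easiest to see side by side. This is a minimal sketch; the base address https://example.com/s? and the parameter name wd are assumptions made for illustration, not values from the original.

```python
from urllib.parse import quote, unquote, urlencode

# Hypothetical base address and parameter name ("wd"), for illustration only.
base_url = "https://example.com/s?"
keyword = "python urllib"

# urlencode() takes a dict and returns the whole "key=value" pair, already encoded.
url_from_urlencode = base_url + urlencode({"wd": keyword})

# quote() encodes only the value, so the "wd=" part is written by hand.
url_from_quote = base_url + "wd=" + quote(keyword)

print(url_from_urlencode)  # https://example.com/s?wd=python+urllib
print(url_from_quote)      # https://example.com/s?wd=python%20urllib

# Non-ASCII text is percent-encoded as UTF-8 bytes and can be reversed with unquote().
encoded = quote("爬蟲")
decoded = unquote(encoded)
```

Note that urlencode() turns a space into + while quote() turns it into %20; servers generally accept either form in a query string.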

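There are two ways to request a web page: GET and POST.

POST requests:

Pass a data argument to Request(). The data usually starts out as a dict holding the information to send to the server.

The form data must be submitted as bytes, not str, so the dict is URL-encoded and then encoded to bytes before it is passed in.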
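The dict-to-bytes conversion can be sketched as follows. The endpoint and the form fields are hypothetical, and the request is only built, not sent, so the example runs offline.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields, for illustration only.
url = "https://example.com/search"
form = {"kw": "python", "page": 1}

# Step 1: dict -> "kw=python&page=1" (str); step 2: str -> bytes.
data = urllib.parse.urlencode(form).encode("utf-8")

# Supplying data switches the request method from GET to POST.
request = urllib.request.Request(url, data=data)
method = request.get_method()

print(data)
print(method)

# Sending it would then be: response = urllib.request.urlopen(request)
```

Passing a str instead of bytes as data makes urlopen() raise a TypeError, which is the error this conversion avoids.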

import os
import re
import urllib.request


class Spider:
    def __init__(self):
        self.headers = {}  # request headers (contents omitted in the original)

    def request_url(self, url):
        # Fetch one result page and hand its HTML to the parser
        request = urllib.request.Request(url=url, headers=self.headers)
        response = urllib.request.urlopen(request)
        html = response.read().decode('utf-8')
        self.parse_html(html)

    def parse_html(self, html):
        # Collect every image address stored under "objurl" in the page source
        pic_url_list = re.findall(r'"objurl":"(.*?)"', html)
        self.parse_pic_urls(pic_url_list)

    def parse_pic_urls(self, pic_url_list):
        for pic_url in pic_url_list:
            # Use the last path segment as the local file name
            pic_name = pic_url.split('/')[-1]
            self.store_data(pic_name, pic_url)

    def create_dir(self):
        # Create ./images if needed, then work inside it
        try:
            os.mkdir('./images')
        except FileExistsError as e:
            print(e)
        os.chdir('./images')

    def store_data(self, pic_name, pic_url):
        # Download one image; skip it if the download fails
        try:
            response = urllib.request.urlopen(pic_url)
            picture = response.read()
        except Exception as e:
            print(e)
        else:
            with open(pic_name, 'w+b') as f:
                print(pic_name)
                f.write(picture)


if __name__ == '__main__':
    base_url = ''  # search URL prefix (left blank in the original)
    start = int(input("Enter the start page: "))
    end = int(input("Enter the end page: "))
    spider = Spider()
    spider.create_dir()
    for page in range(start, end + 1):
        url = base_url + str(page * 10)
        print('Crawling page ' + str(page))
        # The original listing is cut off here; this call is the presumed next step.
        spider.request_url(url)
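The parsing step of the spider can be tried on its own without any network access. The sketch below runs the same regular expression against a made-up fragment of page source; the addresses in it are placeholders, not real image links.

```python
import re

# Made-up fragment imitating the '"objurl":"..."' pattern the spider searches for;
# the addresses are placeholders, not real image links.
html = (
    '{"objurl":"https://example.com/img/cat.jpg","w":1},'
    '{"objurl":"https://example.com/img/dog.png","w":2}'
)

# Non-greedy (.*?) stops at the first closing quote, giving one match per image.
pic_urls = re.findall(r'"objurl":"(.*?)"', html)

# The last path segment becomes the local file name.
pic_names = [pic_url.split("/")[-1] for pic_url in pic_urls]

print(pic_urls)
print(pic_names)
```

Two images whose addresses end in the same file name would overwrite each other on disk, so a real run might want to add a counter or a hash to the name.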