Basic usage of urllib in Python

urllib is the Python standard-library package we use for network requests. It lets you specify a URL and fetch the data of a web page.

import urllib.request

# Send a GET request
def getreq():
    # Use the request module from the urllib package
    # Call the urlopen function in request
    # timeout sets the timeout in seconds
    response = urllib.request.urlopen("", timeout=1)
    # response is an object, so call read() to get the data and decode it as UTF-8
    # print(response.read().decode("utf-8"))
    print(response.status)  # HTTP status code of the response
    print(response.getheaders())  # all response headers
    print(response.getheader("content-type"))  # one specific response header
import urllib.parse

# Send a POST request
def postreq():
    # Use urllib.parse to encode the parameters
    data = urllib.parse.urlencode({})  # the form fields go in this dict
    # Convert the encoded string to bytes
    param = bytes(data, encoding="utf-8")
    response = urllib.request.urlopen("", data=param)
    print(response.read().decode("utf-8"))
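To make the parameter-encoding step concrete, here is a minimal sketch of what urlencode and the bytes conversion produce. The field names and values in `fields` are made up for illustration and do not come from the original post.

```python
import urllib.parse

# Hypothetical form fields; replace them with whatever the target endpoint expects.
fields = {"name": "urllib", "page": 1}

# urlencode turns the dict into a query string, percent-escaping where needed.
encoded = urllib.parse.urlencode(fields)

# urlopen expects the POST body as bytes, so encode the string as UTF-8.
param = bytes(encoded, encoding="utf-8")

print(encoded)  # name=urllib&page=1
print(param)    # b'name=urllib&page=1'
```

Passing `param` as the `data` argument of urlopen is what switches the request from GET to POST.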
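import urllib.error

# Ignore the HTTPS certificate
# Request Douban
def reqdouban():
    try:
        url = ""
        headers = {}  # the request headers go in this dict
        data = bytes(urllib.parse.urlencode({}), encoding="utf-8")
        # Set the URL, the request data, the request headers, and the request method
        req = urllib.request.Request(url=url, data=data, headers=headers, method="POST")
        response = urllib.request.urlopen(req)
        print(response.read().decode("utf-8"))
    except urllib.error.URLError:
        print("Request failed")

reqdouban()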
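The comments in the function above mention skipping HTTPS certificate checks and setting the URL, data, headers, and method together. The following is a minimal sketch of how that could look with urllib.request.Request and an ssl context. The URL, the User-Agent string, the form field, and the helper name `send` are placeholders chosen for illustration, not values from the original post, and disabling verification is only reasonable for testing.

```python
import ssl
import urllib.parse
import urllib.request

# Placeholder values, not taken from the original post.
url = "https://example.com/"
headers = {"User-Agent": "Mozilla/5.0 (urllib demo)"}
data = bytes(urllib.parse.urlencode({"q": "test"}), encoding="utf-8")

# Request bundles the URL, body, headers, and HTTP method into one object.
req = urllib.request.Request(url=url, data=data, headers=headers, method="POST")

# An SSL context that skips certificate verification (testing only).
# check_hostname must be disabled before verify_mode can be set to CERT_NONE.
unverified = ssl.create_default_context()
unverified.check_hostname = False
unverified.verify_mode = ssl.CERT_NONE


def send(request, context=unverified, timeout=5):
    """Hypothetical helper: send the prepared request and return the decoded body."""
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")


# The request can be inspected without touching the network.
print(req.get_method())              # POST
print(req.get_header("User-agent"))  # Mozilla/5.0 (urllib demo)
```

Request stores header names in capitalized form, which is why the lookup uses "User-agent" rather than "User-Agent".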

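Python crawlers with urllib, part 2

urllib.error handles the exceptions raised by urllib.request. urllib.error defines three exception classes. URLError is a subclass of OSError, and HTTPError is a subclass of URLError. An HTTP response from the server carries a status code, and from that status code we can tell whether the request succeeded...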
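Because HTTPError is a subclass of URLError, the order of the checks matters when telling the two apart. Here is a minimal sketch of that pattern; the function names `describe_failure` and `fetch` are hypothetical, and the exceptions at the bottom are constructed by hand so the example runs without a network connection.

```python
import urllib.error
import urllib.request


def describe_failure(exc):
    """Turn a urllib exception into a short message."""
    # Check HTTPError first: it is a subclass of URLError.
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP error {exc.code}: {exc.reason}"
    if isinstance(exc, urllib.error.URLError):
        return f"URL error: {exc.reason}"
    return f"other error: {exc}"


def fetch(url, timeout=5):
    """Return (status, body) on success or (None, message) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.URLError as exc:  # also catches HTTPError
        return None, describe_failure(exc)


# Hand-built exceptions, only to show what the messages look like.
not_found = urllib.error.HTTPError("http://example.com/missing", 404, "Not Found", {}, None)
print(describe_failure(not_found))                             # HTTP error 404: Not Found
print(describe_failure(urllib.error.URLError("timed out")))    # URL error: timed out
```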

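Python crawlers with urllib, part 4

Most websites define a robots.txt file, which tells web crawlers what restrictions apply when crawling the site. To be a good netizen and respect other people's interests, you should generally follow these restrictions. How do you view the file? Append robots.txt to the target site or domain name. For example, the robots.txt file of the target site is robots.tx...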
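The standard library can check these restrictions for you through urllib.robotparser. The sketch below parses a small robots.txt given as text, so it runs offline; the rules, the crawler name "my-crawler", and the example.com URLs are invented for illustration.

```python
import urllib.robotparser

# Made-up robots.txt content; a real crawler would download it from the site.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# can_fetch answers whether the given user agent may request the URL.
allowed = parser.can_fetch("my-crawler", "https://example.com/index.html")
blocked = parser.can_fetch("my-crawler", "https://example.com/private/data.html")

print(allowed)  # True
print(blocked)  # False
```

For a live site, `set_url` followed by `read` downloads and parses the file instead of calling `parse` on local text.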

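Python modules: the urllib module

Python 2.x has the urllib and urllib2 libraries; Python 3.x has the urllib library. What changed: where Python 2.x uses import urllib2, Python 3.x uses import urllib.request and urllib.error. Where Python 2.x uses import urllib, the corresponding ...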
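For code that still has to run on both versions, the urllib2 mapping described above is often written as a guarded import. This is a minimal sketch of that idiom, not something shown in the original post; on Python 3 only the first branch executes.

```python
try:
    # Python 3.x: urllib2 was split into urllib.request and urllib.error.
    from urllib.request import urlopen
    from urllib.error import URLError
except ImportError:
    # Python 2.x: both names live in urllib2.
    from urllib2 import urlopen, URLError

# After the import, the rest of the code can use the same names on either version.
print(urlopen.__name__)   # urlopen
print(URLError.__name__)  # URLError
```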
