Web Scraping: Python Crawler Study Notes on the urllib Library

1. urllib.request: opens and reads URLs

2. urllib.error: contains the exceptions raised by urllib.request

3. urllib.parse: parses URLs

4. urllib.robotparser: parses robots.txt files
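
The fourth module is not used in the examples in these notes, so here is a minimal sketch of it. The rules, the crawler name my-crawler, and the example.com URLs are all made up for illustration. The rules are parsed from an in-memory list so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would instead call
# set_url(...) followed by read() to download the site's actual file.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /private/",
]


def build_parser(lines):
    """Return a RobotFileParser loaded from an iterable of text lines."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


parser = build_parser(ROBOTS_LINES)
# can_fetch(useragent, url) reports whether the rules permit the request.
allowed_public = parser.can_fetch("my-crawler", "https://example.com/index.html")
allowed_private = parser.can_fetch("my-crawler", "https://example.com/private/data.html")
print(allowed_public, allowed_private)
```

Checking can_fetch before each request is one simple way to keep a crawler inside the paths a site allows.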

Sending a GET request

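# Import urlopen, which opens a web page
from urllib.request import urlopen
# Fetch the page
html = urlopen('')
# Read the returned content
response = html.read()
# Print the raw bytes (source of the Douban home page)
print(response)
# Decode the bytes as UTF-8
# Print the decoded text (source of the Douban home page)
print(response.decode('utf-8'))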
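
The request above assumes the fetch succeeds. As a hedged sketch of how urllib.error fits in, the helper below wraps the same urlopen/read/decode steps and catches the two exception classes from that module. The name fetch_text and its default arguments are my own choices, not something from the original notes.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8", timeout=10):
    """Return the decoded body of url, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode(encoding)
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"HTTP error {err.code} for {url}")
    except URLError as err:
        # No usable answer: DNS failure, refused connection, missing file.
        print(f"Could not open {url}: {err.reason}")
    return None
```

Calling fetch_text with a page URL gives back a str on success, so the caller only has to check for None.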


Passing URL parameters

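# Import the request module
import urllib.request
# Import the URL parsing module
import urllib.parse
# Query parameters as a dict (the key/value pairs are not preserved here)
payload = {}
request_url = '/search'
# Encode the parameters to be added to the URL
payload_encode = urllib.parse.urlencode(payload)
# Build the URL that is actually requested
url = request_url + '?' + payload_encode
# The response body comes back as bytes
response = urllib.request.urlopen(url)
# Decode and print
print(response.read().decode('utf-8'))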
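
To see what urlencode produces without sending anything, here is a small offline sketch. The base URL https://example.com/search, the parameter names q and start, and the helper name build_url are placeholders of mine, not values from the original notes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base_url, params):
    """Append params to base_url as a percent-encoded query string."""
    return base_url + "?" + urlencode(params)


# Hypothetical search parameters; values are converted to str and escaped.
url = build_url("https://example.com/search", {"q": "python", "start": 10})
print(url)

# parse_qs reverses the encoding, which is handy for checking the result.
print(parse_qs(urlsplit(url).query))
```

Non-ASCII values such as Chinese search terms are percent-encoded as UTF-8 by default, so they survive the round trip through parse_qs.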


Sending a GET request that mimics a browser

# Import the request module
import urllib.request
url = ''
# Request headers as a dict (the User-Agent entry is not preserved here)
headers = {}
# Build the request with the headers (including User-Agent) attached
request = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(request).read()
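
A Request object can be inspected before it is sent, which makes it easy to confirm the header is really attached. In this sketch the User-Agent string, the example.com URL, and the helper name build_browser_request are invented for illustration. Real browsers send much longer strings.

```python
from urllib.request import Request

# Made-up User-Agent value, used only for this sketch.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def build_browser_request(url, user_agent=USER_AGENT):
    """Return a GET Request that carries a browser-like User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_browser_request("https://example.com/")
print(req.get_method())
# Request stores header names via str.capitalize(), so the lookup key
# is "User-agent" rather than "User-Agent".
print(req.get_header("User-agent"))
print(req.header_items())
```

Passing req to urllib.request.urlopen would then send the request with that header.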

Sending a POST request

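# With this import, request and parse can be used directly below
from urllib import request, parse
post_data = parse.urlencode([('key1', 'v1'), ('k2', 'v2')])
# Build a Request object for the target URL
req = request.Request('')
# Add headers (the User-Agent value is left blank here)
req.add_header('User-Agent', '')
# Supplying data makes urlopen send a POST; the body must be bytes
response = request.urlopen(req, data=post_data.encode('utf-8')).read()
print(response)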
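
What turns the request into a POST is the presence of a data argument, and that can be verified offline. In the sketch below the URL https://example.com/submit and the helper name build_post_request are placeholders of mine. The form fields are the ones from the example above.

```python
from urllib import parse, request


def build_post_request(url, fields):
    """Return a Request whose body is the form-encoded fields."""
    body = parse.urlencode(fields).encode("utf-8")  # the body must be bytes
    req = request.Request(url, data=body)
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


req = build_post_request(
    "https://example.com/submit",
    [("key1", "v1"), ("k2", "v2")],
)
# With data set, get_method() reports POST instead of GET.
print(req.get_method())
print(req.data)
```

Setting Content-Type explicitly is optional here, since urllib adds the same form-encoded default when sending a body without one.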

The urljoin function

# Use urljoin to build a correct URL from a base and a path
from urllib.parse import urljoin
urljoin('', '/passport/login')
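
How urljoin resolves its second argument depends on whether that argument is root-relative, relative, or already absolute. The sketch below walks through those cases against a made-up base page on example.com. None of these URLs come from the original notes.

```python
from urllib.parse import urljoin

# Hypothetical page that the links were found on.
BASE = "https://example.com/a/b/c.html"

# A path starting with "/" replaces the whole path of the base.
root_relative = urljoin(BASE, "/passport/login")
# A bare name replaces only the last segment of the base path.
sibling = urljoin(BASE, "d.html")
# ".." climbs one directory before resolving.
parent = urljoin(BASE, "../x.html")
# An absolute URL is returned unchanged.
absolute = urljoin(BASE, "https://other.example/y")

for result in (root_relative, sibling, parent, absolute):
    print(result)
```

This is why crawlers usually pass the URL of the page being parsed as the base: every href on the page, whatever its form, comes back as a full URL.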

