Python Crawler Series (2): Using the Standard Library (A)

urllib is a powerful Python library for working with URLs. Usage is largely the same in Python 2 and Python 3, except that Python 2 splits the functionality across two libraries, urllib and urllib2. The common changes are:
1. Python 2.x: import urllib2 --> Python 3.x: import urllib.request, urllib.error
2. Python 2.x: import urllib --> Python 3.x: import urllib.request, urllib.error, urllib.parse
3. Python 2.x: import urlparse --> Python 3.x: import urllib.parse
4. Python 2.x: urllib2.urlopen --> Python 3.x: urllib.request.urlopen
5. Python 2.x: urllib.quote --> Python 3.x: urllib.parse.quote
6. Python 2.x: cookielib.CookieJar --> Python 3.x: http.cookiejar.CookieJar
7. Python 2.x: urllib2.Request --> Python 3.x: urllib.request.Request
These are more or less the modules you will use most often. Take the time to tell them apart, and moving between the two Python versions becomes easy.
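
As a quick check of the mapping above, the sketch below touches each Python 3 location once without making any network request. The URL and the query string are made-up placeholders, not values from this article.

```python
# Python 3 homes of the names listed above (no network access needed).
import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request

# urllib.quote / urlparse (Python 2) -> urllib.parse (Python 3)
quoted = urllib.parse.quote("python 爬蟲")
parts = urllib.parse.urlparse("http://example.com/search?q=python")  # placeholder URL

# urllib2.Request (Python 2) -> urllib.request.Request (Python 3)
req = urllib.request.Request("http://example.com/")  # placeholder URL

# cookielib.CookieJar (Python 2) -> http.cookiejar.CookieJar (Python 3)
jar = http.cookiejar.CookieJar()

print(quoted)                     # percent-encoded form of the text
print(parts.netloc, parts.query)  # host and query string of the placeholder URL
print(req.get_method())           # GET, because no data was attached
```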

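P.S. I use Python 3.5 throughout, so if your version differs, adapt the code yourself using the mapping listed above.
A quick warm-up:
print(file.read().decode('gbk'))  # read the body from the response object `file` and decode it as GBK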

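Key points: short as this code is, it covers the modules a crawler uses most. The request is sent through urllib.request; the login username and password are first encoded with urlencode and then used to build a POST request object.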
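
The description above can be sketched roughly as follows. The full listing is not reproduced here, so the login URL, the form field names and the credentials are all hypothetical placeholders; the request is built but deliberately not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical login endpoint and form fields (not taken from the article).
LOGIN_URL = "http://example.com/login"
form = [("username", "alice"), ("password", "secret")]

# urlencode turns the pairs into "username=alice&password=secret";
# the body of a POST request must be bytes, hence the encode().
post_data = urllib.parse.urlencode(form).encode("utf-8")

# Supplying data makes urllib treat the request as a POST.
req = urllib.request.Request(LOGIN_URL, data=post_data)
req.add_header("User-Agent", "Mozilla/5.0")  # some sites reject urllib's default agent

print(req.get_method())  # POST
print(req.data)          # b'username=alice&password=secret'
# Sending it would be: urllib.request.urlopen(req).read()
```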

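One point deserves emphasis here: every web page we visit is fetched over HTTP, and HTTP is a stateless protocol, meaning that on its own it cannot keep any state from one request to the next. Cookies and sessions were introduced to keep a conversation going across requests.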

That is why the code here sets up a cookie object, namely cjar = http.cookiejar.CookieJar(), and then builds its own opener object that carries this cookie object.
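
A minimal sketch of that setup, assuming the standard HTTPCookieProcessor handler is what attaches the jar to the opener (the full listing is not shown, so this is a reconstruction under that assumption). Nothing is sent over the network here.

```python
import http.cookiejar
import urllib.request

# The jar collects cookies from responses and replays them on later requests.
cjar = http.cookiejar.CookieJar()

# build_opener adds the cookie handler on top of urllib's default handlers.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cjar))

# Optional: make plain urlopen() calls go through this opener as well.
urllib.request.install_opener(opener)

print(len(cjar))  # 0, no response has been processed yet
# After opener.open(request), cookies from Set-Cookie headers are stored in
# cjar and sent back automatically on the next opener.open() call.
```

Keeping a reference to cjar is useful because the cookies the server handed out can be inspected afterwards by iterating over it.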

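Note on urlencode: it accepts its parameters either as a list of pairs, [(key1, value1), (key2, value2), …], or as a dictionary, and returns a string of the form key2=value2&key1=value1. In Python 3 the function is urllib.parse.urlencode(), and a typical result looks like this:

'name=%E8%80%81%E7%8E%8B&***=%E7%94%B7'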
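
For instance, in Python 3 the first part of that string can be reproduced as below. Only the name field is taken from the output above; the dictionary variant and its key are purely illustrative.

```python
from urllib.parse import urlencode, unquote

# A sequence of (key, value) pairs keeps the order in which they are given.
pairs = [("name", "老王")]
encoded = urlencode(pairs)
print(encoded)           # name=%E8%80%81%E7%8E%8B
print(unquote(encoded))  # name=老王

# A dict works too; non-ASCII values are percent-encoded as UTF-8 by default.
print(urlencode({"key1": "value1"}))  # key1=value1
```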

Cookies and sessions will be covered in detail later in this series. If you have any questions, feel free to discuss them.
