Python Web Scraping

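# coding:utf-8
# Python 2 script: the print statement and urllib.urlopen exist only in Python 2.
import urllib   # import the module
print dir(urllib)   # list the names urllib provides
help(urllib.urlopen)  # show the help text (help() prints it itself)
url = ""  # define the URL
html = urllib.urlopen(url)   # open the URL
content = html.read()   # read() returns the body; calling it again returns an empty string
print content
# fix encoding problems
print content.decode("gb2312").encode("utf-8")
# ignore bytes that cannot be decoded
print content.decode("gbk", "ignore").encode("utf-8")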
# get the response headers
print html.info()
# get the HTTP status code
print html.getcode()
# get the URL that was actually fetched
print html.geturl()
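# urllib.urlretrieve(url, "f:\\1.txt")
# The 3rd argument of urlretrieve is a callback; it may do anything, but it must accept 3 parameters:
# 3.1 the number of blocks transferred so far
# 3.2 the size of each block, in bytes
# 3.3 the size of the remote file
# callback definition
def callback(a, b, c):
    """a = blocks so far, b = block size in bytes, c = remote file size."""
# close the opened resource; this is important!
html.close()
# check the status code
code = html.getcode()
# check its type
print type(code)
if code == 200:
    print "OK"
else:
    print "page error"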

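The script above targets Python 2. As a rough Python 3 sketch of the same steps (decoding with unreadable bytes ignored, a three-argument download callback, and the status check), something like the following could work. The helper names decode_body, progress_percent, describe_status and fetch_text are made up for this illustration and are not part of urllib; in Python 3, urlopen lives in urllib.request.

```python
# Python 3 sketch of the steps above. Helper names are illustrative only.
import urllib.request


def decode_body(raw, encoding="gbk"):
    """Decode raw response bytes, dropping bytes that cannot be decoded."""
    return raw.decode(encoding, errors="ignore")


def progress_percent(blocks_so_far, block_size, total_size):
    """Turn the three callback arguments into a percentage from 0 to 100."""
    if total_size <= 0:  # the server did not report a size
        return 0.0
    return min(100.0, blocks_so_far * block_size * 100.0 / total_size)


def callback(a, b, c):
    """Report hook: a = blocks so far, b = block size in bytes, c = remote file size."""
    print("%.1f%%" % progress_percent(a, b, c))


def describe_status(code):
    """Map a status code to the two messages the script prints."""
    return "OK" if code == 200 else "page error"


def fetch_text(url, encoding="gbk"):
    """Open url, read the body once, and return (status code, decoded text)."""
    with urllib.request.urlopen(url) as response:  # closed automatically on exit
        return response.getcode(), decode_body(response.read(), encoding)
```

With these in place, the callback could be passed as the third argument of urllib.request.urlretrieve, and fetch_text(url) would replace the separate read(), decode() and close() calls. The body is read only once because a second read() on the same response returns nothing.
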
Fetching Web Pages with Python

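In Python, the urllib2 component is used to fetch web pages. # coding: utf-8 urllib2 is a Python component for fetching URLs (Uniform Resource Locators). import urllib2 It provides a very simple interface in the form of the urlopen function. response urll...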

The Process of Fetching a Web Page in Python

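Preparation: 1. The process of fetching a web page: prepare the HTTP request, submit it, receive the returned HTTP response, and obtain the page source. 2. GET or POST. 3. Headers (optional): in some cases direct fetching is forbidden, and you need to supply headers that tell the server you are not a robot. For example 1 ...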
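
The preparation steps (build the request, choose GET or POST, attach headers) can be sketched in Python 3 as below. This is only an illustration under stated assumptions: the URL, the User-Agent string, the form field and the helper name build_request are all hypothetical, and nothing is sent over the network.

```python
import urllib.parse
import urllib.request

# Hypothetical values, chosen only for illustration.
EXAMPLE_URL = "http://example.com/"
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url, form=None, headers=None):
    """Prepare (but do not send) a request; passing form data makes it a POST."""
    data = None
    if form is not None:
        # A POST body must be bytes, so URL-encode the form and encode the result.
        data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers or {})


get_request = build_request(EXAMPLE_URL, headers=BROWSER_HEADERS)
post_request = build_request(EXAMPLE_URL, form={"q": "python"}, headers=BROWSER_HEADERS)
print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
```

Passing either object to urllib.request.urlopen would submit it and return the response. Whether a given site accepts a particular User-Agent value is up to that site.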

Fetching Web Page Content with Python

Modules used: import urllib2, import urllib. Basic fetching example: #!/usr/bin/python # coding: utf-8 import urllib2 url # create the Request object request = urllib2.Request(url) # send the request and get the result try response...
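
The excerpt is cut off at the try statement, so what follows is not a reconstruction of it. It is a hedged Python 3 sketch of the same pattern (create a Request object, send it, handle failures), using urllib.request and urllib.error in place of urllib2. The function name fetch_or_error and the data: URL used for the demonstration are assumptions.

```python
import urllib.error
import urllib.request


def fetch_or_error(url, timeout=10):
    """Return (body bytes, None) on success or (None, error message) on failure."""
    request = urllib.request.Request(url)  # create the Request object
    try:
        # Send the request and read the result.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read(), None
    except urllib.error.HTTPError as err:  # the server answered with an error status
        return None, "HTTP error %d" % err.code
    except urllib.error.URLError as err:   # the resource could not be reached
        return None, "URL error: %s" % err.reason


# A data: URL keeps the demonstration offline; a real page URL would go here.
body, error = fetch_or_error("data:text/plain,hello")
print(body, error)
```

HTTPError is caught first because it is a subclass of URLError; with the order reversed, the more specific branch would never run.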
