Web page scraping example:
import urllib.request

# The target URL was left blank in the original; fill in the page you want to fetch.
file = urllib.request.urlopen('')
data = file.readlines()
with open('c:/users/python/desktop/myhtml/my1.html', 'wb') as f:
    for i in data:
        f.write(i)
file.read() reads the entire content at once; unlike readlines(), which returns a list of lines, read() returns everything as a single value (a bytes string for urlopen responses).
file.readline() reads one line of the content.
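A minimal sketch of the difference between the three read methods (the URL http://www.example.com is a hypothetical placeholder, not from the original):

import urllib.request

# Hypothetical URL used only for illustration; substitute your own target page.
resp = urllib.request.urlopen('http://www.example.com')
whole = resp.read()             # entire body as one bytes object
print(type(whole), len(whole))

resp = urllib.request.urlopen('http://www.example.com')
first_line = resp.readline()    # a single line (bytes), including the trailing newline
lines = resp.readlines()        # the remaining lines as a list of bytes objects
print(first_line)
print(len(lines))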
Simulating a browser with the headers attribute (used when a 403 error occurs)
Method 1: use build_opener() to modify the request headers
import urllib.request

# The target URL was left blank in the original; fill in the page that returns 403.
url = ""
headers = ("User-Agent",
           "Mozilla/5.0 (Windows NT 6.1; WOW64) "
           "(KHTML, like Gecko) Chrome/38.0.2125.122 "
           "Safari/537.36 SE 2.X MetaSr 1.0")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
data = opener.open(url).readlines()
with open('c:/users/python/desktop/myhtml/my2.html', 'wb') as f:
    for i in data:
        f.write(i)
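As a hedged follow-up not in the original: if you want the header-carrying opener from Method 1 to apply to every later urllib.request.urlopen() call, the standard library also provides urllib.request.install_opener(). A minimal sketch:

import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64)")]
# Register the opener globally so plain urlopen() calls reuse these headers.
urllib.request.install_opener(opener)
# From here on, urlopen() sends the custom User-Agent automatically, e.g.:
# data = urllib.request.urlopen(url).read()   # url must be supplied by the caller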
Method 2: use add_header() to add a request header
import urllib.request

# The target URL was left blank in the original; fill in the page you want to fetch.
url = ""
req = urllib.request.Request(url)
req.add_header("User-Agent",
               "Mozilla/5.0 (Windows NT 6.1; WOW64) "
               "(KHTML, like Gecko) Chrome/38.0.2125.122 "
               "Safari/537.36 SE 2.X MetaSr 1.0")
data = urllib.request.urlopen(req).readlines()
with open('c:/users/python/desktop/myhtml/my2.html', 'wb') as f:
    for i in data:
        f.write(i)
Timeout settings:
import urllib.request

for i in range(1, 100):
    try:
        # The target URL was left blank in the original; timeout sets the limit in seconds.
        file = urllib.request.urlopen("", timeout=0.1)
        data = file.readlines()
        print(len(data))
    except Exception as e:
        print("Exception occurred --> " + str(e))
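A variation that is my own assumption, not from the original: instead of the blanket except Exception, you can catch urllib.error.URLError, which urlopen typically raises when the connection times out. A minimal sketch using the hypothetical placeholder URL http://www.example.com:

import urllib.error
import urllib.request

try:
    # Hypothetical URL for illustration; a 0.1 s timeout will often expire on real networks.
    resp = urllib.request.urlopen("http://www.example.com", timeout=0.1)
    print(len(resp.read()))
except urllib.error.URLError as e:
    # e.reason typically wraps a socket timeout when the deadline is exceeded.
    print("Timed out or failed -->", e.reason)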
HTTP requests:
1. GET request
import urllib.request

keywd = "hello"
# The host part of the URL was omitted in the original; only the query path remains.
url = "/s?wd=" + keywd
req = urllib.request.Request(url)
data = urllib.request.urlopen(req).readlines()
with open('c:/users/python/desktop/myhtml/my3.html', 'wb') as f:
    for i in data:
        f.write(i)
import urllib.request

keywd = "國家"
# When keywd is Chinese, it has to be URL-encoded first.
key_code = urllib.request.quote(keywd)
# The host part of the URL was omitted in the original; only the query path remains.
url = "/s?wd=" + key_code
req = urllib.request.Request(url)
data = urllib.request.urlopen(req).readlines()
with open('c:/users/python/desktop/myhtml/my4.html', 'wb') as f:
    f.writelines(data)
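To see what the encoding step produces on its own, here is a small sketch (urllib.request.quote is the same function as urllib.parse.quote, which urllib.request re-exports; the round trip back through unquote is my addition, not part of the original):

import urllib.parse

keywd = "國家"
encoded = urllib.parse.quote(keywd)    # percent-encoded UTF-8 bytes of the keyword
print(encoded)                         # e.g. %E5%9C%8B%E5%AE%B6
print(urllib.parse.unquote(encoded))   # decodes back to the original string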