Basic usage of urllib

Introduction to urllib

Fetching a web page

There are three common ways to read the response content:

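import urllib.request

html = urllib.request.urlopen('')  # the target URL was not preserved in the original
html.readline()   # read one line
html.read(4096)   # read up to 4096 bytes
html.readlines()  # read all remaining lines into a list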
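To see how the three methods differ, here is a minimal sketch that runs without network access. It assumes a `data:` URL (which `urlopen()` supports) in place of a real web page, and `read_three_ways` is a hypothetical helper name. The three calls share one read position, so each continues where the previous one stopped.

```python
import urllib.request

# A data: URL stands in for a real page so the example runs offline;
# %0A is a URL-encoded newline, so the body is three lines.
URL = 'data:text/plain,line1%0Aline2%0Aline3%0A'

def read_three_ways(url):
    with urllib.request.urlopen(url) as html:
        first_line = html.readline()  # up to and including the first b'\n'
        chunk = html.read(4)          # at most 4 more bytes
        rest = html.readlines()       # every remaining line, as a list
    return first_line, chunk, rest

first_line, chunk, rest = read_three_ways(URL)
print(first_line)  # b'line1\n'
print(chunk)       # b'line'
print(rest)        # [b'2\n', b'line3\n']
```

All three methods return bytes; call `.decode()` with the page's actual charset to get text.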

Some files are fairly large, so, just as when reading a local file, read the data one chunk at a time:

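import urllib.request

html = urllib.request.urlopen('')  # the target URL was not preserved in the original
# Binary write mode, because the response body is bytes
with open('/tmp/python.pdf', 'wb') as fobj:
    while True:
        data = html.read(4096)  # read 4096 bytes at a time
        if not data:            # an empty bytes object means the download is complete
            break
        fobj.write(data)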
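The same loop can be wrapped in a reusable function. This is a sketch: the name `download` and its `chunk_size` parameter are illustrative, not part of urllib, and the usage line uses a `data:` URL so it runs offline.

```python
import os
import tempfile
import urllib.request

def download(url, fname, chunk_size=4096):
    """Stream url into fname chunk_size bytes at a time; return the byte count."""
    total = 0
    # 'wb' creates or truncates the file; the response body is bytes
    with urllib.request.urlopen(url) as resp, open(fname, 'wb') as fobj:
        while True:
            data = resp.read(chunk_size)
            if not data:  # b'' signals the end of the stream
                break
            fobj.write(data)
            total += len(data)
    return total

path = os.path.join(tempfile.gettempdir(), 'download_demo.txt')
print(download('data:,hello%20world', path, chunk_size=4))  # 11
```

If the byte count is not needed, `shutil.copyfileobj(resp, fobj, chunk_size)` performs the same read/write loop.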

Simulating a client

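import urllib.request

url = ''  # the target URL was not preserved in the original
# Hypothetical header value: the original header dict was not preserved
header = {'User-Agent': 'Mozilla/5.0'}
request = urllib.request.Request(url, headers=header)
data = urllib.request.urlopen(request).read()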
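Headers matter because, by default, urllib identifies itself with a `Python-urllib/3.x` User-Agent, which some sites refuse. The sketch below only builds `Request` objects (nothing is sent), so it runs offline; `build_request` is a hypothetical helper, and the URL and User-Agent string are placeholder values.

```python
import urllib.request

def build_request(url, user_agent, data=None):
    # Request() only describes the request; nothing is sent until urlopen()
    return urllib.request.Request(url, data=data, headers={'User-Agent': user_agent})

req = build_request('http://example.com/', 'Mozilla/5.0')
# Request stores header names via str.capitalize(), hence 'User-agent'
print(req.get_header('User-agent'))  # Mozilla/5.0
print(req.get_method())              # GET

post = build_request('http://example.com/', 'Mozilla/5.0', data=b'q=1')
print(post.get_method())             # POST: a request body switches the method
```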

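import wget  # third-party package (pip install wget); not used in the part of the script preserved here
import os
import re
from urllib import request

def get_web(url, fname):
    # Hypothetical header value: the original header dict was not preserved
    headers = {'User-Agent': 'Mozilla/5.0'}
    r = request.Request(url, headers=headers)
    js_index = request.urlopen(r)
    # Save the response to fname in 4096-byte chunks
    with open(fname, 'wb') as fobj:
        while True:
            data = js_index.read(4096)
            if not data:
                break
            fobj.write(data)

def get_urls(fname, patt):
    # Collect the first match of patt on each line of fname
    patt_list = []
    cpatt = re.compile(patt)
    with open(fname) as fobj:
        for line in fobj:
            m = cpatt.search(line)
            if m:
                patt_list.append(m.group())
    return patt_list

if __name__ == '__main__':
    # Save into the dst directory, creating it if it does not exist
    dst = '/my/jianshu'
    if not os.path.exists(dst):
        os.mkdir(dst)
    get_web('', '/my/jianshu/js.html')  # the target URL was not preserved in the original
    # Find all the addresses in the web page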
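Note that `get_urls()` calls `search()`, which returns only the first match on each line. If a line can hold several addresses (common in minified HTML), a variant using `finditer()` collects them all. This is a sketch: the function name `get_all_urls` and the image-URL pattern are hypothetical, since the original pattern was not preserved.

```python
import os
import re
import tempfile

def get_all_urls(fname, patt):
    """Return every match of patt in fname, including several per line."""
    cpatt = re.compile(patt)
    found = []
    with open(fname, encoding='utf-8') as fobj:
        for line in fobj:
            found.extend(m.group() for m in cpatt.finditer(line))
    return found

# Hypothetical pattern: absolute URLs ending in .jpg or .png
IMG_PATT = r'https?://[^\s"\']+\.(?:jpg|png)'

path = os.path.join(tempfile.gettempdir(), 'page_demo.html')
with open(path, 'w', encoding='utf-8') as f:
    f.write('<img src="http://a.com/1.jpg"><img src="http://a.com/2.png">\n')
print(get_all_urls(path, IMG_PATT))  # ['http://a.com/1.jpg', 'http://a.com/2.png']
```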

Advanced urllib

Data encoding

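quote() and unquote() are documented in urllib.parse (urllib.request merely imports them, which is why urllib.request.quote also works):

>>> import urllib.parse
>>> urllib.parse.quote('hello world!')
'hello%20world%21'
>>> urllib.parse.unquote('hello%20world%21')
'hello world!'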
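Beyond `quote()` and `unquote()`, `urllib.parse` has helpers for building and parsing whole query strings. A short sketch; the parameter names and values are made up for illustration:

```python
from urllib.parse import parse_qs, quote, quote_plus, urlencode

# quote() leaves '/' alone by default; pass safe='' to encode it too
print(quote('a/b c'))           # a/b%20c
print(quote('a/b c', safe=''))  # a%2Fb%20c

# quote_plus() encodes spaces as '+', the form used in query strings
print(quote_plus('hello world!'))  # hello+world%21

# urlencode() turns a dict into a query string (using quote_plus)...
query = urlencode({'q': 'hello world', 'page': 2})
print(query)  # q=hello+world&page=2

# ...and parse_qs() reverses it; every value comes back as a list of str
print(parse_qs(query))  # {'q': ['hello world'], 'page': ['2']}
```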

HTTP exception handling
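urllib reports failures through `urllib.error`. Below is a minimal sketch of the usual pattern; `fetch` is a hypothetical helper, and printing the error stands in for whatever handling a real crawler needs. `HTTPError` must be caught before `URLError` because it is a subclass of it.

```python
import urllib.error
import urllib.request

def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500
        print('HTTP error:', e.code, e.reason)
    except urllib.error.URLError as e:
        # No usable response: DNS failure, refused connection, unknown scheme...
        print('URL error:', e.reason)
    return None

print(fetch('data:,ok'))        # b'ok'
print(fetch('foo://example'))   # prints the URL error, then None
```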
