Fetching a Web Page's HTML Text with Python

Using the urllib2 package (Python 2), fetch the HTML text of a web page from its URL and return it.

# coding:utf-8
import sys
import urllib2

# Set the default encoding to utf-8 (Python 2 only)
reload(sys)
sys.setdefaultencoding("utf-8")

def gethtml(url):
    response = urllib2.urlopen(url)
    html = response.read()
    # The result can be decoded according to the page's encoding
    # html = unicode(html, 'utf-8')
    return html

url = ''  # URL to fetch
print gethtml(url)
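urllib2 exists only in Python 2. As a rough sketch of the same idea on Python 3, the function below uses urllib.request instead and decodes the bytes with the charset the server declares. The name get_html, the fallback_charset parameter, and the data: URL used for the demo are assumptions made for illustration, not part of the original code.

```python
import urllib.request


def get_html(url, timeout=10, fallback_charset="utf-8"):
    """Fetch url and return the body decoded to text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset declared in the Content-Type header when there is one.
        charset = response.headers.get_content_charset() or fallback_charset
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    # A data: URL keeps the demo offline; substitute a real http(s) URL.
    print(get_html("data:text/html;charset=utf-8,%3Cp%3Ehello%3C%2Fp%3E"))
```

Decoding with errors="replace" trades strictness for robustness: a page that declares the wrong charset still yields text rather than an exception.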

Or:

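def gethtml(url):
    # Instantiate urllib2.Request(); the URL to visit is passed as the argument of the Request instance
    request = urllib2.Request(url)
    # Pass the Request object to urlopen(), which sends it to the server and returns the response as a file-like object
    response = urllib2.urlopen(request)
    # A file-like object supports the usual file methods,
    # e.g. read() returns the whole content as a string, which is assigned to html
    html = response.read()
    # The result can be decoded according to the page's encoding
    # html = unicode(html, 'utf-8')
    return html

url = ''  # URL to fetch
print gethtml(url)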
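A Request object only describes the request; nothing is sent until it is handed to urlopen(). The sketch below shows this on Python 3 by building a Request and inspecting it without any network access. The helper name describe_request, the form parameter, and the example.com URL are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode


def describe_request(url, form=None):
    """Build a Request without sending it and report what it would do."""
    data = urlencode(form).encode("ascii") if form else None
    request = urllib.request.Request(url, data=data)
    return {
        "url": request.full_url,
        "host": request.host,
        # urllib picks GET when there is no body and POST when there is one.
        "method": request.get_method(),
    }


if __name__ == "__main__":
    print(describe_request("http://example.com/index.html"))
    print(describe_request("http://example.com/search", form={"q": "python"}))
```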

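def gethtml(url):
    # Build the UA header (the user-agent string itself is missing here)
    ua_header = {"User-Agent": "..."}
    # Build the request from the URL together with the headers; it will carry the IE 9.0 browser's User-Agent
    request = urllib2.Request(url, headers=ua_header)
    # Set the timeout in seconds
    response = urllib2.urlopen(request, timeout=60)
    html = response.read()
    return html

url = ''  # URL to fetch
print gethtml(url)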
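Once a timeout is set, the call can fail in several distinct ways, and a scraper usually wants to keep going rather than crash. A minimal Python 3 sketch of that handling follows; the function name get_html_or_none and the choice to return None on failure are assumptions for illustration.

```python
import socket
import urllib.error
import urllib.request


def get_html_or_none(url, headers=None, timeout=60):
    """Return the response body as bytes, or None if the request fails."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        print("HTTP error:", err.code)
    except urllib.error.URLError as err:
        # No usable answer: DNS failure, refused connection, unknown scheme.
        print("request failed:", err.reason)
    except socket.timeout:
        # Reading the body took longer than `timeout` seconds.
        print("timed out after", timeout, "seconds")
    return None
```

HTTPError is caught before URLError because it is a subclass of it; with the order reversed, the status code branch would never run.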

Adding header attributes:

def gethtml(url):
    ua = "..."  # user-agent string (missing here; not used below)
    request = urllib2.Request(url)
    # Request.add_header() can also be used to add or modify one specific header
    request.add_header("Connection", "keep-alive")
    response = urllib2.urlopen(request)
    html = response.read()
    # Check the response code
    print 'Response code:', response.code
    # Request.get_header() can be used to inspect header information
    print "Connection:", request.get_header("Connection")
    # or: print request.get_header(header_name="Connection")
    # print html
    return html
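One detail of add_header() and get_header() is easy to trip over: the Request stores each header name with only its first letter capitalized, and get_header() looks the name up exactly. The Python 3 sketch below shows the effect without sending anything; the URL and the demo-agent/0.1 string are made up for the example.

```python
import urllib.request

request = urllib.request.Request("http://example.com/")
request.add_header("Connection", "keep-alive")
request.add_header("User-Agent", "demo-agent/0.1")

print(request.get_header("Connection"))  # keep-alive
print(request.get_header("User-agent"))  # demo-agent/0.1
print(request.get_header("User-Agent"))  # None: stored as "User-agent"
print(request.header_items())
```

Note also that urllib's standard HTTP handler sets Connection: close itself when the request is actually sent, so the keep-alive value is visible on the Request object but not on the wire.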

Adding a random UA:

# coding:utf-8
import sys
import random
import urllib2

# Set the default encoding to utf-8 (Python 2 only)
reload(sys)
sys.setdefaultencoding("utf-8")

def gethtml(url):
    # Define a pool of UAs and pick one at random on each call
    ua_list = [
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
        "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
        "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
        "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
    ]
    user_agent = random.choice(ua_list)
    # print user_agent
    request = urllib2.Request(url)
    request.add_header("Connection", "keep-alive")
    request.add_header("User-Agent", user_agent)
    response = urllib2.urlopen(request, data=None, timeout=60)
    html = response.read()
    # print 'Response code:', response.code
    # print 'url:', response.geturl()
    # print 'info:', response.info()
    return html
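The random-UA step can be separated from the network call, which makes it easy to check on its own. A small Python 3 sketch, assuming hypothetical names UA_POOL and build_request and an optional rng argument that exists only so the choice can be made reproducible:

```python
import random
import urllib.request

UA_POOL = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
]


def build_request(url, ua_pool=UA_POOL, rng=random):
    """Return a Request carrying a User-Agent picked at random from ua_pool."""
    request = urllib.request.Request(url)
    request.add_header("User-Agent", rng.choice(ua_pool))
    return request


if __name__ == "__main__":
    req = build_request("http://example.com/")
    # The header is stored under the name "User-agent" on the Request.
    print(req.get_header("User-agent"))
```

The returned Request can then be passed to urllib.request.urlopen() with a timeout, as in the function above.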
