Fixing garbled Chinese text in Requests

Analysis:

    r = requests.get(url)

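**r.text returns the response body as a Unicode string (str).

r.content returns the response body as bytes.

In other words, use r.text when you want text.

Use r.content when you want a file (binary data).**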
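To make the difference concrete, here is a small sketch that needs no network access. The two variables are stand-ins I chose for r.content and r.text; they are not produced by Requests itself.

```python
# Stand-ins for a response body; no network request is made.
content = "中文亂碼".encode("utf-8")  # plays the role of r.content (bytes)
text = content.decode("utf-8")        # plays the role of r.text (str)

# The same body has different lengths depending on the view:
# bytes are counted per byte, str per character.
print(type(content).__name__, len(content))
print(type(text).__name__, len(text))
```

Each of these four characters takes three bytes in UTF-8, so the bytes view is 12 long while the str view is 4 long.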

Method 1: use r.content, which gives bytes, then decode it to str

    import requests

    url = ''  # URL omitted in the source
    r = requests.get(url)
    html = r.content                 # bytes
    html_doc = str(html, 'utf-8')    # or: html_doc = html.decode("utf-8", "ignore")
    print(html_doc)
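The two decoding calls in Method 1 behave differently on malformed input: str(html, 'utf-8') raises on invalid bytes, while decode("utf-8", "ignore") silently drops them. A minimal sketch of combining the two follows; decode_body is a hypothetical helper name, not part of Requests.

```python
def decode_body(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode strictly first; drop undecodable bytes only if that fails."""
    try:
        return raw.decode(encoding)            # equivalent to str(raw, encoding)
    except UnicodeDecodeError:
        return raw.decode(encoding, "ignore")  # invalid bytes are discarded


good = "中文".encode("utf-8")
truncated = good[:-1]  # the last character is cut off mid-sequence

print(decode_body(good))
print(decode_body(truncated))
```

Trying the strict decode first means data is only lost when the input really is malformed.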

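Method 2: use r.text

Requests automatically decodes content from the server. Most Unicode charsets are decoded seamlessly. After a request is made, Requests makes an educated guess about the response encoding based on the HTTP headers. When you access r.text, Requests uses that guessed text encoding. You can find out which encoding Requests is using, and change it, through the r.encoding attribute.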

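However, when the response headers declare a text content type without a charset, Requests falls back to its own default, r.encoding = 'ISO-8859-1', which garbles Chinese pages.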
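The header-based guess can be approximated with the standard library. This is a simplified sketch of the rule described above, not Requests' actual implementation; guess_encoding is a hypothetical name, and the real library handles more cases.

```python
from email.message import Message
from typing import Optional


def guess_encoding(content_type: Optional[str]) -> Optional[str]:
    """Approximate the charset guess made from a Content-Type header."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()      # explicit charset=..., lowercased
    if charset:
        return charset
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"                  # fallback for text/* without a charset
    return None


for header in ("text/html; charset=UTF-8", "text/html", "image/png"):
    print(header, "->", guess_encoding(header))
```

A page served as plain text/html with no charset parameter is the case that ends up as ISO-8859-1, whatever the bytes actually contain.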

You can override the encoding by assigning to r.encoding:

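    import requests

    url = ''  # URL omitted in the source
    r = requests.get(url)
    r.encoding = 'utf-8'   # decode r.text as UTF-8 instead of the guessed encoding
    print(r.text)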
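Why the override helps can be reproduced without a server. In this sketch the bytes are a stand-in for what a UTF-8 page would send; decoding them with the wrong codec produces the garbled text, and decoding them with the right one does not.

```python
page_bytes = "中文".encode("utf-8")         # what the server actually sent

garbled = page_bytes.decode("iso-8859-1")   # text under the ISO-8859-1 fallback
correct = page_bytes.decode("utf-8")        # text after choosing utf-8

# ISO-8859-1 maps every byte to one character, so the damage is reversible.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))
print(correct, repaired)
```

The underlying bytes never change; only the codec used to turn them into text does.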

Method 1 (saving to a file): r.content is bytes, so the file must be opened in binary mode with open(filename, "wb")
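A minimal sketch of the binary-mode write, using a temporary directory so it leaves nothing behind. save_bytes and round_trip are hypothetical helper names, and the payload is a made-up stand-in for r.content.

```python
import os
import tempfile


def save_bytes(path: str, data: bytes) -> int:
    """Write bytes unchanged; returns the number of bytes written."""
    with open(path, "wb") as f:
        return f.write(data)


def round_trip(data: bytes) -> bytes:
    """Save data to a temporary file and read it back in binary mode."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "download.bin")
        save_bytes(path, data)
        with open(path, "rb") as f:
            return f.read()


payload = b"\x89PNG\r\n\x1a\n"  # stand-in for r.content
print(round_trip(payload) == payload)
```

Binary mode performs no decoding and no newline translation, which is why it is the safe choice for images and other non-text downloads.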

Method 2: r.content is bytes; decode it to str and then save it

    import requests

    r = requests.get("")  # URL omitted in the source
    html = r.content
    html_doc = str(html, 'utf-8')    # or: html_doc = html.decode("utf-8", "ignore")
    # print(html_doc)
    with open('test5.html', 'w', encoding="utf-8") as f:
        f.write(html_doc)

Method 3: r.text is already str, so it can be saved directly

    import requests

    r = requests.get("")  # URL omitted in the source
    r.encoding = 'utf-8'
    html = r.text
    with open('test6.html', 'w', encoding="utf-8") as f:
        f.write(html)
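The encoding argument passed to open decides which bytes end up on disk, which is why the examples above set it explicitly. The sketch below illustrates this with a temporary file; save_text and bytes_on_disk are hypothetical helper names.

```python
import os
import tempfile


def save_text(path: str, text: str, encoding: str = "utf-8") -> None:
    """Write str to disk using an explicit encoding."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


def bytes_on_disk(text: str, encoding: str) -> bytes:
    """Save text to a temporary file and return the raw bytes that were written."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.html")
        save_text(path, text, encoding)
        with open(path, "rb") as f:
            return f.read()


utf8_bytes = bytes_on_disk("中文", "utf-8")
gbk_bytes = bytes_on_disk("中文", "gbk")
print(len(utf8_bytes), len(gbk_bytes))
```

If encoding is left out, Python uses a platform-dependent default, so the same script can produce different files on different machines.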

    # -*- coding: utf8 -*-
    import requests
    from lxml import etree

    url = ""  # URL omitted in the source
    r = requests.get(url)
    r.encoding = "utf-8"
    html = r.text
    # print(html)
    selector = etree.HTML(html)   # the parser function is etree.HTML, not etree.html
    title = selector.xpath('//title/text()')
    print(title[0])

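The methods above avoid garbled text, but the saved pages do not render fully: only the text is shown, and they are slow to open. I found a blog post that proposes a more general method, and it works very well.

The solution from that blog post: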

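    # -*- coding: utf8 -*-
    import requests

    req = requests.get("")  # URL omitted in the source
    # Requests reports the header fallback as 'ISO-8859-1', so compare case-insensitively.
    if req.encoding and req.encoding.lower() == 'iso-8859-1':
        encodings = requests.utils.get_encodings_from_content(req.text)
        if encodings:
            encoding = encodings[0]
        else:
            # The else branch is empty in the source; falling back to the
            # encoding detected from the body is an assumption.
            encoding = req.apparent_encoding or 'utf-8'
        # encode_content = req.content.decode(encoding, 'replace').encode('utf-8', 'replace')
        # With 'replace', undecodable bytes become U+FFFD instead of raising an error.
        encode_content = req.content.decode(encoding, 'replace')
    else:
        encode_content = req.text
    print(encode_content)
    with open('test.html', 'w', encoding='utf-8') as f:
        f.write(encode_content)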
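The core idea of the blog's approach, looking inside the page for a declared charset when the header guess is the ISO-8859-1 fallback, can be sketched with the standard library alone. This is a simplified illustration, not what requests.utils.get_encodings_from_content does internally; the regex, the function names, and the utf-8 default are all assumptions.

```python
import re
from typing import Optional

# Simplified pattern: matches charset=... inside a <meta> tag.
_META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def sniff_meta_charset(raw: bytes) -> Optional[str]:
    """Return the charset declared in a <meta> tag near the top of the page, if any."""
    match = _META_CHARSET.search(raw[:4096])
    return match.group(1).decode("ascii") if match else None


def decode_page(raw: bytes, header_encoding: Optional[str]) -> str:
    """Trust the header unless it is missing or the ISO-8859-1 fallback."""
    encoding = header_encoding
    if encoding is None or encoding.lower() == "iso-8859-1":
        encoding = sniff_meta_charset(raw) or "utf-8"  # assumed default
    return raw.decode(encoding, "replace")


sample = '<html><head><meta charset="gbk"><title>中文</title></head></html>'.encode("gbk")
print(sniff_meta_charset(sample))
print(decode_page(sample, "ISO-8859-1"))
```

Sniffing only the first few kilobytes keeps the check cheap, since the charset declaration normally sits in the document head.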
