Python 3 Encoding Issues

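One of the most important improvements in Python 3 is that it fixed the problems Python 2 had with strings and character encodings.

Python 2 strings had the following shortcomings:

Python 3, by contrast, sets the system default encoding to UTF-8: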

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

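From there, text and binary data are represented by two separate types, str and bytes. A str can hold any character in the Unicode character set, while raw binary byte data is represented by the new bytes type.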

>>> a = 'a'
>>> a
'a'
>>> type(a)
<class 'str'>
>>> b = '我'
>>> b
'我'
>>> type(b)
<class 'str'>
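To make the str/bytes split concrete, here is a small illustrative sketch (the variable names are made up for the example) showing that len() on a str counts Unicode characters, while the same text encoded as UTF-8 bytes can be longer:

```python
# Illustrative sketch: str counts Unicode code points, bytes counts raw bytes.
text = "a我"

char_count = len(text)                  # number of characters (code points)
code_points = [ord(ch) for ch in text]  # Unicode code point of each character
utf8_bytes = text.encode("utf-8")       # the same text as UTF-8 bytes

print(char_count)        # 2
print(code_points)       # [97, 25105]
print(len(utf8_bytes))   # 4: 'a' takes 1 byte, '我' takes 3 bytes in UTF-8
print(utf8_bytes)        # b'a\xe6\x88\x91'
```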

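In Python 3, putting a b in front of a string literal's opening quote explicitly marks it as a bytes object, which is simply a sequence of binary bytes. A bytes literal may contain ASCII characters and other byte values written as hexadecimal escapes, but it cannot contain non-ASCII characters such as Chinese.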

>>> c = b'a'
>>> c
b'a'
>>> type(c)
<class 'bytes'>
>>>
>>> d = b'\xe7\xa6\x85'
>>> d
b'\xe7\xa6\x85'
>>> type(d)
<class 'bytes'>
>>> e = b'我'
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
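Because a bytes literal cannot contain '我' directly, the usual way to get those bytes is to encode a str, or to spell out the UTF-8 bytes as hexadecimal escapes. A minimal sketch, assuming UTF-8 as the target encoding (the variable names are illustrative):

```python
# b'我' is a SyntaxError, but the same bytes can be produced in other ways.
from_str = "我".encode("utf-8")         # encode a str to bytes
from_escapes = b"\xe6\x88\x91"          # the UTF-8 bytes of '我' written by hand
from_ints = bytes([0xE6, 0x88, 0x91])   # bytes built from integers in range(256)

print(from_str)                               # b'\xe6\x88\x91'
print(from_str == from_escapes == from_ints)  # True

# The literal d from the session above is the UTF-8 encoding of '禅'.
d = b"\xe7\xa6\x85"
zen = d.decode("utf-8")                 # '禅'
print(len(d), len(zen))                 # 3 1
```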

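The bytes type supports much the same operations as str, such as slicing, indexing, and the basic + and * operators. However, a str and a bytes object cannot be combined with +, even though this was allowed in Python 2. Doing so raises an error:

TypeError: Can't convert 'bytes' object to str implicitly

(This is the wording in early Python 3 releases; newer versions phrase it differently, for example TypeError: can only concatenate str (not "bytes") to str.)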
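The sketch below illustrates those operations. One difference from str is worth knowing: indexing a bytes object returns an integer (the byte value), not a one-character bytes object. The usual fix for a mixed str/bytes + is to convert one side explicitly so both operands have the same type:

```python
data = b"python"

print(data[0])        # 112: indexing bytes yields an int, not a 1-byte bytes object
print(data[:2])       # b'py': slicing yields bytes
print(data + b"3")    # b'python3': bytes + bytes is fine
print(data * 2)       # b'pythonpython'

# Mixing str and bytes with + raises TypeError in Python 3.
try:
    "version " + b"3"
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print("TypeError:", exc)

# Convert explicitly so both operands have the same type.
as_text = "version " + b"3".decode("utf-8")
as_bytes = "version ".encode("utf-8") + b"3"
print(as_text)        # version 3
print(as_bytes)       # b'version 3'
```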

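Correspondence between bytes and characters in Python 2 and Python 3:

Python 2    Python 3    Represents    Produced by    Purpose
str         bytes       bytes         encode         storage
unicode     str         characters    decode         display

Conversion between str and bytes is done with the encode and decode methods.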
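In practice the table boils down to a round trip: encode text before storing or sending it, and decode it with the same codec before displaying it. A minimal sketch, assuming UTF-8 on both sides:

```python
# encode: str -> bytes (storage/transmission); decode: bytes -> str (display)
original = "python大神"

stored = original.encode("utf-8")    # bytes, e.g. to write to a file or socket
restored = stored.decode("utf-8")    # back to str with the same codec

print(type(stored).__name__, type(restored).__name__)  # bytes str
print(restored == original)                            # True
```

Using the same codec on both sides is what makes the round trip lossless; the decode examples further down show what happens when the codecs differ.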

encode: converts a str (characters) into bytes; UTF-8 is used by default.

>>> s = 'python大神'
>>> s.encode()
b'python\xe5\xa4\xa7\xe7\xa5\x9e'
>>> s.encode('gbk')
b'python\xb4\xf3\xc9\xf1'
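When the target encoding has no code for a character, encode raises UnicodeEncodeError by default. The errors parameter of str.encode changes that behavior; here is a sketch using ASCII as a deliberately too small target encoding:

```python
s = "python大神"

# Default (errors="strict"): characters ASCII cannot represent raise an error.
try:
    s.encode("ascii")
    reason = None
except UnicodeEncodeError as exc:
    reason = exc.reason
    print("UnicodeEncodeError:", reason)   # ordinal not in range(128)

replaced = s.encode("ascii", errors="replace")  # unencodable chars become '?'
ignored = s.encode("ascii", errors="ignore")    # unencodable chars are dropped
print(replaced)   # b'python??'
print(ignored)    # b'python'
```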

decode: converts bytes back into a str; UTF-8 is likewise used by default.

>>> b'python\xe5\xa4\xa7\xe7\xa5\x9e'.decode()
'python大神'
>>> b'python\xb4\xf3\xc9\xf1'.decode('gbk')
'python大神'
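decode only returns the right text when it is given the codec the bytes were actually encoded with. This sketch reuses the GBK bytes from above to show what happens with the wrong codec, and how errors='replace' trades the exception for U+FFFD replacement characters:

```python
gbk_bytes = b"python\xb4\xf3\xc9\xf1"   # 'python大神' encoded as GBK (see above)

# Wrong codec: 0xb4 is not a valid first byte in UTF-8, so strict decoding fails.
try:
    gbk_bytes.decode("utf-8")
    error_start = None
except UnicodeDecodeError as exc:
    error_start = exc.start
    print("UnicodeDecodeError at byte index", error_start)   # 6

lossy = gbk_bytes.decode("utf-8", errors="replace")  # bad bytes become '\ufffd'
correct = gbk_bytes.decode("gbk")                    # right codec: 'python大神'
print(lossy.startswith("python"), "\ufffd" in lossy)  # True True
print(correct == "python大神")                         # True
```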
