Notes on Python Encoding

When you read a file in Python, the way the file's encoding interacts with the way you read it turns out to be quite interesting.

Here is a short piece of Python code:

# coding:utf-8
if __name__ == '__main__':
    str = open('content.txt', 'r').readline()
    # str.decode('gbk').encode('utf-8')   <- note: this line is commented out
    str.decode('utf-8').encode('gbk')
    print str

The contents of content.txt are:

啦啦啦啦,德瑪西亞!!!1245abcd

Running it raises an error:

Traceback (most recent call last):
  File "c:\python27\551059.py", line 5, in <module>
    str.decode('utf-8').encode('gbk')
  File "c:\python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 0: invalid start byte
>>>
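The failure can be reproduced in isolation. The sketch below is written for Python 3 (an assumption on my part, since the snippets in this post are Python 2), where the split between bytes and text is explicit. It encodes the first character of content.txt as GBK and then tries to decode those bytes as UTF-8; the variable names are purely illustrative.

```python
# Python 3 sketch; the post's own snippets are Python 2.
text = "啦"                          # first character of content.txt

gbk_bytes = text.encode("gbk")       # first byte is 0xC0, as in the traceback
utf8_bytes = text.encode("utf-8")    # three bytes: E5 95 A6

try:
    gbk_bytes.decode("utf-8")        # wrong codec for these bytes
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(repr(gbk_bytes), repr(utf8_bytes))
print(error)
```

The byte 0xC0 can never begin a valid UTF-8 sequence, which is why the codec gives up at position 0 before looking at anything else.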

Now remove the # and comment out the following line instead:

# coding:utf-8
if __name__ == '__main__':
    str = open('content.txt', 'r').readline()
    str.decode('gbk').encode('utf-8')
    # str.decode('utf-8').encode('gbk')   <- note: this line is commented out
    print str

Output:

啦啦啦啦,德瑪西亞!!!1245abcd

>>>

Now let's take a proper look at encode and decode. The explanation the experts online give goes like this:

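Python represents text internally as Unicode (the unicode type in Python 2). Encoding conversion therefore normally goes through Unicode as the intermediate form: first decode the string from its source encoding into Unicode, then encode the Unicode text into the target encoding.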

decode converts a string in some other encoding into Unicode. For example, str1.decode('gb2312') converts the GB2312-encoded string str1 into Unicode.

encode converts Unicode into a string in some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 into GB2312.

So when transcoding, first work out which encoding the string str is actually in, decode it to Unicode, and only then encode it into the other encoding.
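The decode-then-encode rule can be sketched as a small helper. This is Python 3 (an assumption; the post itself uses Python 2), and transcode is a hypothetical name chosen for illustration, not a standard library function.

```python
def transcode(data, source_encoding, target_encoding):
    """Convert bytes from one encoding to another via Unicode text."""
    text = data.decode(source_encoding)    # bytes -> Unicode text
    return text.encode(target_encoding)    # Unicode text -> bytes


# Stand-in for the line read from a GBK-encoded content.txt.
gbk_line = "啦啦啦啦,德瑪西亞!!!1245abcd".encode("gbk")

utf8_line = transcode(gbk_line, "gbk", "utf-8")
print(utf8_line.decode("utf-8"))
```

Note that decode and encode return new objects rather than changing the original in place, so the result has to be assigned (here, to utf8_line) for the conversion to have any effect.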

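Check which encoding Notepad saved the file in: the default is ANSI. Some background: different countries and regions defined their own standards, which gave rise to GB2312, Big5, JIS and other encodings. On Windows, these legacy encodings, which use two bytes for each Chinese character (ASCII characters still take one byte), are collectively labelled ANSI. On a Simplified Chinese system, ANSI means code page 936, that is GBK, a superset of GB2312. So for this txt file, the bytes must first be decoded from GBK into Unicode so that Python can interpret the content, and the Unicode text can then be encoded as UTF-8. One caveat: both snippets discard the value returned by decode(...).encode(...), so print str writes out the original GBK bytes unchanged. The second snippet displays correctly because those bytes are valid GBK and the console (presumably a Simplified Chinese Windows console) expects GBK as well, while the first snippet fails at the decode step before it ever reaches print.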
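In practice the decode step can be handed to the file object by passing the encoding when opening the file. The following is a minimal Python 3 sketch (an assumption; the post uses Python 2, where io.open offers the same encoding parameter). It writes a temporary stand-in for the ANSI-saved content.txt so that it runs anywhere; the temporary path is illustrative only.

```python
import os
import tempfile

sample = "啦啦啦啦,德瑪西亞!!!1245abcd"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for content.txt saved by Notepad as ANSI (GBK).
    path = os.path.join(tmp_dir, "content.txt")
    with open(path, "w", encoding="gbk") as f:
        f.write(sample + "\n")

    # Binary mode returns the raw GBK bytes, much like Python 2's open().
    with open(path, "rb") as f:
        raw = f.readline()

    # Text mode with an explicit encoding decodes to Unicode while reading.
    with open(path, "r", encoding="gbk") as f:
        line = f.readline().rstrip("\n")

print(repr(raw[:2]))
print(line == sample)
```

Naming the encoding at the point where the file is opened keeps the "what encoding is this file in?" decision in one place, instead of scattering decode calls through the code.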

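Conversely, if the txt file had been saved as UTF-8 (UTF-8, the 8-bit Unicode Transformation Format, is a variable-length character encoding for Unicode, which is known in Chinese as 萬國碼), it is the second snippet that would go wrong: the UTF-8 bytes would be misread as GBK, giving a decode error or garbled text.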
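When you cannot be sure how the file was saved, one common workaround is to try the stricter codec first and fall back to the other. The sketch below is Python 3 (an assumption; the post uses Python 2), and decode_with_fallback is a hypothetical helper, not a library function. It tries "utf-8-sig" first, which also strips the byte order mark that Notepad has traditionally written when saving as UTF-8, and then falls back to GBK.

```python
def decode_with_fallback(data, encodings=("utf-8-sig", "gbk")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings matched")


sample = "啦啦啦啦,德瑪西亞!!!1245abcd"

text_from_gbk, used_for_gbk = decode_with_fallback(sample.encode("gbk"))
text_from_utf8, used_for_utf8 = decode_with_fallback(sample.encode("utf-8"))
print(used_for_gbk, used_for_utf8)
```

The order matters: almost any byte sequence with an even number of high bytes decodes as GBK without complaint, whereas UTF-8 rejects most non-UTF-8 input, so UTF-8 has to be tried first. This remains a heuristic; a short GBK string can, rarely, also be valid UTF-8.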
