Python encoding issues: a summary

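1.

In short, Python 2.x has two kinds of strings: str (a sequence of bytes in some encoding) and unicode (decoded text).

Going from str to unicode requires decode(); going from unicode to str requires encode(). Taking 'utf-8' as an example: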

str.decode('utf-8') -> unicode

str <- unicode.encode('utf-8')
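A minimal round-trip sketch of the two directions above. It is written so that it runs on both Python 2.7 and Python 3 (in Python 2, bytes is simply an alias of str). The last part shows what happens when the bytes are decoded with the wrong codec.

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and Python 3 (in Python 2, bytes is an alias of str).
text = u'中文'                 # a unicode object (decoded text)
data = text.encode('utf-8')    # unicode -> str (UTF-8 encoded bytes)
back = data.decode('utf-8')    # str (UTF-8 encoded bytes) -> unicode

print(type(data))              # str on Python 2, bytes on Python 3
print(back == text)            # the round trip gives the same text back

# Decoding with the wrong codec fails: these bytes are not ASCII.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('decode failed: %s' % exc)
```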

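Summary: unicode is the bridge in the middle. A str encoded in UTF-8, GBK, or any other encoding can be decoded into unicode, and unicode can be encoded back into any of those encodings. Strings therefore fall into two categories: unicode objects (decoded text) and str objects holding bytes in some encoding such as UTF-8 or GBK. To convert between UTF-8 and GBK, go through unicode: first decode to unicode, then encode into the target encoding.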
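The GBK to UTF-8 conversion described above can be sketched as follows. transcode is a hypothetical helper name, not a standard function; the code runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and Python 3 (in Python 2, bytes is an alias of str).

def transcode(data, src, dst):
    """Hypothetical helper: re-encode byte string `data` from encoding `src`
    to encoding `dst`, using unicode as the bridge."""
    text = data.decode(src)   # step 1: bytes in `src` -> unicode
    return text.encode(dst)   # step 2: unicode -> bytes in `dst`

gbk_bytes = u'中文'.encode('gbk')                  # b'\xd6\xd0\xce\xc4'
utf8_bytes = transcode(gbk_bytes, 'gbk', 'utf-8')  # b'\xe4\xb8\xad\xe6\x96\x87'
print(repr(utf8_bytes))
```

Note that the same bytes mean nothing on their own: you must know (or be told) that gbk_bytes is GBK before you can decode it correctly.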

# -*- coding: utf-8 -*-
print "type of '中文' is %s" % type('中文')
print "type of '中文'.decode('utf-8') is %s" % type('中文'.decode('utf-8'))
print "type of u'中文' is %s" % type(u'中文')
print "type of u'中文'.encode('utf-8') is %s" % type(u'中文'.encode('utf-8'))

Output (Python 2):

type of '中文' is <type 'str'>

type of '中文'.decode('utf-8') is <type 'unicode'>

type of u'中文' is <type 'unicode'>

type of u'中文'.encode('utf-8') is <type 'str'>

2. Tips for avoiding encoding problems

(1) Use a character-encoding declaration, and use the same declaration in every source file of a project:

#encoding=utf-8

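Note: the declaration tells the interpreter which encoding the source file is saved in, so that non-ASCII literals such as u'漢字' are decoded correctly. It does not change how print works: print encodes a unicode object using the output stream's encoding (sys.stdout.encoding, usually the terminal's). Compare the two snippets, first without and then with the declaration: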

test2 = u'漢字'
print test2

#encoding=utf-8
test2 = u'漢字'
print test2

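Note: the second version runs without error. The first one, with no declaration, is rejected by Python 2 at compile time with a SyntaxError ("Non-ASCII character ... but no encoding declared", see PEP 263) rather than merely printing garbled text.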
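To see that the declaration only controls how the source bytes are decoded, the sketch below compiles the same source bytes twice under two different coding declarations. It is written for Python 3; the same syntax is also accepted by Python 2.7, where exec(code, ns) is the tuple form of the exec statement. run is a hypothetical helper used only for this demonstration.

```python
# -*- coding: utf-8 -*-
# The same source bytes, compiled under two different coding declarations.
body = b"s = u'\xd6\xd0\xce\xc4'\n"   # the literal's bytes are GBK for u'中文'

def run(declaration):
    """Hypothetical helper: compile `declaration + body`, return the value of s."""
    ns = {}
    exec(compile(declaration + body, '<demo>', 'exec'), ns)
    return ns['s']

as_gbk = run(b"# -*- coding: gbk -*-\n")         # bytes read as GBK
as_latin1 = run(b"# -*- coding: latin-1 -*-\n")  # same bytes read as Latin-1
print(repr(as_gbk))
print(repr(as_latin1))
```

With the GBK declaration the literal becomes u'中文'; with the Latin-1 declaration the very same bytes become four unrelated Latin-1 characters. A declaration that does not match how the file was actually saved is what produces garbled text.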

3. Reading and writing files

Read from the target file, decode the contents into unicode, then encode them into UTF-8 and write them back to the file.

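When a file is opened with the built-in open(), read() returns a str, i.e. bytes in whatever encoding the file uses (GBK, UTF-8, and so on); after reading, decode() it with the correct encoding. When writing, if the data is unicode, encode() it with the encoding you want in the file; if it is a str in some other encoding, first decode() it with that str's own encoding to get unicode, then encode() it with the target encoding. If you pass a unicode object directly to write(), Python 2 does not use the source file's coding declaration: it encodes with the default encoding (ASCII), so any non-ASCII text raises UnicodeEncodeError.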

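# coding: utf-8
# Read the raw bytes (binary mode, so nothing is altered on the way in)
with open('test.txt', 'rb') as f:
    s = f.read()
print type(s)  # <type 'str'>: the bytes as stored on disk
# The file is known to be GBK-encoded: decode it to unicode
u = s.decode('gbk')
# Encode to a UTF-8 str, then write it back
with open('test.txt', 'wb') as f:
    f.write(u.encode('utf-8'))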
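The built-in open() used above hands you raw bytes, so every decode() and encode() is manual. An alternative not used in the original is io.open() (available in Python 2.6+ and Python 3), which takes an encoding argument and performs the decode on read and the encode on write. A minimal sketch; gbk_file_to_utf8 is a hypothetical helper name, and the demo works on a temporary file instead of test.txt:

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.7 and Python 3: io.open decodes on read, encodes on write.
import io
import os
import tempfile

def gbk_file_to_utf8(path):
    """Hypothetical helper: re-encode a GBK text file as UTF-8 in place."""
    with io.open(path, 'r', encoding='gbk') as f:
        text = f.read()          # already unicode, no manual decode()
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)            # encoded to UTF-8 on write
    return text

# Demo: create a small GBK file, convert it, then read the raw bytes back.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'wb') as f:
    f.write(u'中文'.encode('gbk'))
text = gbk_file_to_utf8(path)
with open(path, 'rb') as f:
    raw = f.read()
print(repr(raw))
```

Since io.open in text mode only accepts unicode for write(), passing an encoded str by mistake fails loudly instead of silently writing the wrong bytes.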
