Python Encoding Pitfalls and How to Handle Them

Python has tormented me a thousand times, yet I still treat it like my first love.

While writing model scripts in Python (2.7), I kept stumbling over its encoding handling. The first error I hit was:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

Almost every fix I found by searching came down to setting Python's default encoding:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

This approach may solve other problems, but it did not solve mine!
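The first traceback is what an implicit ASCII decode of non-ASCII bytes produces. The sketch below assumes the offending bytes were UTF-8 (the traceback alone does not say where they came from). It reproduces the error and then decodes explicitly instead of changing a global default. It is written to run on both Python 2.7 and Python 3, and the variable names are illustrative only.

```python
# UTF-8 bytes for the character u'\u6587'; the first byte is 0xe6,
# the same byte the traceback complains about.
raw = b'\xe6\x96\x87'

# Decoding with the ASCII codec fails because 0xe6 is outside range(128).
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    message = str(exc)

# Naming the real encoding explicitly succeeds, with no global setting changed.
text = raw.decode('utf-8')
```

Decoding at the exact point where the bytes enter the program keeps the fix local, which is why it tends to be more predictable than a process-wide default.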

Another common Python encoding error looks like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

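This shows up most often when handling Chinese text in Python: reading and writing files, string processing, print, and so on all turn into a pile of garbled characters at run time. At that point most people start calling encode/decode at random to debug, without ever working out why the text is garbled in the first place.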
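For the file case mentioned above, one common way to avoid guessing is to name the encoding at the file boundary. The following is a minimal sketch, assuming the file is UTF-8 and using a throwaway temporary file as a stand-in for a real one. `io.open` exists in both Python 2.7 and Python 3.

```python
import io
import os
import tempfile

text = u'\u4e2d\u6587'  # the two characters used throughout this post

# Hypothetical scratch file, created only for this demonstration.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # Encode at the boundary: io.open turns unicode into UTF-8 bytes on write.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Decode at the boundary: reading gives back unicode, not raw bytes.
    with io.open(path, 'r', encoding='utf-8') as f:
        read_back = f.read()

    # Binary mode shows the bytes that were actually stored on disk.
    with io.open(path, 'rb') as f:
        raw = f.read()
finally:
    os.remove(path)
```

Inside the program everything stays unicode; bytes appear only where data is read from or written to disk.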

In Python 2, str and unicode are both subclasses of basestring, which gives a simple way to test whether a value is a string at all:

def is_str(s):
    return isinstance(s, basestring)
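`basestring` only exists in Python 2. If the same check has to survive on Python 3 as well, a sketch along these lines may help; `string_types` is a name introduced here for illustration, and treating `bytes` as a string on Python 3 is an assumption made to mirror the Python 2 behaviour.

```python
try:
    # Python 2: basestring covers both str (bytes) and unicode.
    string_types = (basestring,)
except NameError:
    # Python 3: basestring is gone; accept text and bytes explicitly.
    string_types = (str, bytes)


def is_str(s):
    """Return True if s is a text or byte string on either Python version."""
    return isinstance(s, string_types)
```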

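Converting between str and unicode, and how they differ

str -> decode('the_coding_of_str') -> unicode
unicode -> encode('the_coding_you_want') -> str

A str is a byte string: it holds the bytes produced by encoding unicode text.

Declaring one and taking its length (returns the number of bytes):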

s = '中文'
s = u'中文'.encode('utf-8')
>>> type('中文')
<type 'str'>

>>> u'中文'.encode('utf-8')
'\xe4\xb8\xad\xe6\x96\x87'
>>> len(u'中文'.encode('utf-8'))
6

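unicode is the string in the true sense: it is made up of characters.

Declaring one and taking its length (returns the number of characters):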

s = u'中文'
s = '中文'.decode('utf-8')
s = unicode('中文', 'utf-8')
>>> type(u'中文')
<type 'unicode'>

>>> u'中文'
u'\u4e2d\u6587'
>>> len(u'中文')
2
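The difference between byte length and character length can be checked side by side. This sketch uses escape sequences rather than literal Chinese characters so that it does not depend on the source file's encoding; the GBK comparison is an addition for illustration and is not part of the sessions above.

```python
text = u'\u4e2d\u6587'               # two characters

utf8_bytes = text.encode('utf-8')    # three bytes per character here
gbk_bytes = text.encode('gbk')       # two bytes per character here

char_count = len(text)               # counts characters
utf8_length = len(utf8_bytes)        # counts bytes
gbk_length = len(gbk_bytes)          # counts bytes
```

The same text has one character count but a different byte count in each encoding, which is why `len` on a byte string says little about how many characters it holds.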

Work out whether you are dealing with a str or a unicode, then use the matching method (str.decode or unicode.encode).

Here is how to check whether a value is unicode or str:

>>> isinstance(u'中文', unicode)
True
>>> isinstance('中文', unicode)
False
>>> isinstance('中文', str)
True
>>> isinstance(u'中文', str)
False

A simple rule: never call encode on a str, and never call decode on a unicode.

>>> '中文'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> u'中文'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
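Both tracebacks come from a hidden conversion step. In Python 2, calling encode on a str first decodes it with the ASCII codec, and calling decode on a unicode first encodes it with the ASCII codec, and that hidden step is what fails. One way to follow the rule without checking types by hand is a pair of small helpers. `to_unicode` and `to_bytes` are hypothetical names introduced here, and the UTF-8 default is an assumption. The sketch runs on both Python 2.7 and Python 3.

```python
def to_unicode(value, encoding='utf-8'):
    """Return text, decoding only when value is a byte string."""
    if isinstance(value, bytes):      # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value                      # already text: never decode it again


def to_bytes(value, encoding='utf-8'):
    """Return a byte string, encoding only when value is text."""
    if isinstance(value, bytes):
        return value                  # already bytes: never encode it again
    return value.encode(encoding)


raw = b'\xe4\xb8\xad\xe6\x96\x87'
text = to_unicode(raw)
```

Calling either helper twice is harmless, so it does not matter whether the input has already been converted.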

To convert between two encodings, use unicode as the intermediate form:

# s is a str encoded in code_a
s.decode('code_a').encode('code_b')
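As a concrete instance of the pattern above, the sketch below converts UTF-8 bytes to GBK and back. `transcode` is a hypothetical helper name, and the choice of UTF-8 and GBK is only an example; the code runs on both Python 2.7 and Python 3.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode a byte string, passing through unicode in the middle."""
    return data.decode(source_encoding).encode(target_encoding)


utf8_data = b'\xe4\xb8\xad\xe6\x96\x87'            # u'\u4e2d\u6587' in UTF-8
gbk_data = transcode(utf8_data, 'utf-8', 'gbk')    # same text, GBK bytes
round_trip = transcode(gbk_data, 'gbk', 'utf-8')   # back to the original bytes
```

The conversion only works when `source_encoding` really matches the bytes; a wrong guess either raises UnicodeDecodeError or silently yields garbled text.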
