Python encoding conversion

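This post covers Python's encoding mechanism and conversion between unicode, utf-8, utf-16, gbk, gb2312, iso-8859-1 and other encodings. The examples use Python 2 syntax (the print statement and the unicode type).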

Encoding conversion usually falls into one of the following cases:

1. Detecting a string's encoding automatically:

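#coding:utf8
import urllib
import chardet
# The URL is empty in the original; put the address of the page to fetch here
rawdata = urllib.urlopen('').read()
# detect() returns a dict holding the guessed encoding and its confidence
print chardet.detect(rawdata)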

Output:

# confidence is how sure the detector is, encoding is the detected encoding
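
When chardet is not installed, a rough fallback is to try a short list of candidate encodings in order and keep the first one that decodes without error. The sketch below assumes Python 3 and uses only the standard library. The name guess_encoding and its default candidate list are hypothetical, chosen for illustration, and this is plain trial decoding, not the statistical detection that chardet performs.

```python
def guess_encoding(data, candidates=("utf-8", "gb2312", "gbk")):
    """Return the first candidate encoding that decodes data, or None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one
            continue
        return name
    return None


sample = "中文".encode("gb2312")
print(guess_encoding(sample))  # gb2312
```

Order matters here: a permissive single-byte codec such as iso-8859-1 accepts any byte sequence, so it belongs at the end of the list, if it is included at all.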

2. Converting unicode to another encoding

#coding:utf8
a = u'中文'
a_gb2312 = a.encode('gb2312')
print a_gb2312

Output: 中文
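
For comparison, here is a minimal sketch of the same step under Python 3, assuming that str is already Unicode text and that encode returns a bytes object. Printing the raw bytes makes the difference between the encodings visible.

```python
text = "中文"

gb2312_bytes = text.encode("gb2312")
utf8_bytes = text.encode("utf-8")

# For these two characters GB2312 needs two bytes each and UTF-8 needs three
print(gb2312_bytes)  # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)    # b'\xe4\xb8\xad\xe6\x96\x87'
print(len(gb2312_bytes), len(utf8_bytes))  # 4 6
```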

3. Converting another encoding to unicode

#coding:utf8
a = u'中文'
a_gb2312 = a.encode('gb2312')
print a_gb2312
# a_gb2312 is gb2312-encoded; convert it to unicode with unicode(a_gb2312, 'gb2312') or a_gb2312.decode('gb2312')
print [unicode(a_gb2312,'gb2312')]
print [a_gb2312.decode('gb2312')]

Output:

中文
[u'\u4e2d\u6587']
[u'\u4e2d\u6587']
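
Decoding only works when the codec matches the bytes. The sketch below assumes Python 3, where bytes.decode takes the place of unicode(...). It shows a correct decode, the UnicodeDecodeError raised by a wrong codec, and the errors="replace" option, which substitutes U+FFFD for undecodable bytes instead of raising.

```python
raw = "中文".encode("gb2312")

# Correct codec: the original text comes back
print(raw.decode("gb2312"))

# Wrong codec: these gb2312 bytes are not valid UTF-8
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# errors="replace" never raises, but the result is lossy
lossy = raw.decode("utf-8", errors="replace")
print("\ufffd" in lossy)  # True
```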

4. Converting between non-unicode encodings

#coding:utf8
a = u'中文'
a_gb2312 = a.encode('gb2312')
print a_gb2312
# To convert encoding 1 to encoding 2, decode to unicode first, then encode as encoding 2
a_unicode = a_gb2312.decode('gb2312')
print [a_unicode]
a_utf8 = a_unicode.encode('utf8')
# The DOS console does not recognize utf8, so printing the string directly produces garbled text
print [a_utf8]
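
The two-step route above (decode to unicode, then encode again) can be wrapped in a small helper. This is a sketch assuming Python 3; transcode is a hypothetical name used only for illustration.

```python
def transcode(data, source, target):
    """Re-encode bytes from the source encoding to the target encoding."""
    # Unicode text (str) is the intermediate form between the two encodings
    return data.decode(source).encode(target)


gb2312_bytes = "中文".encode("gb2312")
utf8_bytes = transcode(gb2312_bytes, "gb2312", "utf-8")
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
```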

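5. Checking whether a string is already unicode

#coding:utf8
# isinstance(s, str) checks whether s is an ordinary (byte) string
# isinstance(s, unicode) checks whether s is unicode
# If a string is already unicode, converting it to unicode again sometimes (not always) raises an error
def u(s, encoding):
    if isinstance(s, unicode):
        return s
    else:
        return unicode(s, encoding)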
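
The same guard can be written for Python 3, where the two types to tell apart are str (text) and bytes (encoded data). This is a sketch under that assumption; to_text and its utf-8 default are hypothetical choices, not taken from the original.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it only if it is still bytes."""
    if isinstance(value, str):
        # Already text: return it untouched instead of converting twice
        return value
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(to_text("中文"))
print(to_text("中文".encode("gb2312"), "gb2312"))
```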
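
6. Converting Chinese characters to unicode escapes

#coding:utf8
# I do not fully understand this method yet; keeping it here for now
name = '中國'
name = name.decode('utf8')
print name
tmpname = ""
for c in name:
    # "%%" is a literal percent sign, so each character becomes %uXXXX
    c = "%%u%04x" % ord(c)
    tmpname += c
print tmpname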

Output:

中國
%u4e2d%u570b
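
What the loop does: ord(c) gives the code point of each character, and the format string "%%u%04x" writes it as a literal %u followed by the code point in lowercase hex, zero-padded to at least four digits. A compact sketch of the same idea, assuming Python 3 (so no decode step is needed, because str is already Unicode); to_percent_u is a hypothetical helper name.

```python
def to_percent_u(text):
    """Format every character of text as %uXXXX (lowercase hex code point)."""
    return "".join("%%u%04x" % ord(ch) for ch in text)


print(to_percent_u("中国"))  # %u4e2d%u56fd
print(to_percent_u("中國"))  # %u4e2d%u570b
```

Every character is escaped, ASCII included, and code points above U+FFFF produce more than four hex digits.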
