A Summary of Encoding Issues in Python

1. First, use the latest Python 3.x release.

Python 3 handles text encoding far better than the earlier 2.x versions did.

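2. When writing code, make a habit of adding the following lines at the top of the file.

In Python 2, a .py file is assumed to be plain 7-bit ASCII unless it says otherwise (Python 3 assumes UTF-8). If the file needs to contain non-ASCII characters, such as Chinese string literals, add an encoding comment on the first or second line to declare the file's encoding, for example UTF-8.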

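#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Line 1 picks the Python interpreter; line 2 declares the file encoding as UTF-8.

s = "中國"  # In Python 3 this is a str of Unicode characters; in Python 2 it would hold the raw UTF-8 bytes.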
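To make the effect of the declaration concrete, here is a small sketch, assuming Python 3 and a source file saved as UTF-8. It shows that the literal is a str of two characters, and that UTF-8 bytes only appear once the string is explicitly encoded. The variable name data is just illustrative.

```python
# -*- coding: utf-8 -*-
s = "中國"

# In Python 3 the literal is a str made of Unicode characters.
print(type(s).__name__)
print(len(s))            # number of characters

# Encoding produces bytes; each of these characters takes 3 bytes in UTF-8.
data = s.encode("utf-8")
print(type(data).__name__)
print(len(data))         # number of bytes
```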

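3. When you save the code as a .py file, make sure the file encoding matches the declaration.

After writing the code, save the .py file in the same encoding that the declaration at the top names. For example, if you declared # -*- coding: utf-8 -*-, use an editor such as Notepad++ to save the .py file as UTF-8.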
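A mismatch between the declaration and the bytes on disk can also be checked programmatically. The sketch below is one possible approach, assuming Python 3: it uses the standard library's tokenize.detect_encoding to read the declared encoding and then tries to decode the bytes with it. The helper names declared_encoding and matches_declaration are hypothetical, and the sample sources are built in memory rather than read from a real file.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding named by the coding comment (or BOM) in the source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def matches_declaration(source_bytes):
    """Return True if the bytes really decode with the declared encoding."""
    try:
        source_bytes.decode(declared_encoding(source_bytes))
        return True
    except UnicodeDecodeError:
        return False


source_text = '# -*- coding: utf-8 -*-\ns = "中文"\n'
good = source_text.encode("utf-8")  # saved as declared
bad = source_text.encode("gbk")     # saved in a different encoding

print(declared_encoding(good))
print(matches_declaration(good))
print(matches_declaration(bad))
```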

# Get the interpreter's default string encoding
import sys
print(sys.getdefaultencoding())

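# Set the default string encoding (Python 2 only; Python 3 already uses UTF-8 and has no sys.setdefaultencoding)
# encoding=utf-8
import sys
if sys.version_info[0] == 2:
    reload(sys)  # brings back setdefaultencoding, which is removed at startup
    sys.setdefaultencoding('utf-8')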
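On Python 3 there are several "default" encodings, and it can help to print them side by side. The following is a minimal sketch using only standard-library calls; the values other than the first depend on the operating system and locale, so no output is shown for them.

```python
import locale
import sys

# Encoding used for implicit str/bytes conversions; always UTF-8 on Python 3.
print(sys.getdefaultencoding())

# Encoding used to convert file names to and from the operating system.
print(sys.getfilesystemencoding())

# Encoding that open() normally falls back to when no encoding= is passed.
print(locale.getpreferredencoding(False))
```

The last value is the reason for passing encoding= explicitly to open(): it can differ from machine to machine.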

4. In Python 3, specify the encoding whenever you read or write a text file.

Reference code:

f = open("1.txt", "r", encoding="utf-8")

You can also use the codecs module to read files, passing the encoding to codecs.open():

import codecs
f = codecs.open('123.txt', 'r+', encoding='utf-8')
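A quick round trip shows what the encoding argument does. This sketch assumes Python 3 and writes to a temporary directory; the file name sample.txt and the variable names are illustrative only.

```python
import os
import tempfile

text = "中文"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write the text as UTF-8, regardless of the platform default.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read it back with the same encoding.
with open(path, "r", encoding="utf-8") as f:
    read_back = f.read()

# Look at the raw bytes that were actually stored on disk.
with open(path, "rb") as f:
    raw = f.read()

print(read_back == text)
print(raw)
```

Using with blocks closes each file automatically, which the bare open() calls above leave to the caller.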

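When a file's encoding is unknown, the third-party chardet package can detect it first. Reference code:

import chardet

with open(filename, "rb") as file:  # "rb" is required: text mode would decode with the locale default (e.g. GBK on Chinese Windows)
    buf = file.read()
result = chardet.detect(buf)
with open(filename, "r", encoding=result["encoding"]) as file:
    content = file.readlines()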

chardet.detect() returns a dict in which confidence is the detector's confidence in its guess and encoding is the detected encoding.
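Because the result carries both fields, a caller may want to reject low-confidence guesses. The sketch below is one way to do that. The names pick_encoding, fallback and min_confidence are hypothetical, and the 0.5 threshold is an arbitrary assumption rather than a chardet recommendation. It operates on a plain dict shaped like the detection result, so it runs without chardet installed.

```python
def pick_encoding(result, fallback="utf-8", min_confidence=0.5):
    """Choose an encoding from a detection result dict, or fall back."""
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    # The encoding may be None when nothing could be detected.
    if encoding is None or confidence < min_confidence:
        return fallback
    return encoding


# Hand-written dicts standing in for detection results.
print(pick_encoding({"encoding": "GB2312", "confidence": 0.99}))
print(pick_encoding({"encoding": "GB2312", "confidence": 0.2}))
print(pick_encoding({"encoding": None, "confidence": 0.0}))
```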

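(1) Detecting a web page's encoding:

>>> import urllib.request
>>> rawdata = urllib.request.urlopen('...').read()  # the URL is missing from the source
>>> import chardet
>>> chardet.detect(rawdata)

The result is a dict holding the detected encoding and its confidence.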

(2) Detecting a file's encoding

import chardet

tt = open('c:\\111.txt', 'rb')
ff = tt.readline()  # read(5) also works here, but readlines() fails because it returns a list, not bytes
enc = chardet.detect(ff)
print(enc['encoding'])
tt.close()
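If chardet is not available, a cruder alternative is to try a short list of candidate encodings in order and keep the first one that decodes without error. This is a different technique from chardet's statistical detection, and it can guess wrong, since permissive codecs accept many byte sequences. The function name guess_encoding and its candidate list are assumptions for illustration.

```python
def guess_encoding(data, candidates=("utf-8", "gbk", "big5")):
    """Return the first candidate that decodes data cleanly, or None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding("中文".encode("utf-8")))
print(guess_encoding("中文".encode("gbk")))
print(guess_encoding(b"\xff\xfe\xfd", candidates=("utf-8", "ascii")))
```

UTF-8 is listed first on purpose: it is strict, so text in other encodings rarely passes as valid UTF-8, whereas UTF-8 bytes often decode "successfully" as GBK garbage.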

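To speed up detection, you can feed the detector line by line and stop once it has reached a conclusion:

from chardet.universaldetector import UniversalDetector

detector = UniversalDetector()
with open(filename, "rb") as f:  # binary mode, so each line is bytes
    for line in f:
        detector.feed(line)
        if detector.done:
            break
detector.close()
print(detector.result)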

(3) Determining a string's type (Python 2)

isinstance(s, str) checks whether s is an ordinary (byte) string.

isinstance(s, unicode) checks whether s is a unicode string.

Or:

if type(s).__name__ != "unicode":
    s = unicode(s, "utf-8")
else:
    pass
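The checks above rely on the Python 2 unicode type. A rough Python 3 counterpart tests for bytes instead, since str is already Unicode there. The helper name to_text and its default encoding are assumptions for illustration.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value  # already Unicode text, nothing to do
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(to_text("中文".encode("utf-8")))
print(to_text("中文".encode("gbk"), encoding="gbk"))
print(to_text("already text"))
```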

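Using encode and decode in Python:

In both Python 2 and Python 3, conversions between encodings normally go through Unicode as the intermediate form: first decode the string from its current encoding into Unicode, then encode the Unicode text into the target encoding.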

1. Python 2

      decode             encode
str ----------> unicode ----------> str

For example:

s = u"中文"  # s is a unicode object
gb = s.encode('gb2312')  # encode the unicode text as GB2312 bytes

2. Python 3

Python 3 removed the separate unicode type; its role is taken by str, which is a string of Unicode characters:

        decode                   encode
bytes ----------> str (unicode) ----------> bytes
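The bytes to str to bytes path can be sketched in a few lines, assuming Python 3. Here GBK bytes are converted to UTF-8 bytes by way of a str; the variable names are illustrative.

```python
# Start with bytes in one encoding (here produced by encoding a literal).
gbk_bytes = "中文".encode("gbk")

# Step 1: decode the bytes into str (Unicode text).
text = gbk_bytes.decode("gbk")

# Step 2: encode the str into the target encoding.
utf8_bytes = text.encode("utf-8")

print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'
print(text)
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
```

The same text takes four bytes in GBK and six in UTF-8, which is why the two byte strings cannot be used interchangeably.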

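Note (Python 2): a plain string literal in the code takes the encoding of the source file itself. For example, s = "中文" in a UTF-8 file is a UTF-8 encoded byte string. To convert it, first call decode to turn it into unicode, then call encode to produce the target encoding. When no encoding is given, the system default encoding is used.

If the literal is instead written as

s = u"中文"

then the string is already unicode, Python's internal representation, regardless of the file's encoding, and encode alone converts it to the target encoding. Decoding a string that is already unicode typically fails for non-ASCII text, so check its type first:

isinstance(s, unicode)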

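Likewise, calling encode on a non-unicode str can fail, because Python 2 first decodes it implicitly with the default ASCII codec.

unicode(s, 'gb2312') and s.decode('gb2312') are equivalent: both convert a GB2312-encoded str into unicode.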
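Python 3 makes these mistakes harder to write, because str has no decode method and bytes has no encode method. Encoding can still fail when the target codec cannot represent a character. The sketch below, assuming Python 3, shows both points along with the errors argument; the variable names are illustrative.

```python
text = "中文"

# Strict encoding fails when the codec cannot represent the characters.
try:
    text.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)

# errors="replace" substitutes a placeholder instead of raising.
replaced = text.encode("ascii", errors="replace")
print(replaced)

# str cannot be decoded again and bytes cannot be encoded again.
print(hasattr(text, "decode"), hasattr(b"", "encode"))
```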

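Comparison of str in Python 2 and Python 3: [illustration missing]

str conversion functions in Python 3: [illustration missing]

Situations in which a str may need to be converted: [illustration missing]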
