Reading a Unicode-encoded file in Python

Reference:

import chardet

# Read the raw bytes so that chardet can guess the file's encoding.
with open('a.txt', 'rb') as f:
    text = f.read()
info = chardet.detect(text)
print(info)
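As a complement to chardet's statistical guess, a file that begins with a byte order mark (BOM) can be identified with the standard library alone. The following is a minimal sketch and not part of the original post: `sniff_bom` is a hypothetical helper name, and it returns None when no BOM is present, in which case a detector such as chardet is still needed.

```python
import codecs

# Longest BOMs come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw):
    """Return a codec name if `raw` starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


# Python's "utf-16" codec writes a BOM when encoding, so it is detected here.
print(sniff_bom("abc".encode("utf-16")))
print(sniff_bom(b"abc"))
```

The returned names are codecs that consume the BOM on decoding, so the result can be passed directly as the `encoding` argument of `open`.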

# Open the file as UTF-16, then convert any backslash escapes such as \uXXXX into characters.
with open('a.txt', encoding='utf-16') as f:
    text = f.read()
print(text.encode("utf-8").decode("unicode_escape"))

Output:

1.新出吐魯番文書及其研究

Encoding the text to bytes first and then decoding it with unicode_escape recovers the Chinese text.
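The round trip can be reproduced without the original a.txt. The sketch below assumes the file holds literal backslash escapes (the string `escaped` and the temporary file are made up for illustration); it also shows a caveat of this approach, namely that unicode_escape interprets the bytes as Latin-1, so text that already contains real non-ASCII characters comes out mangled.

```python
import os
import tempfile

# Hypothetical file content: literal backslash escapes stored as UTF-16 text.
escaped = "1.\\u65b0\\u51fa"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    with open(path, "w", encoding="utf-16") as f:
        f.write(escaped)
    # Reading with encoding="utf-16" gives back a str that still holds the escapes.
    with open(path, encoding="utf-16") as f:
        text = f.read()

# Encode to bytes, then let unicode_escape interpret the \uXXXX sequences.
decoded = text.encode("utf-8").decode("unicode_escape")
print(decoded)

# Caveat: real non-ASCII characters are read byte by byte as Latin-1.
mangled = "é".encode("utf-8").decode("unicode_escape")
print(mangled)
```

If the file mixes escapes with real non-ASCII text, this encode/decode trick is therefore not safe as written.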

import six

def convert_to_unicode(text):
    """Converts `text` to unicode (if it's not already), assuming utf-8 input."""
    # six_ensure_text is copied from
    def six_ensure_text(s, encoding="unicode_escape", errors="strict"):
        if isinstance(s, six.binary_type):
            print('true')
            # A byte string is decoded with the given encoding.
            return s.decode(encoding, errors)
        elif isinstance(s, six.text_type):
            # A text string is returned unchanged.
            return s
        else:
            raise TypeError("not expecting type '%s'" % type(s))
    return six_ensure_text(text, encoding="unicode_escape", errors="ignore")

with open('a.txt', encoding='utf-16') as f:
    text = f.read()
print(convert_to_unicode(text.encode("utf-8")))

true

1.新出吐魯番文書及其研究

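Note:

>>> type(text.encode("utf-8"))  # encode() returns a bytes object
<class 'bytes'>
>>> type(text)  # the encoding argument of open() is the file's encoding; text is a str
<class 'str'>

The binary type above (six.binary_type) is simply the bytes type in Python 3.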
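Since six.binary_type is bytes and six.text_type is str on Python 3, the same check can be written without six when Python 2 support is not needed. This is a minimal sketch under that assumption; `ensure_text` is a hypothetical name, and its default encoding of "utf-8" is a choice made here, not the default used in the post.

```python
def ensure_text(s, encoding="utf-8", errors="strict"):
    """Return `s` as str, decoding it first if it is bytes."""
    if isinstance(s, bytes):
        return s.decode(encoding, errors)
    if isinstance(s, str):
        return s
    raise TypeError("not expecting type '%s'" % type(s))


# Bytes holding literal \uXXXX escapes are decoded into characters.
print(ensure_text(b"\\u65b0\\u51fa", encoding="unicode_escape"))
# A str is returned unchanged.
print(ensure_text("already text"))
```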
