Detecting String Encodings and Converting Between Encodings in Python

Detecting a string's encoding

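chardet makes it easy to detect the encoding of a string or a file. This matters most for Chinese web pages: some use gbk/gb2312 and others use utf8, so if you need to crawl pages, knowing each page's encoding is important.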

>>> import urllib
>>> html = urllib.urlopen('').read()
>>> import chardet
>>> chardet.detect(html)

The function returns a dictionary with two elements: one is the confidence of the detection, and the other is the detected encoding.
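As a rough sketch of how that returned dictionary might be used (written for Python 3, unlike the Python 2 session above): the helper name decode_detected, the min_confidence threshold, and the fallback encoding are illustrative assumptions, not part of chardet. The sample dictionary is written by hand to mirror the shape of a detection result; in real use it would come from chardet.detect(raw).

```python
def decode_detected(raw, result, min_confidence=0.5, fallback="utf-8"):
    """Decode raw bytes using a chardet-style result dictionary.

    Hypothetical helper: falls back to `fallback` when no encoding was
    detected or the confidence is below `min_confidence`.
    """
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if encoding is None or confidence < min_confidence:
        encoding = fallback
    # errors="replace" avoids an exception if the guess turns out wrong
    return raw.decode(encoding, errors="replace")


# Bytes as they might arrive from a gb2312-encoded page
raw = "中文".encode("gb2312")

# Hand-written stand-in for: result = chardet.detect(raw)
sample_result = {"encoding": "GB2312", "confidence": 0.99}

text = decode_detected(raw, sample_result)
print(text)
```

Checking the confidence before trusting the encoding is a design choice rather than a requirement; short inputs in particular tend to yield low-confidence guesses.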

Converting encodings

First decode the source encoding to unicode, then encode the result to the target encoding. For example, to convert utf-8 to gb2312:

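>>> import chardet
>>> str = "我们"  # simplified characters; gb2312 cannot encode traditional forms such as 們
>>> print(chardet.detect(str))
>>> str1 = str.decode('utf-8')    # utf-8 bytes -> unicode
>>> str2 = str1.encode('gb2312')  # unicode -> gb2312 bytes
>>> print(chardet.detect(str2))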
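The same decode-then-encode round trip can be sketched for Python 3, where the intermediate unicode value is a plain str and the encoded forms are bytes. The function name transcode is a hypothetical helper introduced here for illustration, and the sample text is an assumption chosen because both characters exist in gb2312.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another via a Unicode str."""
    text = data.decode(source)   # source-encoded bytes -> str (Unicode)
    return text.encode(target)   # str -> target-encoded bytes


utf8_bytes = "中文".encode("utf-8")               # 3 bytes per character
gb_bytes = transcode(utf8_bytes, "utf-8", "gb2312")  # 2 bytes per character

print(len(utf8_bytes), len(gb_bytes))
print(gb_bytes.decode("gb2312"))
```

If the text contains characters that the target encoding lacks, encode raises UnicodeEncodeError; passing errors="replace" or choosing a wider encoding such as gbk are possible ways around that.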

