Chinese text encoding problems in Python

Declare the encoding on the first or second line of the source file. Either of these forms works:

# -*- coding: utf-8 -*-
#coding:utf-8
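Python reads this declaration as described in PEP 263. The sketch below shows how the declaration is picked up, using the standard library's `tokenize.detect_encoding()` to report the encoding a source file declares. It is written for Python 3, and `declared_encoding` is a hypothetical helper name.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the source encoding Python would use for these bytes (PEP 263)."""
    # detect_encoding() looks at no more than the first two lines for a coding
    # declaration such as "# -*- coding: utf-8 -*-" or "#coding:utf-8".
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


if __name__ == "__main__":
    print(declared_encoding(b"# -*- coding: utf-8 -*-\nx = 1\n"))  # utf-8
    print(declared_encoding(b"#coding:gbk\nx = 1\n"))              # gbk
    print(declared_encoding(b"x = 1\n"))  # no declaration: utf-8
```

Python 2 assumed ASCII source when there was no declaration. That is why a Python 2 file containing Chinese characters failed with "SyntaxError: Non-ASCII character" until the line was added. Python 3 defaults to UTF-8.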

How Chinese encoding works (Python 2):

Inside Python, unicode is the intermediate (pivot) form that every conversion between encodings passes through.

Chinese text is commonly encoded in GBK.

The normal output is UTF-8, because the file declares # -*- coding: utf-8 -*-.

Garbled bytes -> unicode (the intermediate form) -> the encoding we need

decode() -> unicode -> encode() into the encoding we need
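Here is a minimal Python 3 sketch of unicode acting as the pivot. The same text has different byte sequences in GBK and UTF-8, but decoding each with its own codec gives back the same string. In Python 3, `bytes.decode()` and `str.encode()` do the work that `decode()` and `encode()` do on Python 2 byte strings and unicode objects.

```python
TEXT = "中文"

# Encode the same unicode text into two different byte representations.
gbk_bytes = TEXT.encode("gbk")
utf8_bytes = TEXT.encode("utf-8")

# The bytes differ: GBK uses 2 bytes per character here, UTF-8 uses 3.
print(len(gbk_bytes), len(utf8_bytes))  # 4 6

# Decoding each with its matching codec yields the same text.
print(gbk_bytes.decode("gbk") == utf8_bytes.decode("utf-8") == TEXT)  # True
```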

Example:

print content.decode("gbk").encode("utf-8")  # content is a GBK byte string
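In Python 3 the same one-liner becomes a decode of `bytes` followed by an encode of `str`. The sketch below wraps it in a hypothetical `transcode` helper. The GBK input is simulated by encoding a literal, standing in for bytes read from a page.

```python
def transcode(data: bytes, source: str = "gbk", target: str = "utf-8") -> bytes:
    """Decode bytes from `source` into str (the unicode pivot), then encode to `target`."""
    text = data.decode(source)  # bytes -> str
    return text.encode(target)  # str -> bytes


if __name__ == "__main__":
    gbk_page = "中文".encode("gbk")   # stand-in for GBK bytes from a web page
    utf8_page = transcode(gbk_page)   # GBK bytes -> UTF-8 bytes
    print(utf8_page == "中文".encode("utf-8"))  # True
```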

Checking whether a string is already unicode and converting it accordingly:

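if isinstance(content, unicode):
    # already unicode: encode straight to UTF-8
    print "is unicode"
    print content.encode("utf-8")
else:
    # a byte string: assume GBK, decode to unicode, then encode to UTF-8
    print "is not unicode"
    print content.decode("gbk").encode("utf-8")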
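A Python 3 version of the same check might look like the sketch below, where `str` plays the role of Python 2's `unicode`. Like the original, it does not actually detect the encoding of byte strings. It assumes they are GBK, made configurable here through a hypothetical `fallback` parameter.

```python
from typing import Union


def to_utf8(value: Union[str, bytes], fallback: str = "gbk") -> bytes:
    """Return UTF-8 bytes for `value`.

    str (Python 3's unicode type) is encoded directly. bytes are assumed to be
    in `fallback` (GBK by default), decoded to str, then re-encoded as UTF-8.
    """
    if isinstance(value, str):
        return value.encode("utf-8")
    return value.decode(fallback).encode("utf-8")


if __name__ == "__main__":
    # Unicode text and GBK bytes for the same characters give the same result.
    print(to_utf8("中文") == to_utf8("中文".encode("gbk")))  # True
```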

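Sometimes a single web page mixes several encodings, or its encoding is non-standard. Decoding it may then fail, or some characters may not display correctly. In that case, pass decode() a second argument that says how to handle invalid bytes: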

content.decode("gbk", "ignore").encode("utf-8")  # drop bytes that are not valid GBK
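The second argument is the codec error handler. The Python 3 sketch below compares the default `"strict"` handler with `"ignore"` and `"replace"` on GBK bytes that contain one stray 0xFF byte. 0xFF is never valid in GBK, so it simulates a badly encoded page. `decode_with` is a hypothetical helper name.

```python
def decode_with(data: bytes, errors: str, encoding: str = "gbk") -> str:
    """Decode `data`, handling undecodable bytes according to `errors`."""
    return data.decode(encoding, errors)


# GBK bytes for "中" and "文" with a stray 0xFF byte between them.
raw = "中".encode("gbk") + b"\xff" + "文".encode("gbk")

try:
    decode_with(raw, "strict")  # the default handler raises on the bad byte
except UnicodeDecodeError as exc:
    print("strict failed:", exc.reason)

print(ascii(decode_with(raw, "ignore")))   # the bad byte is dropped
print(ascii(decode_with(raw, "replace")))  # the bad byte becomes U+FFFD
```

`"ignore"` loses data silently. `"replace"` at least leaves a visible marker where the page was broken.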

Remaining problems:

Solution:

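Why is the default encoding on my Windows 7 computer mbcs? (mbcs is Python's name for the Windows ANSI code page, for example cp936 on Simplified Chinese Windows.)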

To detect a string's encoding:
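One simple approach is trial decoding. This sketch assumes the input is either UTF-8 or GBK. A statistical detector such as the third-party chardet package is more general, but this sketch uses only the standard library. `guess_encoding` and its candidate list are hypothetical.

```python
from typing import Optional, Sequence


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = ("utf-8", "gbk")) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` without error.

    This is a heuristic, not real detection: a successful decode does not prove
    the guess is right, so the order of `candidates` matters. UTF-8 goes first
    because its strict byte patterns rarely match other encodings by accident,
    while GBK accepts many more byte sequences.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None  # no candidate worked


if __name__ == "__main__":
    print(guess_encoding("中文".encode("utf-8")))  # utf-8
    print(guess_encoding("中文".encode("gbk")))    # gbk
```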

So to get readable Chinese, I had to use one of the following lines:

# if the page is UTF-8:
content = html.read().decode("utf-8").encode("mbcs")
# if the page is GBK:
content = html.read().decode("gbk").encode("mbcs")

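The problem is that the mbcs codec exists only on Windows. Python on Linux does not provide it, so the code is certain to raise an exception once it is ported to Linux. In addition, mbcs refers to a different encoding depending on the Windows system locale. So how should this be handled?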
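One portable option, sketched here for Python 3 as an assumption rather than the author's solution, is to avoid hard-coding `"mbcs"`. Instead, ask the locale for its encoding. The sketch also shows how to check whether a codec such as `"mbcs"` exists on the current platform. `codec_available` and `encode_for_local_system` are hypothetical helper names.

```python
import codecs
import locale
import sys


def codec_available(name: str) -> bool:
    """True if Python has a codec registered under `name` on this platform."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def encode_for_local_system(text: str) -> bytes:
    """Encode `text` for the local system without hard-coding "mbcs".

    locale.getpreferredencoding(False) reports the locale's encoding on every
    platform (for example cp936 on Simplified Chinese Windows, usually UTF-8
    on Linux). errors="replace" stops characters the target encoding cannot
    represent from raising an exception.
    """
    encoding = locale.getpreferredencoding(False)
    return text.encode(encoding, errors="replace")


if __name__ == "__main__":
    # "mbcs" is registered only on Windows, so this prints False elsewhere.
    print("mbcs available:", codec_available("mbcs"), "on", sys.platform)
    print(encode_for_local_system("中文"))
```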

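sys.getfilesystemencoding() returns the encoding the file system uses: 'mbcs' on Windows and 'utf-8' on Mac. This describes Python 2; since Python 3.6, Windows also returns 'utf-8'.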
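To see which encodings a given machine actually uses, the values can be collected in one place. This is a Python 3 sketch, and `encoding_report` is a hypothetical helper name.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings Python is using on this machine."""
    return {
        # Always "utf-8" on Python 3 (it was "ascii" on Python 2).
        "default": sys.getdefaultencoding(),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Locale encoding, typically what open() uses when no encoding is given.
        "locale": locale.getpreferredencoding(False),
        # Encoding of standard output, or None if stdout does not report one.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```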
