C++: String Encoding Conversion on Windows

1. Windows API

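This article shows how to convert between character encodings with the Windows API. Two functions are involved: WideCharToMultiByte and MultiByteToWideChar.

The "MultiByte" in the function names refers to byte-oriented (char-based) encodings; ASCII, the ANSI code pages, and UTF-8 all fall into this category. "WideChar" literally means wide character, and inside Windows a wide character specifically means UTF-16. The prototypes are as follows: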

int WideCharToMultiByte(
    UINT    CodePage,
    DWORD   dwFlags,
    LPCWSTR lpWideCharStr,
    int     cchWideChar,
    LPSTR   lpMultiByteStr,
    int     cbMultiByte,
    LPCSTR  lpDefaultChar,
    LPBOOL  lpUsedDefaultChar
);

int MultiByteToWideChar(
    UINT   CodePage,
    DWORD  dwFlags,
    LPCSTR lpMultiByteStr,
    int    cbMultiByte,
    LPWSTR lpWideCharStr,
    int    cchWideChar
);

2. String encoding conversion

The following helper functions wrap the two APIs:

std::string unicodetoansi(const std::wstring &str, UINT icodepage = CP_ACP);
std::wstring ansitounicode(const std::string &str, UINT icodepage = CP_ACP);
std::string unicodetoutf8(const std::wstring &str);
std::string unicodetoutf8bom(const std::wstring &str)
{
    // ...
    strres = (char*)szbuf;
    delete[] szbuf;  // szbuf is a buffer, so delete[] is needed (assuming it was allocated with new[])
    return strres;
}
std::wstring utf8tounicode(const std::string &str);
std::string ansitoutf8(const std::string &str, UINT icodepage = CP_ACP);
std::string ansitoutf8bom(const std::string &str, UINT icodepage = CP_ACP);
std::string utf8toansi(const std::string &str, UINT icodepage = CP_ACP);
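Judging by their names, the "bom" variants presumably return UTF-8 text prefixed with the UTF-8 byte order mark (the bytes EF BB BF). Their bodies are not shown in full, so the following is only a sketch of that idea under this assumption. The helper names with_utf8_bom and without_utf8_bom are hypothetical.

```cpp
#include <string>

// The UTF-8 byte order mark (also called the UTF-8 signature).
static const char kUtf8Bom[] = "\xEF\xBB\xBF";

// Hypothetical helper: prepend the BOM unless it is already present.
std::string with_utf8_bom(const std::string &utf8)
{
    if (utf8.compare(0, 3, kUtf8Bom) == 0) return utf8;
    return std::string(kUtf8Bom) + utf8;
}

// Hypothetical helper: drop a leading BOM if there is one.
std::string without_utf8_bom(const std::string &utf8)
{
    if (utf8.compare(0, 3, kUtf8Bom) == 0) return utf8.substr(3);
    return utf8;
}
```

Keeping the BOM handling separate from the conversion itself means the same UTF-8 conversion routine can serve both the plain and the signed variant.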

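On a system that only needs to handle Simplified Chinese (plus some Korean and Japanese characters), icodepage can be left as CP_ACP. The API then converts using the system's current code page (CP936, i.e. the GBK character set, on a Simplified Chinese system). The code page has to be specified explicitly in the following cases:

The string contains text that the system's current code page does not support, for example a string containing Chinese on a system whose current code page is an English one.

The GBK character set covers only part of the characters used in Korean and Japanese. Characters inside that part convert correctly, but for the rest icodepage must be set to a code page that supports Korean or Japanese, especially when Chinese is mixed with Korean or Japanese. For example, the Korean character "탉" is not in GBK, so converting it with CP_ACP yields the wrong result "?", hexadecimal 0x3F. GB18030 (code page 54936) does support "탉", so icodepage can be set to 54936 explicitly.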

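To save the source file, choose "Save with Encoding" and select "Unicode (UTF-8 with signature) - Codepage 65001".

Although "Chinese Simplified (GB18030) - Codepage 54936" also supports these characters, that option must not be used for saving. The reason is explained in detail in the article 撥開字元編碼的迷霧--編譯器如何處理檔案編碼 (Clearing the Fog around Character Encoding: How the Compiler Handles File Encoding).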
