Working Around Boost.Regex's Poor Support for Chinese

k.m.cao

v0.1

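Boost.Regex, Boost's implementation of regular expressions, is a commonly used pattern-matching tool in C++ development. While using it this time, however, I found that its support for Chinese is poor: when the pattern uses \w, matching fails on strings that contain characters such as 「數」 or 「節」.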

Approach: convert every string to wide characters first, then match.

The following wide-character classes are needed:

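1. wstring:

The STL counterpart of string, dedicated to wide strings. Its methods are the same as string's; the difference is that value_type is wchar_t. A string literal that is assigned or appended to a wstring object must carry the L prefix, which marks it as a wide-character literal.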

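2. wregex:

The counterpart of regex: a regular-expression class for wide characters. It works with regex_match(), regex_replace() and the other functions in the same way. When the input is a wstring, the result of regex_match() must be stored in a wsmatch object.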
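The two classes above can be sketched together as follows. This is only an illustration under stated assumptions: it uses std::wregex and std::wsmatch from the standard <regex> header rather than Boost, the helper names sample_text and match_cjk are hypothetical, and the pattern uses an explicit CJK range instead of \w so that the result does not depend on the active locale.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: builds "節點數超限" from wide literals.
// Every literal carries the L prefix; the characters are written as
// universal character names so the source file encoding does not matter.
std::wstring sample_text() {
    std::wstring w = L"\u7BC0\u9EDE\u6578";  // "節點數"
    w += L"\u8D85\u9650";                    // append "超限"
    return w;
}

// Hypothetical helper: returns the first capture group if the whole input
// consists of characters in U+4E00..U+9FFF, or an empty string otherwise.
std::wstring match_cjk(const std::wstring& input) {
    // Assumption: an explicit range stands in for \w here, because what \w
    // accepts for non-ASCII characters depends on the locale.
    static const std::wregex re(L"([\u4E00-\u9FFF]+)");
    std::wsmatch m;  // match results for std::wstring input
    if (std::regex_match(input, m, re)) {
        return m[1].str();
    }
    return std::wstring();
}
```

Calling match_cjk(sample_text()) returns the whole five-character string, because each Chinese character is a single wchar_t element and falls inside the range.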

Converting between narrow and wide strings:

1. The C runtime library (RTL) approach

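// Convert a string to a wide string

setlocale( LC_CTYPE, "" ); // Essential: without this call the conversion fails.

size_t iwlen = mbstowcs( NULL, stomatch.c_str(), 0 ); // Length of the converted wide string (excluding the terminator).

wchar_t *lpwsz = new wchar_t[iwlen+1];

mbstowcs( lpwsz, stomatch.c_str(), iwlen+1 ); // Convert. Passing iwlen+1 guarantees that the terminator is written.

wstring wstomatch(lpwsz);

delete[] lpwsz; // Memory from new[] must be released with delete[].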
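The same conversion can be wrapped in a small function that manages the buffer automatically. This is a minimal sketch, not the author's code: the name to_wide is hypothetical, it assumes the caller has already called setlocale( LC_CTYPE, "" ), and it relies on the POSIX and Microsoft behaviour in which mbstowcs with a null destination returns the required length.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a multibyte string to a wide string using
// the current C locale. Returns an empty string if the input is not valid
// in that locale.
std::wstring to_wide(const std::string& s) {
    // With a null destination the call only counts the wide characters
    // needed, not including the terminator.
    std::size_t n = std::mbstowcs(NULL, s.c_str(), 0);
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid multibyte sequence
    }
    std::vector<wchar_t> buf(n + 1);           // room for the terminator
    std::mbstowcs(&buf[0], s.c_str(), n + 1);  // converts and terminates
    return std::wstring(&buf[0], n);
}
```

Using std::vector instead of new[] and delete[] means the buffer is released even if an exception is thrown.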

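// Convert a wide string back to a string, for output

size_t ilen = wcstombs( NULL, wsm[1].str().c_str(), 0 ); // Length of the converted string (excluding the terminator).

char *lpsz = new char[ilen+1];

wcstombs( lpsz, wsm[1].str().c_str(), ilen ); // Convert. (No terminator is written, because only ilen bytes are requested.)

lpsz[ilen] = '\0'; // Terminate the string by hand.

string stomatch(lpsz);

delete[] lpsz; // Memory from new[] must be released with delete[].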
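The reverse direction can be wrapped the same way. Again this is only a sketch under assumptions: the name to_narrow is hypothetical, the caller is expected to have set the locale with setlocale( LC_CTYPE, "" ), and wcstombs is assumed to accept a null destination for length counting, as it does on POSIX systems and in the Microsoft runtime.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide string to a multibyte string using
// the current C locale. Returns an empty string if some character cannot
// be represented in that locale.
std::string to_narrow(const std::wstring& w) {
    // With a null destination the call only counts the bytes needed,
    // not including the terminator.
    std::size_t n = std::wcstombs(NULL, w.c_str(), 0);
    if (n == static_cast<std::size_t>(-1)) {
        return std::string();  // unrepresentable character
    }
    std::vector<char> buf(n + 1);              // room for the terminator
    std::wcstombs(&buf[0], w.c_str(), n + 1);  // converts and terminates
    return std::string(&buf[0], n);
}
```

Requesting n + 1 bytes lets wcstombs write the terminator itself, so no manual '\0' assignment is needed.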

2. The Win32 SDK approach

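// Convert a string to a wide string

int iwlen = MultiByteToWideChar( CP_ACP, 0, stomatch.c_str(), (int)stomatch.size(), NULL, 0 ); // Length of the converted wide string (excluding the terminator).

wchar_t *lpwsz = new wchar_t[iwlen+1];

MultiByteToWideChar( CP_ACP, 0, stomatch.c_str(), (int)stomatch.size(), lpwsz, iwlen ); // The actual conversion.

lpwsz[iwlen] = L'\0'; // Terminate by hand: no terminator is written when an explicit input length is passed.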

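// Convert a wide string back to a string, for output

int ilen = WideCharToMultiByte( CP_ACP, 0, wsresult.c_str(), -1, NULL, 0, NULL, NULL ); // Length of the converted string (including the terminator).

char *lpsz = new char[ilen];

WideCharToMultiByte( CP_ACP, 0, wsresult.c_str(), -1, lpsz, ilen, NULL, NULL ); // The actual conversion. Use the same code page as in the length query.

sresult.assign( lpsz, ilen-1 ); // Assign to the string object, dropping the terminator.

delete[] lpsz;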

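The following program shows that a \w match on a narrow string fails for certain characters, and that converting the string to a wide string first is one way to work around the problem.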

#include <iostream>

using std::cout;

using std::endl;

#include <string>

using std::string;

using std::wstring;

#include

#include "boost/tr1/regex.hpp"

using namespace boost;

void matchwords(string stomatch)

void matchwords(wstring wstomatch)

void main()

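Compiling and running the program prints:

Match result: 數超限

Match result:

Match result: 節點數目超限

The first line shows that 「數超限」 was matched successfully. On the second line, however, 「節點數超限」 did not match any characters. Only after conversion to a wide string does the \w match on 「節點數超限」 succeed.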
