Lua UTF-8: Converting Full-Width Characters to Half-Width

From the Unicode code point assignments, we know that:

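1. The full-width (ideographic) space is 12288 (U+3000), and the half-width space is 32 (U+0020).

2. For all other characters, the half-width range (33-126) and the full-width range (65281-65374) correspond one to one, and each pair differs by exactly 65248 (0xFEE0).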
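The two rules above boil down to a mapping on individual code points. A minimal sketch in Lua 5.3 (the helper name `to_half_width_cp` is hypothetical, not from the original code):

```lua
-- Hypothetical helper: map one Unicode code point to its half-width form
local function to_half_width_cp(cp)
  if cp == 12288 then                      -- U+3000 ideographic (full-width) space
    return 32                              -- U+0020 ASCII space
  elseif cp >= 65281 and cp <= 65374 then
    return cp - 65248                      -- U+FF01..U+FF5E -> U+0021..U+007E (offset 0xFEE0)
  end
  return cp                                -- everything else is left unchanged
end

print(to_half_width_cp(0xFF21))  -- prints 65, i.e. "A" (full-width U+FF21 minus 65248)
print(to_half_width_cp(0x4E2D))  -- prints 20013, a CJK ideograph is not remapped
```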

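However, a UTF-8 string is a sequence of bytes rather than code points, so its bytes cannot be used directly as these integer values. We need a function that decodes the bytes into code points: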

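-- Decode a UTF-8 string into a table of Unicode code points
-- (requires Lua 5.3+ for the bitwise operators; assumes well-formed UTF-8)
function utf8_to_num(raw_string)
    local result = {}
    local pos = 1
    while pos <= #raw_string do
        local count_1_of_byte = get_continuous_1_count_of_byte(string.byte(raw_string, pos))
        local num = 0
        if count_1_of_byte < 1 then
            -- 0xxxxxxx: single-byte ASCII character
            num = string.byte(raw_string, pos)
            count_1_of_byte = 1
        else
            -- i indexes bits across the whole sequence, starting at the MSB of the
            -- first byte; skip the lead byte's leading 1s and its terminating 0
            local boundary = 8
            local i = count_1_of_byte + 1
            while i < boundary * count_1_of_byte do
                if 0 == i % boundary then
                    -- start of a continuation byte: skip its "10" prefix
                    i = i + 2
                end
                local byte = string.byte(raw_string, pos + math.floor(i / boundary))
                if (1 << (boundary - i % boundary - 1)) & byte ~= 0 then
                    num = (num << 1) | 1
                else
                    num = num << 1
                end
                i = i + 1
            end
        end
        pos = pos + count_1_of_byte
        table.insert(result, num)
    end
    return result
end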
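The bit-by-bit loop above is equivalent to masking off each byte's prefix bits and concatenating the remaining payload bits: a lead byte with n leading 1s keeps its low 7 - n bits, and each continuation byte (10xxxxxx) contributes 6 bits. A compact sketch of that idea, assuming well-formed input and Lua 5.3+; the name `decode_utf8_char` is hypothetical, and its result can be cross-checked against the standard `utf8.codepoint`:

```lua
-- Hypothetical helper: decode the UTF-8 character starting at byte `pos` of `s`.
-- Returns the code point and the sequence length in bytes (assumes valid UTF-8).
local function decode_utf8_char(s, pos)
  local b1 = string.byte(s, pos)
  if b1 < 0x80 then return b1, 1 end   -- 0xxxxxxx: ASCII
  local len
  if b1 >= 0xF0 then len = 4           -- 11110xxx
  elseif b1 >= 0xE0 then len = 3       -- 1110xxxx
  else len = 2 end                     -- 110xxxxx
  -- keep the low (7 - len) payload bits of the lead byte
  local cp = b1 & ((1 << (7 - len)) - 1)
  for k = 1, len - 1 do
    -- each continuation byte carries 6 payload bits
    cp = (cp << 6) | (string.byte(s, pos + k) & 0x3F)
  end
  return cp, len
end

local s = "\u{FF21}"                 -- full-width "A", bytes EF BC A1
print(decode_utf8_char(s, 1))       -- 65313, 3
print(utf8.codepoint(s))            -- 65313
```

For EF BC A1 this computes (0xF << 12) | (0x3C << 6) | 0x21 = 0xFF21 = 65313.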

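To convert a UTF-8 character into an integer, we also need to know how many bytes it occupies. In UTF-8, the number of consecutive 1 bits at the top of a lead byte equals the length of the sequence (a byte of the form 0xxxxxxx is a single ASCII character with no leading 1s), so the following function counts them: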

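-- Count the consecutive 1 bits in a byte, starting from the most significant bit
function get_continuous_1_count_of_byte(num)
    if nil == num then
        return -1
    end
    local count = 0
    -- in Lua, & binds tighter than ~=, so this tests (num & 0x80) ~= 0
    while num & 0x80 ~= 0 do
        count = count + 1
        num = num << 1
    end
    return count
end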
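Because a byte has only eight possible leading-1 counts, the same result can also be read off from the range the byte falls in. A hypothetical loop-free variant (the name `leading_ones` is illustrative, not from the original), assuming Lua 5.3+ for the `\u{...}` string escapes in the usage lines:

```lua
-- Hypothetical loop-free equivalent of get_continuous_1_count_of_byte:
-- the number of leading 1 bits is fixed by which range the byte falls in.
local function leading_ones(byte)
  if byte < 0x80 then return 0      -- 0xxxxxxx: ASCII, 1-byte character
  elseif byte < 0xC0 then return 1  -- 10xxxxxx: continuation byte
  elseif byte < 0xE0 then return 2  -- 110xxxxx: lead byte of a 2-byte sequence
  elseif byte < 0xF0 then return 3  -- 1110xxxx: lead byte of a 3-byte sequence
  elseif byte < 0xF8 then return 4  -- 11110xxx: lead byte of a 4-byte sequence
  elseif byte < 0xFC then return 5  -- 0xF8-0xFF never occur in valid UTF-8,
  elseif byte < 0xFE then return 6  -- but their leading 1s are still counted
  elseif byte < 0xFF then return 7
  else return 8 end
end

-- print the byte length of each character next to the count for its first byte
for _, ch in ipairs({ "A", "\u{E9}", "\u{4E2D}", "\u{FF21}", "\u{1F600}" }) do
  print(#ch, leading_ones(string.byte(ch, 1)))
end
```

For every multi-byte character the count equals the byte length, while an ASCII byte yields 0; that is why the code treats a count below 1 as a length of 1.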

Next comes the conversion function itself:

function full_width_to_half_width(raw_string)
    local new_string = {}
    local pos = 1
    while pos <= #raw_string do
        local count_1_of_byte = get_continuous_1_count_of_byte(string.byte(raw_string, pos))
        if 3 == count_1_of_byte then
            -- U+3000 and U+FF01..U+FF5E are all encoded as 3 bytes in UTF-8
            local char = string.sub(raw_string, pos, pos + 2)
            local num_of_char = utf8_to_num(char)[1]
            if 12288 == num_of_char then
                table.insert(new_string, string.char(32))
            elseif 65281 <= num_of_char and num_of_char <= 65374 then
                table.insert(new_string, string.char(num_of_char - 65248))
            else
                -- any other 3-byte character (e.g. a CJK ideograph) is kept as-is
                table.insert(new_string, char)
            end
            pos = pos + count_1_of_byte
        else
            -- copy all other bytes through unchanged
            table.insert(new_string, string.sub(raw_string, pos, pos))
            pos = pos + 1
        end
    end
    return table.concat(new_string)
end

The logic is fairly simple, so I won't explain it further.
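For comparison, Lua 5.3 and later ship a standard `utf8` library whose `utf8.codes` iterator and `utf8.char` function perform the decoding and re-encoding that is done by hand above. A minimal sketch of the same conversion built on that library, assuming valid UTF-8 input (`utf8.codes` raises an error on invalid byte sequences); the function name `to_half_width` is hypothetical:

```lua
-- Hypothetical full-width -> half-width conversion using Lua 5.3's utf8 library
local function to_half_width(s)
  local out = {}
  for _, cp in utf8.codes(s) do
    if cp == 12288 then
      cp = 32                         -- ideographic space U+3000 -> ASCII space
    elseif cp >= 65281 and cp <= 65374 then
      cp = cp - 65248                 -- U+FF01..U+FF5E -> U+0021..U+007E
    end
    out[#out + 1] = utf8.char(cp)     -- all other characters are re-encoded unchanged
  end
  return table.concat(out)
end

-- full-width "Ｈｅｌｌｏ", an ideographic space, 世界, and a full-width "！"
print(to_half_width("\u{FF28}\u{FF45}\u{FF4C}\u{FF4C}\u{FF4F}\u{3000}\u{4E16}\u{754C}\u{FF01}"))
-- Hello 世界!
```

Working on code points instead of raw bytes keeps the range checks identical to the two rules at the top, and characters outside those ranges pass through untouched.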
