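Recognizing CAPTCHAs with the Tesseract engine

OCR (optical character recognition) is the process of analyzing the text in a scanned document or image and extracting it as characters.

Tesseract is an open-source OCR engine. It was originally developed at HP Labs and later released as open source; Google then took it over, fixed bugs, optimized it, and re-released it. The current version at the time of writing is 3.01.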

Project address:

Using the tesseract-ocr engine from the Windows command line to recognize a CAPTCHA:

1. Install tesseract-ocr by running the installer tesseract-ocr-setup-3.01-1.exe

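Note: the tessdata directory holds the language data files, along with the files that correspond to the config arguments you can pass on the command line. The installer includes the English language data by default.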
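To see which languages can be passed to -l, it can help to list what is actually in tessdata. The following is a minimal Python sketch, not part of Tesseract itself; it assumes the 3.x layout in which each language is a single file named after its code with a .traineddata extension, and the function name list_languages is made up for illustration.

```python
from pathlib import Path


def list_languages(tessdata_dir):
    """Return the sorted language codes found in a tessdata directory.

    Assumption: each language is stored as <code>.traineddata,
    for example eng.traineddata or chi_sim.traineddata.
    """
    # Subdirectories such as configs and tessconfigs do not match the pattern.
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
```

Call it with the path of the tessdata directory inside your own installation folder; the exact location depends on where the installer put Tesseract.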

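2. Recognize a CAPTCHA with the tesseract-ocr engine

Open a DOS prompt and enter tesseract:

If the usage message (reproduced in the appendix below) is printed, the installation is working.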
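The same check can be scripted. This is a small Python sketch that only tests whether an executable named tesseract is reachable through PATH; the helper name find_tesseract is hypothetical, and the check assumes the installer (or you) added the install directory to PATH.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH")
else:
    print("tesseract found at", path)
```

If this reports that tesseract was not found even though it is installed, the install directory is probably missing from PATH.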

I prepared a CAPTCHA image, code.jpg, and placed it in the root of drive D.

The result is:

Appendix: Usage: tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

pagesegmode values are:

0 = orientation and script detection (osd) only.

1 = automatic page segmentation with osd.

2 = automatic page segmentation, but no osd, or ocr

3 = fully automatic page segmentation, but no osd. (default)

4 = assume a single column of text of variable sizes.

5 = assume a single uniform block of vertically aligned text.

6 = assume a single uniform block of text.

7 = treat the image as a single text line.

8 = treat the image as a single word.

9 = treat the image as a single word in a circle.

10 = treat the image as a single character.

-l lang and/or -psm pagesegmode must occur before any configfile.

tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

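That is: tesseract image-name output-file-name -l language-data -psm pagesegmode config-file

For example: tesseract code.jpg result -l chi_sim -psm 7 nobatch

-psm 7 tells tesseract that code.jpg contains a single line of text, which can reduce the recognition error rate. The default is 3.

The configfile argument takes the name of a file in the tessdata\configs or tessdata\tessconfigs directory.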
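The command line above can also be driven from a script. Below is a minimal Python sketch under a few assumptions: tesseract is on PATH, the 3.x single-dash -psm flag is used as in the usage text (newer Tesseract releases spell it --psm), and the recognized text is written to outputbase with a .txt extension. The names build_command and recognize are hypothetical helpers, not part of Tesseract.

```python
import subprocess
from pathlib import Path


def build_command(image, outputbase, lang=None, psm=None, configs=()):
    """Build: tesseract imagename outputbase [-l lang] [-psm N] [configfile...]"""
    cmd = ["tesseract", str(image), str(outputbase)]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        # The usage text lists page segmentation modes 0 through 10.
        if not 0 <= psm <= 10:
            raise ValueError("pagesegmode must be between 0 and 10")
        cmd += ["-psm", str(psm)]
    # Config files go last: -l and -psm must occur before any configfile.
    cmd += list(configs)
    return cmd


def recognize(image, outputbase="result", **options):
    """Run tesseract and return the recognized text (requires tesseract on PATH)."""
    subprocess.run(build_command(image, outputbase, **options), check=True)
    # Assumption: plain-text output is written to <outputbase>.txt.
    return Path(f"{outputbase}.txt").read_text(encoding="utf-8").strip()
```

For the image used earlier, a call such as recognize(r"D:\code.jpg", "result", psm=7) runs the engine in single-line mode and returns the text. Building the argument list separately from running it keeps the flag ordering rule in one place and makes the command easy to inspect before anything is executed.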
