I. Environment
- Windows 7 x64
- Python 3+
II. Installation
1. Install tesseract-ocr
- http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.00dev.exe
2. Install pytesseract
pip install pytesseract
3. Install Pillow
pip install pillow
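Before moving on, it can help to confirm that Python can actually see all three pieces. The following is a minimal sketch using only the standard library. The helper name `check_install` is made up for illustration, and the tesseract entry will only report True if the Tesseract-OCR install directory happens to be on PATH.

```python
import importlib.util
import shutil


def check_install():
    """Report which of the three components Python can currently see."""
    return {
        # find_spec returns None when a top-level package is not importable
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "Pillow": importlib.util.find_spec("PIL") is not None,
        # shutil.which searches PATH; on Windows it also tries the
        # PATHEXT extensions such as .exe
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_install().items():
        print(name, "found" if found else "NOT found")
```

If tesseract is reported as not found even though it is installed, that is not fatal: its full path can be set in code through pytesseract.pytesseract.tesseract_cmd.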
III. Usage
    # -*- coding: utf-8 -*-
    import pytesseract
    from PIL import Image

    # Tell pytesseract where tesseract.exe lives (needed when it is not on PATH)
    pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
    # Point Tesseract at the tessdata folder; note the space after --tessdata-dir
    tessdata_dir_config = '--tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata"'


    def main():
        # Open the image and run OCR with the English language data
        image = Image.open('code.png')
        code = pytesseract.image_to_string(image, lang='eng', config=tessdata_dir_config)
        print(code)


    if __name__ == '__main__':
        main()
IV. Lessons learned and pitfalls
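1. Support on Windows is not that good. With nothing more than import pytesseract in the script, the OCR call kept failing with a "file not found" error (the FileNotFoundError traceback below).
Cause: pytesseract could not find the tesseract-ocr executable from the installation step, so its path has to be set in code:
    pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'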
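2. image_to_string needs two extra arguments. As far as I understand it:
lang='eng' makes Tesseract load the eng.traineddata file from the tessdata folder at the path configured in tessdata_dir_config;
config= is what passes that path to Tesseract.
A different recognition language can be selected by matching lang to another *.traineddata file in the tessdata directory (I am not sure this is entirely correct; it is only my rough understanding).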
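The two pitfalls above can be handled in one place instead of hard-coding paths. Here is a rough sketch, not a definitive implementation: the directories in `CANDIDATE_DIRS` are assumed default install locations (adjust them to your machine), and the helper names `find_tesseract`, `available_languages` and `tessdata_config` are invented for this example.

```python
import os
import shutil

# Assumed install locations; adjust to wherever the installer put Tesseract.
CANDIDATE_DIRS = [
    "C:/Program Files (x86)/Tesseract-OCR",
    "C:/Program Files/Tesseract-OCR",
]


def find_tesseract(candidate_dirs=CANDIDATE_DIRS):
    """Return a path to the tesseract executable, or None if not found."""
    # Prefer whatever is already on PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise look inside the candidate install directories.
    for directory in candidate_dirs:
        for name in ("tesseract.exe", "tesseract"):
            exe = os.path.join(directory, name)
            if os.path.isfile(exe):
                return exe
    return None


def available_languages(tessdata_dir):
    """List the language codes that have a *.traineddata file."""
    suffix = ".traineddata"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )


def tessdata_config(tessdata_dir):
    """Build the config string; note the space before the quoted path."""
    return '--tessdata-dir "{}"'.format(tessdata_dir)
```

The result of find_tesseract() can be assigned to pytesseract.pytesseract.tesseract_cmd, and tessdata_config(...) can be passed as config= to image_to_string. Any code returned by available_languages(...) should be usable as the lang argument.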
Error messages (the first comes from pitfall 1, the second from pitfall 2):
    Traceback (most recent call last):
      File "D:\***\VerifyCodeTest\src\main.py", line 17, in <module>
        main()
      File "D:\***\VerifyCodeTest\src\main.py", line 11, in main
        code = pytesseract.image_to_string(image, lang = 'eng', config=tessdata_dir_config)
      File "C:\Users\*\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 193, in image_to_string
        return run_and_get_output(image, 'txt', lang, config, nice)
      File "C:\Users\*\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 140, in run_and_get_output
        run_tesseract(**kwargs)
      File "C:\Users\*\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 111, in run_tesseract
        proc = subprocess.Popen(command, stderr=subprocess.PIPE)
      File "C:\Users\*\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 707, in __init__
        restore_signals, start_new_session)
      File "C:\Users\*\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 990, in _execute_child
        startupinfo)
    FileNotFoundError: [WinError 2] The system cannot find the file specified
    Traceback (most recent call last):
      File "D:\***\VerifyCodeTest\src\main.py", line 17, in <module>
        main()
      File "D:\***\VerifyCodeTest\src\main.py", line 11, in main
        code = pytesseract.image_to_string(image)#, lang = 'eng', config=tessdata_dir_config)
      File "C:\Users\*\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 193, in image_to_string
        return run_and_get_output(image, 'txt', lang, config, nice)
      File "C:\Users\*\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 140, in run_and_get_output
        run_tesseract(**kwargs)
      File "C:\Users\*\AppData\Local\Programs\Python\Python36\lib\site-packages\pytesseract\pytesseract.py", line 116, in run_tesseract
        raise TesseractError(status_code, get_errors(error_string))
    pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
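Since the two tracebacks above are easy to confuse, a small wrapper can turn each into a direct hint. This is only a sketch under stated assumptions: the name `ocr_with_hints` is hypothetical, and it assumes that pytesseract's TesseractError derives from RuntimeError (true in the versions I have seen, but worth checking against your installed version). FileNotFoundError is a subclass of OSError, so catching OSError covers the first case.

```python
def ocr_with_hints(run_ocr):
    """Call run_ocr() and translate the two failures above into hints.

    run_ocr is any zero-argument callable, for example:
    lambda: pytesseract.image_to_string(image, lang='eng', config=cfg)
    """
    try:
        return run_ocr()
    except OSError as exc:
        # FileNotFoundError (WinError 2) is a subclass of OSError.
        raise RuntimeError(
            "tesseract executable not found; set "
            "pytesseract.pytesseract.tesseract_cmd to its full path"
        ) from exc
    except RuntimeError as exc:
        # Assumption: pytesseract's TesseractError derives from RuntimeError.
        if "Error opening data file" in str(exc):
            raise RuntimeError(
                "language data not found; pass --tessdata-dir via config= "
                "or set TESSDATA_PREFIX"
            ) from exc
        # Anything else is passed through unchanged.
        raise
```

The original exception stays attached as the cause, so the full traceback is still available when the hint is not enough.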
Reference: https://blog.csdn.net/a349458532/article/details/51490291
Source: http://www.bubuko.com/infodetail-2601616.html