Warning: Tesseract and cppan are both under very active development, so this tutorial may become obsolete. The steps below were known to work on 21.12.2016.

Tesseract is an optical character recognition (OCR) engine for various operating systems. CCExtractor currently uses this library. At the time of writing, the alpha version 4.0.0 is available from the current master branch: https://github.com/tesseract-ocr/tesseract

Historically, building Tesseract for Windows has been complex and unclear. Projects that use Tesseract are now advised to use cppan (https://github.com/cppan/cppan), which can be used in CMake projects and automatically downloads all dependencies. cppan is also used to compile the current master of Tesseract itself, because of its dependencies.

You must install Git, CMake and cppan and add them to the PATH variable (see https://github.com/tesseract-ocr/tesseract/wiki/Compiling#windows). Installing Visual Studio 2015 is also recommended.
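Before going further, it can help to confirm that the three command-line tools are really reachable through PATH. Below is a minimal sketch in Python. Python is not required by the build and is only assumed here for convenience, and the helper name find_tools is hypothetical.

```python
import shutil


def find_tools(names=("git", "cmake", "cppan")):
    """Map each tool name to the full path found via PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        # A missing tool usually means its install folder is not in PATH yet.
        print(f"{name}: {path or 'NOT FOUND, add its folder to PATH'}")
```

If any tool is reported as not found, fix PATH and open a new cmd.exe window before running the commands that follow.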

Open cmd.exe (preferably as administrator) and run:

git clone https://github.com/tesseract-ocr/tesseract tesseract
cd tesseract
cppan --self-upgrade
cppan
mkdir build && cd build
cmake ..

First problem: the process can hang at:

-- Performing 71 checks using 4 thread(s)
-- This process may take up to 5 minutes depending on your hardware

If that happens, apply the workaround described at https://github.com/tesseract-ocr/tesseract/issues/464#issuecomment-264166445

Try to build it:

cmake --build . --config Release

You may get build errors at this point. They are related to bad files from libtiff.

To fix it: the commands above created new files under C:/Users/user/.cppan. In one of the subfolders of .cppan/storage/lnk you will find tiff-4.0.7.sln. Open it in Visual Studio and try to compile it.
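Since the subfolder names under .cppan/storage/lnk are not predictable, searching for the solution file can be quicker than browsing. The following is a rough Python sketch, assuming the default cppan location in the user's home folder; the function name find_tiff_solutions is hypothetical.

```python
from pathlib import Path


def find_tiff_solutions(cppan_root=None):
    """Return all tiff-*.sln files found below <cppan_root>/storage/lnk."""
    # Assumption: cppan keeps its data in ~/.cppan unless told otherwise.
    root = Path(cppan_root) if cppan_root else Path.home() / ".cppan"
    lnk = root / "storage" / "lnk"
    if not lnk.is_dir():
        return []
    # Search recursively, because the containing folder name varies.
    return sorted(lnk.rglob("tiff-*.sln"))


if __name__ == "__main__":
    for sln in find_tiff_solutions():
        print(sln)
```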

Open the file tif_config.h and change line 189 to:

#define SIZEOF_UNSIGNED_LONG 4

Open the file tiffconf.h and change line 29 to:

#define TIFF_UINT64_T unsigned long int

Build the tiff solution again. It should now compile without errors.

Then run the Tesseract build in the console again:

cmake --build . --config Release

This time the build should finish without errors.

You now have the dependency DLLs (in tesseract/build/bin/Release) and the Tesseract library (in tesseract/build/Release). The dependency libraries are in C:\Users\user\.cppan\storage\lib\511d09b6\Release (the folder name may differ from 511d09b6 on your machine).
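Because the hash-like folder name differs between machines, the dependency library folder can be located by pattern instead of by exact name. This is a minimal Python sketch, assuming the .cppan/storage/lib/<name>/<configuration> layout described above; find_dependency_lib_dirs is a hypothetical helper name.

```python
from pathlib import Path


def find_dependency_lib_dirs(config="Release", cppan_root=None):
    """Return folders matching <cppan_root>/storage/lib/*/<config> that hold .lib files."""
    # Assumption: cppan keeps its data in ~/.cppan unless told otherwise.
    root = Path(cppan_root) if cppan_root else Path.home() / ".cppan"
    lib_root = root / "storage" / "lib"
    if not lib_root.is_dir():
        return []
    return sorted(
        d for d in lib_root.glob(f"*/{config}")
        if d.is_dir() and any(d.glob("*.lib"))
    )


if __name__ == "__main__":
    for folder in find_dependency_lib_dirs("Release"):
        print(folder)
```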

You can also build the debug version:

cmake --build . --config Debug
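If you rebuild both configurations often, the two cmake invocations can be scripted. Here is a small Python sketch that only wraps the commands shown above; the function names cmake_build_command and build_all are hypothetical, and build_all assumes cmake is in PATH and that the build folder was already configured with "cmake ..".

```python
import subprocess


def cmake_build_command(config):
    """Return the argument list for building one configuration."""
    if config not in ("Release", "Debug"):
        raise ValueError(f"unexpected configuration: {config}")
    return ["cmake", "--build", ".", "--config", config]


def build_all(build_dir, configs=("Release", "Debug")):
    """Build each configuration in turn, stopping at the first failure."""
    for config in configs:
        # check=True raises CalledProcessError if cmake exits with an error.
        subprocess.run(cmake_build_command(config), cwd=build_dir, check=True)
```

For example, build_all(r"tesseract\build") would run the Release build and then the Debug build.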

Note: if you just add the built libraries to ccextractor.sln without updating the CCExtractor OCR code, you may get errors caused by legacy code.

Open tesseract/build/tesseract.sln and explore it. To find out where the header files and .lib files are located, open the Properties of the “tesseract” project and look at C/C++→General→Additional Include Directories and Linker→Input→Additional Dependencies. The DLL files are in the tesseract/build/bin folder. Copy all of these files into ccextractor\windows\libs and update the settings in ccextractor\windows\ccextractor.sln
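The copy step can be scripted once the source folders are known. The following is a minimal Python sketch, not part of CCExtractor or Tesseract; copy_build_outputs is a hypothetical helper, and the set of file patterns is an assumption that you may need to extend.

```python
import shutil
from pathlib import Path


def copy_build_outputs(source_dirs, dest_dir, patterns=("*.dll", "*.lib")):
    """Copy files matching the patterns from each source folder into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in source_dirs:
        for pattern in patterns:
            for f in sorted(Path(src).glob(pattern)):
                # copy2 also preserves timestamps; same-named files are overwritten.
                shutil.copy2(f, dest / f.name)
                copied.append(f.name)
    return copied
```

A call could look like copy_build_outputs([r"tesseract\build\bin\Release", r"tesseract\build\Release"], r"ccextractor\windows\libs"), with the dependency library folder under .cppan added to the list if needed. Header files are not handled by this sketch.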

  • public/general/tess_build.txt
  • Last modified: 2016/12/21 14:11
  • by izaron