markhollow.com

Current Project Status

Feb 27

Although this blog has been silent for a few months, progress has continued as time permitted. This article summarizes the current status.

PyCon APAC 2017 Presentation

Aug 27

Here is the slide deck from the presentation given at the PyCon APAC 2017 conference in Kuala Lumpur, Malaysia, on August 27, 2017.

Technical Presentation at PyCon APAC 2017 Conference

Jul 18

A technical presentation of the work to date on this project will be given at the PyCon APAC 2017 conference in Kuala Lumpur, Malaysia.

Programmatically Cleaning Document Scans, Part 2

Mar 04

The first article on this topic introduced basic automation techniques for cleaning document scans. This article builds on those foundations with more advanced techniques: background removal, frame removal, and rotation correction.

Docker Tesseract/TesserOCR Image

Feb 13

A Docker image for the Tesseract 4.00 (alpha) TesserOCR Python bindings is now available here. It contains English and Thai language data for programmatic OCR of images.

Programmatically Cleaning Document Scans, Part 1

Feb 09

This technical post describes a few simple steps for programmatically cleaning document scans with Python. The same concepts apply to batch processing hundreds of images quickly and consistently.
