markhollow.com

In the News: Digitising Newspapers

Jul 05

A recent edition of Matichon Online has a story about digitising old newspapers, including จีนโนสยามวารศัพท์ (Chino Siamese Daily News), กรุงเทพเดลิเมล์ (Bangkok/Krungthep Daily), หนังสือพิมพ์ไทย (Thai Newspaper) and The Bangkok Times.

Homemade Book Scanner

Apr 02

A few of the books I’ve collected are very old and fragile, and can’t be digitised on either a normal flatbed scanner or a flatbed book scanner without being damaged. So I’ve built a camera-based book scanner.


The Importance of Maintaining the Culture of the Nation

Mar 07

ความสำคัญในการบำรุงวัฒนธรรมของชาติ (“The Importance of Maintaining the Culture of the Nation”) was published by the Department for Publicity in 1941 under the government of Field Marshal Plaek Phibunsongkhram (1897 – 1964).

Programmatically Cleaning Document Scans, Part 2

Mar 04

The first article on this topic introduced basic techniques for automatically cleaning document scans. This article builds on those foundations with more advanced techniques for background removal, frame removal and rotation correction.
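Frame removal is the simplest of these to illustrate. The sketch below is not the code from the article; it assumes a greyscale page held as a list of rows of 0-255 integers (0 = black), and it crops rows and columns from each edge for as long as they are mostly dark. The helper name `remove_frame` and the default threshold of 100 are illustrative assumptions.

```python
# Hypothetical helper: crop the dark frame a scanner or camera leaves around a page.
# Assumes a greyscale image as a list of rows of 0-255 ints (0 = black); the
# default threshold of 100 is an arbitrary, illustrative value.
from typing import Iterable, List

Grey = List[List[int]]


def _mean(values: Iterable[int]) -> float:
    values = list(values)
    return sum(values) / len(values)


def remove_frame(img: Grey, threshold: float = 100.0) -> Grey:
    """Strip edge rows/columns while their mean brightness is below threshold."""
    top, bottom = 0, len(img)
    left = 0
    right = len(img[0]) if img else 0
    # Shrink the bounding box from each side while the edge line is mostly dark.
    # The extra bounds checks stop the loops once the box is empty.
    while top < bottom and left < right and _mean(img[top][left:right]) < threshold:
        top += 1
    while bottom > top and left < right and _mean(img[bottom - 1][left:right]) < threshold:
        bottom -= 1
    while left < right and top < bottom and _mean(row[left] for row in img[top:bottom]) < threshold:
        left += 1
    while right > left and top < bottom and _mean(row[right - 1] for row in img[top:bottom]) < threshold:
        right -= 1
    return [row[left:right] for row in img[top:bottom]]
```

A fixed brightness threshold works when the frame is much darker than the paper; pages with dark photographs near the edge would need a more careful test, such as requiring several consecutive dark lines.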


Docker Tesseract/TesserOCR Image

Feb 13

A Docker image with the TesserOCR Python bindings for Tesseract 4.00 (alpha) is now available. It includes the English and Thai language data needed to OCR images programmatically.
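A minimal usage sketch, assuming it runs inside the container where `tesserocr` and the English (`eng`) and Thai (`tha`) traineddata are installed. `PyTessBaseAPI`, `SetImageFile` and `GetUTF8Text` are part of tesserocr; the helper names `lang_string` and `ocr_files` are hypothetical.

```python
# Sketch only: assumes tesserocr plus the "eng" and "tha" traineddata are installed,
# as in the container described above. Helper names are illustrative.
from typing import Dict, Iterable, Sequence


def lang_string(langs: Sequence[str]) -> str:
    """Tesseract loads several language models when they are joined with '+'."""
    return "+".join(langs)


def ocr_files(paths: Iterable[str], langs: Sequence[str] = ("eng", "tha")) -> Dict[str, str]:
    """OCR each image file and return a mapping of path -> recognised text."""
    # Imported lazily so the module can be loaded without tesserocr installed.
    from tesserocr import PyTessBaseAPI

    results: Dict[str, str] = {}
    # Re-using one API object avoids reloading the language data for every image.
    with PyTessBaseAPI(lang=lang_string(langs)) as api:
        for path in paths:
            api.SetImageFile(path)
            results[path] = api.GetUTF8Text()
    return results
```

Called as `ocr_files(["page-001.png", "page-002.png"])` (hypothetical file names), it returns a dict mapping each path to its recognised text, with Tesseract considering both English and Thai on every page.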

Programmatically Cleaning Document Scans, Part 1

Feb 09

This technical post describes a few simple steps for programmatically cleaning document scans with Python. The same techniques can be applied in batch to process hundreds of images quickly and consistently.
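Binarisation (turning a greyscale scan into pure black and white) is a common step of this kind. The post does not say which method it uses; the sketch below assumes Otsu's method, written in pure Python, with pixels supplied as a flat list of 0-255 ints such as Pillow's `list(img.convert("L").getdata())` returns. `otsu_threshold` and `binarize` are hypothetical helper names.

```python
# Sketch of Otsu thresholding (an assumed technique, not necessarily the post's).
# Pixels are a flat list of 0-255 greyscale ints.
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the grey level that best separates dark ink from light paper."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_lo = 0  # pixels at or below the candidate threshold
    sum_lo = 0     # sum of their grey levels
    for t in range(256):
        weight_lo += hist[t]
        sum_lo += t * hist[t]
        if weight_lo == 0:
            continue
        weight_hi = total - weight_lo
        if weight_hi == 0:
            break
        mean_lo = sum_lo / weight_lo
        mean_hi = (sum_all - sum_lo) / weight_hi
        # Otsu picks the threshold that maximises the between-class variance.
        var_between = weight_lo * weight_hi * (mean_lo - mean_hi) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels: Sequence[int]) -> List[int]:
    """Map every pixel to black (0) or white (255) using Otsu's threshold."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]
```

Because the threshold is recomputed for each page, a batch loop that loads each file, calls `binarize` and saves the result adapts to pages scanned at different exposures.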
