markhollow.com

PyCon APAC 2017 Presentation

Aug 27

Here is the slide deck from the presentation given at the PyCon APAC 2017 conference in Kuala Lumpur, Malaysia, on August 27th, 2017.


Technical Presentation at PyCon APAC 2017 Conference

Jul 18

A technical presentation on this project's work to date will be given at the PyCon APAC 2017 conference in Kuala Lumpur, Malaysia.

In the News: Digitising Newspapers

Jul 05

A recent edition of Matichon Online has a story about digitising old newspapers, including จีนโนสยามวารศัพท์ (Chino Siamese Daily News), กรุงเทพเดลิเมล์ (Bangkok/Krungthep Daily), หนังสือพิมพ์ไทย (Thai Newspaper) and The Bangkok Times.

Quinine for Sale

May 19

The February 1845 edition of the Bangkok Recorder contained a first for Siam: the earliest classified advertisement to appear in a Siamese newspaper.


Homemade Book Scanner

Apr 02

A few of the books I've collected are so old and fragile that they can't be digitised on either a normal flatbed scanner or a flatbed book scanner without being damaged. So I've built a camera-based book scanner.


The Importance of Maintaining the Culture of the Nation

Mar 07

ความสำคัญในการบำรุงวัฒนธรรมของชาติ (“The Importance of Maintaining the Culture of the Nation”) was published by the Department for Publicity in 1941 under the government of Field Marshal Plaek Phibunsongkhram (1897 – 1964).

Programmatically Cleaning Document Scans, Part 2

Mar 04

The first article on this topic introduced basic techniques for automatically cleaning document scans. This article builds on those foundations with more advanced techniques for background removal, frame removal and rotation correction.

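Rotation correction can be approached in several ways. One common technique, not necessarily the one this article uses, is projection-profile skew estimation: try a range of candidate angles and keep the one for which the ink pixels pile up most sharply into horizontal rows. Below is a minimal pure-Python sketch, assuming the page has already been loaded as a 2D list of 0-255 grayscale values. The function names `ink_pixels` and `estimate_skew`, the ink threshold of 128 and the ±5° search range in 0.5° steps are all illustrative assumptions.

```python
import math
from collections import Counter


def ink_pixels(gray, threshold=128):
    """Return (x, y) coordinates of pixels darker than `threshold`.

    Assumption: `gray` is a list of rows of 0-255 grayscale values, and
    128 is an arbitrary illustrative cut-off between ink and paper.
    """
    return [(x, y)
            for y, row in enumerate(gray)
            for x, value in enumerate(row)
            if value < threshold]


def estimate_skew(ink, max_angle=5.0, step=0.5):
    """Estimate page skew in degrees using a projection profile.

    Each candidate angle in [-max_angle, max_angle] is tried: the ink
    pixels are rotated by minus that angle and binned into rows. Straight
    text lines fall into very few rows, so the candidate with the sharpest
    row histogram (largest sum of squared row counts) wins.
    Returns 0.0 when there is no ink at all.
    """
    best_angle, best_score = 0.0, 0
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        # Row coordinate of each pixel after rotating it by -angle.
        rows = Counter(round(y * cos_t - x * sin_t) for x, y in ink)
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The angle is measured in the (x, y) coordinate system of the input, where y increases downwards. To straighten the page you would rotate it by the negative of the estimate with an image library, after checking that library's sign convention for rotation. Scoring by the sum of squared row counts rewards concentration: a given number of pixels scores highest when it is packed into as few rows as possible.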

Docker Tesseract/TesserOCR Image

Feb 13

A Docker image containing the TesserOCR Python bindings for Tesseract 4.00 (alpha) is now available here. It includes English and Thai language data for programmatic OCR of images.

Programmatically Cleaning Document Scans, Part 1

Feb 09

This technical post describes a few simple steps for programmatically cleaning document scans with Python. The same steps can be applied to batch-process hundreds of images quickly and consistently.

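As an illustration of the kind of step a scan-cleaning script can include (not necessarily one the post uses), here is a minimal pure-Python sketch of global binarisation with Otsu's method. Otsu's method picks the black/white threshold automatically from each page's histogram, so every scan in a batch is handled the same way without a hand-tuned cut-off. The sketch assumes each page is already loaded as a 2D list of 0-255 grayscale values, with reading and writing image files left to an image library. The function names `otsu_threshold`, `clean_page` and `clean_batch` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold t for a flat list of 0-255 grayscale values.

    Values <= t form the dark (ink) class and values > t the light
    (background) class; t maximises the between-class variance.
    """
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    weight_dark = 0
    sum_dark = 0
    best_t, best_between = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # no dark pixels yet
        if weight_light == 0:
            break  # every pixel is already dark; larger t cannot help
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def clean_page(gray):
    """Binarise one page: ink becomes 0 (black), background 255 (white)."""
    t = otsu_threshold([value for row in gray for value in row])
    return [[0 if value <= t else 255 for value in row] for row in gray]


def clean_batch(pages):
    """Apply the same cleaning step to every page in a batch."""
    return [clean_page(page) for page in pages]
```

For example, `clean_page([[20, 25, 200, 210], [30, 220, 230, 15]])` picks a threshold of 30 and returns `[[0, 0, 255, 255], [0, 255, 255, 0]]`. A single global threshold works well on evenly lit scans. Pages with uneven lighting usually need background flattening first.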

Reported Missing

Feb 08

Keen-eyed readers of the PDF of the Huntrakun collection of the Bangkok Recorder may have noticed that two pages are missing: pages 169 and 170 were mistakenly replaced with copies of pages 461 and 462.
