Automated Registration of Historical Population Registers:

New Prospects and Possibilities

Lund, Sweden, 14 February 2019

The IUSSP Scientific Panel on Historical Demography organized a workshop on methods for automated text recognition of both printed (OCR) and handwritten (HTR) documents. The workshop was held in collaboration with the Centre for Economic Demography, Lund University, the Swedish National Archives, and the SWEDPOP and LONGPOP projects. Its aim was to bring specialists who have developed and worked with these methods together with researchers in historical demography and economic history who are engaged in large-scale data digitization projects.

Anders Hast, Uppsala University, presenting handwritten text recognition techniques at the workshop.

Four presentations were given at the workshop. They were followed by a general discussion in which members of two large research projects (SWEDPOP and LONGPOP) and the audience discussed the prospects and challenges of using these methods for population registers and similar sources.

The first presentation was given by Lars Björk and Torsten Johansson from the National Library of Sweden (KB). Their talk, “Improving the OCR-process – Experiences from the newspaper digitization at the National Library of Sweden”, shared lessons and experiences from a long-running, large-scale project digitizing Swedish newspapers with OCR techniques.

Anders Hast from the Centre for Image Analysis (CBA), Uppsala University (Sweden), talked about automated recognition of handwritten text using image analysis: “Making document collections searchable and readable by using handwritten text recognition techniques – possibilities and limitations”. The talk showed the great potential of these methods, but also many of the challenges in applying them to large-scale, general data entry projects.

In the afternoon, there were two presentations from projects that have implemented these techniques to digitize population data. Emil Sorensen and Christian Westermann from the University of Southern Denmark in Odense showed several applications in which they automatically entered data from tables, including handwritten data, such as the grades of schoolchildren in Sweden in the 1930s, and printed cause-of-death data from the United States. Their talk was titled “Digitizing and analyzing historical documents at scale: The power of AI.”

Joana Maria Pujades Mora from the Autonomous University of Barcelona (UAB), Spain, gave a fascinating presentation of their work digitizing Catalan marriage registers dating back to the 1500s: “The big data of the past: a journey through historical population documents driven by Computer Vision”.

More than 40 academic scholars, archivists, and research librarians attended, and the workshop featured engaging discussion and active networking. Plans for future collaboration on the digitization of population registers are taking shape.

If you have questions about this workshop or about the IUSSP Scientific Panel on Historical Demography, please contact Martin Dribe (Martin.Dribe@ekh.lu.se).

See also:

Workshop programme
Photos: https://www.facebook.com/lunddemography/posts/1240297796094970

Funding: The workshop received support from the Centre for Economic Demography, Lund University, and the LONGPOP project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 676060.