Digital demography in the era of Big Data
Seville, Spain, 6-7 June 2019
The Research Workshop on Digital Demography in the Era of Big Data*, held at the Institute of Statistics and Cartography of Andalusia (IECA) in Seville, Spain, 6-7 June 2019, brought together 30 scholars to discuss the implications of digital technologies for demographic behaviour as well as the applications of new data from digital sources to understand population processes. Sixteen papers were presented, including two invited presentations.
The workshop was preceded by a pre-meeting that included a Demography Today Lecture by John Palmer (Pompeu Fabra University) on "Digital Demography, Human-Mosquito Interactions, and the Socio-Ecological Context of Vector-Borne Disease", and a training session led by Emilio Zagheni (MPIDR) on "Accessing and Making Sense of Digital Trace Data for Demographic Research", both of which were sponsored by the BBVA Foundation. (Videos are available below.)
First Day: 6 June
The Research Workshop was officially opened by Elena Manzanera (IECA), Juan del Ojo (IECA), and Giampaolo Lanzieri (Eurostat). Participants were then welcomed by the two Chairs of the IUSSP Panel on Digital Demography, Emilio Zagheni (MPIDR) and Francesco Billari (Bocconi University), and by Diego Ramiro Fariñas (CSIC), member of the Panel’s steering committee and local organizer of the workshop.
The first session, on Digital Demography, included two presentations. Samin Aref (MPIDR) introduced his work using the Web of Science (WoS) to track the international mobility of researchers through their affiliations as listed in publications. Sofía Gil-Clavel (MPIDR) presented demographic differentials in Facebook usage around the world, using data from 136 countries disaggregated by age and gender and retrieved from the Facebook Marketing Application Programming Interface (API).
The second session brought together different methods for studying mobility and migration. Dilek Yildiz (Wittgenstein Centre for Demography and Global Human Capital) proposed a Bayesian probabilistic hierarchical model for combining data from traditional sources (such as Eurostat, the EU Labour Force Survey, and population and housing censuses) with social media (Facebook) data on migration between 2011 and 2018. Asli Ebru (Bocconi University) presented analyses of the potential predictive power of Google search data for tracking the movement across provinces of Syrian refugees under temporary protection in Turkey.
The third session included two case studies that complemented the previous session on migration: Latin Americans in Spain and the Indian and sub-Saharan African diasporas. By exploiting the digital footprint of potential migrants in Google, Juan Galeano (Centre d'Estudis Demogràfics) investigated whether it was possible to predict their entry into Spain. Nachatter Singh (Centre d'Estudis Demogràfics) used Facebook data to understand the mobility, by gender, of highly educated immigrants from the Indian and sub-Saharan African diasporas, in comparison with the United Nations Global Migration Database.
The last session of the day focused on poverty and energy. Jordi Ripoll (Devstat, Spain) presented his work exploring the use of an e-commerce dataset, linked with official country statistics, to measure poverty levels in Brazil. This was followed by a presentation by Vasileios Giagloglou (TELNET, Spain) on the work carried out by Energy Minus+, which applies machine learning to electrical data and external variables to predict the behaviour of a system, detect anomalies, and predict, confirm and validate savings. He also presented an overview of his work for the H2020 LONGPOP project using Elasticsearch to harmonise databases for research.
Second Day: 7 June
Guangqing Chi (Pennsylvania State University) presented an overview of how to retrieve data on migration, for example by comparing migration estimates from tax return files of the U.S. Internal Revenue Service (IRS) with Twitter data, as well as the challenges of using Twitter for this purpose. Turning to the use of mobile phones for the study of demography, Valentina Rotondi (Bocconi University) and colleagues provided large-scale evidence that mobile phones can be a vehicle for sustainable development. For example, they found positive effects in terms of reducing child and maternal mortality, narrowing gender inequalities and enhancing contraceptive use.
This was followed by two presentations on mobile phones and population estimation. Romain Avouac (ENSAE ParisTech) gave insights into the use of a Bayesian approach to improve population estimation from mobile phone data. The main contribution of the study was the improvement of spatial mapping through a combination of data sources and the use of a modular approach. In the second paper, Fabio Ricciato and Giampaolo Lanzieri (Eurostat) proposed a methodological framework for estimating present population density from mobile network operator data.
The next session included two invited presentations on Big Data. Antonio Argüeso (National Institute of Statistics – INE, Spain) gave insights into the use of Big Data within the 2021 census of Spain. Although Big Data will bring many advantages, such as quality (better measurement of reality), timeliness, continuity and more information (not limited by a census form), it also has important limitations, e.g. access to information that is not found in administrative registers or that, even if found, can be biased. Alvaro Ortiz (BBVA Research), for his part, shared the experience of a private enterprise using Big Data to monitor world geopolitics. He presented the exploitation of the Global Database of Events, Language, and Tone (an open database of georeferenced events with more than 3000 themes and emotions) through text mining and sentiment analysis to detect social unrest, dynamic migration flows or global health issues.
The closing session of the workshop was dedicated to two papers on Twitter data. First, Dariya Ordanovich (ESRI España) offered an overview of the interdisciplinary work carried out with her colleagues on using geotagged messages from Twitter for fertility nowcasting, which adds significant value to statistical production at marginal cost. The aim was to understand fertility intentions and short-term fertility changes in time and space. José Javier Ramasco (Institute for Cross-Disciplinary Physics and Complex Systems, Spain) and colleagues, in collaboration with UNICEF, used geocoded Twitter data to detect migration flows, focusing on Venezuela. They explored questions concerning travel routes, exit times and spatial distribution in new settlement areas, among others.
The workshop closed with a summary and discussion by Emilio Zagheni, Francesco Billari and Diego Ramiro, which included a proposal to pursue this work within the IUSSP Panel on Digital Demography’s programme of activities over the coming months.
*This meeting was organised by the IUSSP Panel on Digital Demography, the LONGPOP H2020 Marie Sklodowska-Curie ITN project, the Max Planck Institute for Demographic Research (MPIDR), DisCont (ERC Advanced Grant, Bocconi University), the Institute of Economy, Geography and Demography (Spanish National Research Council - CSIC) and the Institute of Statistics and Cartography of Andalusia (IECA). The BBVA Foundation provided funding for pre-conference activities.
Watch video recordings of three of the presentations: