Collecting and Storing Data from Internet-based Sources

Monday 11 June 2018
13:15 – 16:45 BST
Jura Teaching Lab, Level 4 Annexe, University of Glasgow Library, Hillhead Street, Glasgow G12 8QE

Collecting and Storing Data from Internet-based Sources will be an afternoon session that gives researchers the essential skills to use Application Programming Interfaces (APIs) effectively for downloading data from a variety of online sources.

It will then cover the use of databases for storing and retrieving data and demonstrate how to automate the collection processes.


Peter Smyth, Research Associate, University of Manchester


Half day (Monday 11th June, 2018, 1:15pm – 4:45pm)


Jura teaching lab, Level 4 Annexe, Glasgow University Library


Researchers who need to collect Internet-based data (e.g. from social media) and store it over a period of time.


  • £25 - For UK registered students
  • £35 - For staff at UK academic institutions, Research Council UK funded researchers, UK public sector staff and staff at UK registered charity organisations
  • £50 - For all other participants

Pre-requisite knowledge

Some knowledge of Python would be useful but is not essential, as all code used will be provided.

Course summary

Many websites allow researchers and developers to download data through an Application Programming Interface (API). This data often arrives in formats that are unfamiliar to social scientists (e.g. JSON). Downloaded data can be processed immediately or stored in a database for later processing in a package like R or Stata. Data can also be collected at regular intervals over a period of time using the scheduling tools built into the operating system, such as Task Scheduler on Windows or cron on Linux.
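As a rough illustration of what JSON looks like and how nested JSON can be flattened into rows suitable for later analysis, here is a minimal Python sketch. The sample response, its field names and the helper `flatten_posts` are hypothetical; they are not taken from the course materials or from any real API.

```python
import json

# Hypothetical API response; real services define their own field names.
SAMPLE_RESPONSE = """
{
  "results": [
    {"id": 1, "user": {"name": "alice"}, "text": "Hello", "likes": 3},
    {"id": 2, "user": {"name": "bob"}, "text": "Hi there", "likes": 0}
  ]
}
"""


def flatten_posts(raw_json):
    """Parse JSON text and return one flat dictionary per post."""
    data = json.loads(raw_json)
    rows = []
    for post in data["results"]:
        rows.append({
            "id": post["id"],
            # Nested objects are reached by chaining keys.
            "user_name": post["user"]["name"],
            "text": post["text"],
            "likes": post["likes"],
        })
    return rows


if __name__ == "__main__":
    for row in flatten_posts(SAMPLE_RESPONSE):
        print(row)
```

Each flat dictionary corresponds to one row of a table, which is the shape that packages such as R or Stata typically expect.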

Course content

Course participants will be introduced to the following:

  • The JSON data format
  • Using APIs to collect data
  • Data storage and retrieval using a database (SQLite)
  • Setting up automated procedures to collect data
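The storage and automation topics listed above might look something like the following in practice. This is only a sketch using Python's built-in `sqlite3` module; the table name `posts`, its columns and the helper functions are assumptions made for illustration, not the course's own code.

```python
import sqlite3
from datetime import datetime, timezone


def store_records(conn, records):
    """Insert records with a collection timestamp, skipping ids already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, text TEXT, collected_at TEXT)"
    )
    collected_at = datetime.now(timezone.utc).isoformat()
    # INSERT OR IGNORE leaves existing rows untouched, so repeated
    # collection runs do not create duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO posts (id, text, collected_at) VALUES (?, ?, ?)",
        [(r["id"], r["text"], collected_at) for r in records],
    )
    conn.commit()


def count_records(conn):
    """Return the number of rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


if __name__ == "__main__":
    # An in-memory database keeps the example self-contained;
    # a file path would be used for data that must persist.
    conn = sqlite3.connect(":memory:")
    store_records(conn, [{"id": 1, "text": "Hello"}, {"id": 2, "text": "Hi"}])
    # A second run overlaps with the first, as scheduled collection often does.
    store_records(conn, [{"id": 2, "text": "Hi"}, {"id": 3, "text": "Hey"}])
    print(count_records(conn))
```

Because overlapping records are ignored on insert, a script like this could be run repeatedly by an operating system scheduler without the stored data growing with duplicates.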

Payment and registration

Registration is available via Eventbrite.

For any queries regarding registration, please contact Keith Maynard.


Peter Smyth is a Research Associate at the University of Manchester, based in the Cathie Marsh Institute. He spent 35 years working in IT at various large and small commercial organisations before taking an MSc in Big Data Analytics at Sheffield Hallam University and moving into academia. In his previous roles he used whatever programming environment was to hand to solve problems. Now he teaches a variety of programming languages to help others do the same.

He is a qualified Data and Software Carpentry instructor.