blog | 16.09.2019 | Heather Sinclair

Data Science Update | January – July 2019

The Urban Big Data Centre Data Service has had another successful year to date with exciting new and extended data collections. We have also welcomed many new users to the service. In this blog, our Information Services Officer Heather Sinclair highlights how the data is being used and the impact it has had on research outputs. She also introduces a new addition to our transport data collection.

Data usage, outputs and impact - Zoopla data

Some users have already made excellent progress. They have used the data to work collaboratively on projects, deliver presentations at conferences, and produce blog posts and dissertations. This section describes one example of how Zoopla data has been used.

“The housing unaffordability crisis not only affects different housing needs at different stages of life but may also contribute to spatial concentrations of populations of similar age and socioeconomic status. Using rich big data from Zoopla and the UBDC we can appropriately investigate this important issue.” (Professor Albert Sabater)

Zoopla data was used for research into the relationship between housing (un)affordability and residential age segregation. The findings were presented at the Annual Conference of the British Sociological Association and the European Sociological Conference. There is also a blog about the research published on the Royal Geographical Society’s website. The project summary and links to these resources are listed below.

A central housing policy issue in the UK is the so-called 'affordability crisis' – the fact that both owner-occupied and private rental housing has become increasingly unaffordable, particularly for young adults. While the UK's affordability crisis has been developing slowly for decades, house prices rose steeply in the 1990s and early 2000s, and the potential consequences for the age make-up of different communities have so far been neglected. This research aims to investigate the relationship between housing unaffordability and residential age segregation across locales in the UK context. While the current policy focus in the UK and elsewhere on 'ageing in place' highlights one possible mechanism expected to increase residential age segregation, the hypothesis that the lack of affordable housing is also playing a key role will be tested.



The UBDC Data Science Team hope to be able to continue to provide access to Zoopla data for academic non-commercial research soon.

New data collections

Public Transport Availability Indicators (PTAI)

Public transport availability measures take account of both service frequency and service area. The service area is the area within which people are willing to walk to the station/stop along the road network. PTAIs can be used to study spatial and social inequalities in transport access and further estimate the population living in areas with poor public transport services.

The PTAI dataset provides public transport availability indicators at both the stop/station and small area levels across Great Britain (England, Wales and Scotland). The small area level data are provided for the UK 2011 lower layer super output areas (LSOA) and middle layer super output areas (MSOA), for the year 2016. PTAI uses public transport schedule data and stop/station location data. Stop-level PTAIs were aggregated to small areas by overlaying the service areas of stops with LSOA boundaries. This ensured that PTAI could be linked to socioeconomic data at the same geography level.
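The overlay step can be illustrated with a toy sketch. It is not the UBDC method: the post does not give the aggregation formula, so the area-weighted sum below is an assumption, as are the names `Box` and `aggregate_to_areas`, the rectangular shapes standing in for real service areas and LSOA polygons, and the stop scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a real polygon (toy simplification)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def area(self) -> float:
        return (self.max_x - self.min_x) * (self.max_y - self.min_y)

    def overlap_area(self, other: "Box") -> float:
        # Width and height of the intersection; negative means no overlap.
        width = min(self.max_x, other.max_x) - max(self.min_x, other.min_x)
        height = min(self.max_y, other.max_y) - max(self.min_y, other.min_y)
        return max(width, 0.0) * max(height, 0.0)


def aggregate_to_areas(stops, areas):
    """Assumed weighting: each stop contributes its score multiplied by the
    share of the small area that its service area covers.

    stops: list of (service_area_box, stop_score) pairs
    areas: dict mapping an area code to its boundary Box
    """
    result = {}
    for code, boundary in areas.items():
        total = 0.0
        for service_area, stop_score in stops:
            share = service_area.overlap_area(boundary) / boundary.area
            total += stop_score * share
        result[code] = total
    return result


# Hypothetical stops: (service area, stop-level score).
stops = [
    (Box(0, 0, 2, 2), 10.0),
    (Box(1, 0, 3, 2), 4.0),
]
# Hypothetical small areas; AREA_B is large and mostly unserved.
areas = {
    "AREA_A": Box(0, 0, 2, 2),
    "AREA_B": Box(2, 0, 6, 2),
}
scores = aggregate_to_areas(stops, areas)
```

Under this weighting a small, well-covered area scores much higher than a large area that is only partly covered by any service area. With real data the rectangles would be replaced by road-network service areas and actual boundary polygons in a GIS library.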

The two main small area geography levels in the UK are the lower layer super output area (LSOA) and the middle layer super output area (MSOA). MSOAs are made up of groups of LSOAs. Scotland has independent demographic surveys and uses different names for the two small area geography levels: the Scottish counterparts of the MSOA and LSOA are the intermediate zone (IZ) and the data zone (DZ). Scotland is less densely populated than England and Wales, so IZs and DZs have larger areas but smaller populations than MSOAs and LSOAs respectively.

English and Welsh LSOA boundaries were merged with Scottish DZ boundaries into a dataset “GB_LSOA_2011” and English and Welsh MSOA boundaries were merged with Scottish IZ boundaries into a dataset “GB_MSOA_2011”.
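A merge of this kind might look roughly like the sketch below, which maps both sources onto one schema before concatenating them. The field names (`lsoa_code`, `dz_code`, `area_code`, `source_geography`), the placeholder codes and the function `merge_boundaries` are all hypothetical; the post does not describe the actual schema of “GB_LSOA_2011”.

```python
def merge_boundaries(lsoa_records, dz_records):
    """Combine English/Welsh LSOA records and Scottish DZ records into one list
    with a shared schema (hypothetical field names)."""
    merged = []
    for rec in lsoa_records:
        merged.append({
            "area_code": rec["lsoa_code"],
            "source_geography": "LSOA",
            "geometry": rec["geometry"],
        })
    for rec in dz_records:
        merged.append({
            "area_code": rec["dz_code"],
            "source_geography": "DZ",
            "geometry": rec["geometry"],
        })
    # Codes must stay unique so the merged layer can be joined to other data.
    codes = [rec["area_code"] for rec in merged]
    if len(codes) != len(set(codes)):
        raise ValueError("duplicate area codes after merging")
    return merged


# Placeholder records; geometry is left as an opaque value here.
lsoa_records = [{"lsoa_code": "LSOA_0001", "geometry": None}]
dz_records = [{"dz_code": "DZ_0001", "geometry": None}]
gb_small_areas = merge_boundaries(lsoa_records, dz_records)
```

Keeping a `source_geography` column is one way to remember which rows are data zones, which matters when comparing areas of different typical size and population.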

Some LSOAs might have a higher PTAI than their neighbours because they contain railway stations or ferry terminals. A few rural LSOAs have lower PTAIs because of their larger boundary areas: in these cases, most of the area within the LSOA is not served by any stop or station.

The Urban Big Data Centre used both a non-train schedule dataset and a train schedule dataset collected in July 2016 and combined them into one dataset (“GB_GTFS_2016”).
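As a rough illustration of what combining two schedule feeds involves, the sketch below merges the `stops.txt` tables of two GTFS feeds and prefixes each `stop_id` so that identifiers from different feeds cannot collide. This is only an assumption about one reasonable approach, not a description of how “GB_GTFS_2016” was built; the function `merge_stops`, the prefixes and the sample rows are hypothetical.

```python
import csv
import io


def merge_stops(feeds):
    """Merge stops.txt content from several GTFS feeds.

    feeds: dict mapping a feed prefix to the text of that feed's stops.txt
    Returns a list of row dicts with feed-prefixed stop_id values.
    """
    merged = []
    for prefix, text in feeds.items():
        for row in csv.DictReader(io.StringIO(text)):
            row = dict(row)
            # Prefix the identifier so "1" in one feed differs from "1" in another.
            row["stop_id"] = f"{prefix}_{row['stop_id']}"
            merged.append(row)
    return merged


# Hypothetical sample data using standard GTFS stops.txt columns.
bus_stops = "stop_id,stop_name,stop_lat,stop_lon\n1,Example Bus Stop,55.86,-4.25\n"
rail_stops = "stop_id,stop_name,stop_lat,stop_lon\n1,Example Station,55.86,-4.26\n"
merged = merge_stops({"bus": bus_stops, "rail": rail_stops})
```

A complete merge would apply the same prefix wherever `stop_id` is referenced, for example in `stop_times.txt`, and would treat trip and route identifiers in the same way.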

To use Public Transport Availability Indicators data for your research please complete our application form.

Heather Sinclair

Heather Sinclair is the Information Services Officer for Urban Big Data Centre. She provides information management services to enhance data collections and support data programmes.
