And data for all
The challenges facing 'big data' users are as vast as the data itself.
Which tools to use, which access methodology to adopt, which data sets to work with and where to store the outputs are questions that cause just as many technical issues as the storage and processing models used to collect the data.
At the Urban Big Data Centre (UBDC) we have decided to take a holistic approach to user access, one that will allow users from multiple backgrounds and disciplines, with varying degrees of computing knowledge, to interface with the data we hold. The data stored by the UBDC is primarily open, but we also have provision for dealing with sensitive data specific to research programmes. Confidential data is not handled by the UBDC directly. Instead, we have stringent processes in place with the ADRC-S (Administrative Data Research Centre Scotland), which handles this type of data, for combining the various data types within the ADRC secured environments.
In many areas, the process for handling and storing large and complex data sets has been simplified. The Urban Big Data Centre’s approach is to carry this simplification over into the user analysis environment. Or as Isaac Newton said:
“We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances. Therefore, to the same natural effects we must, so far as possible, assign the same causes.”
The primary user base for our data service consists of four core audiences:
To access the UBDC's data system for open and safeguarded data, users from all of the above groups will be able to request access through an online web form. These requests will then be validated and, if approved, the user will be granted access. However, we will work with each user group or individual user slightly differently in terms of software, compute and data resources.
Support for the Public user category will consist of access to open data from multiple sources held within the UBDC data system, as well as online training guides, open source tools for data analysis and Linux-based virtual machines (VMs). Users will be able to run these independently of the UBDC infrastructure, on home laptops or desktops, to conduct their own research into areas in which they have an interest. Mechanisms will be in place for the finalised data outputs to be re-entered into the main UBDC data system, so that the data generated by this process is recorded and archived. Access to this service will require the user to supply a valid email address and other registration details.
The three other user categories will use the same process as the Public user category, though the data sets they access may differ from those used by the general public. Within these categories, licensed as well as open source software will be made available on a per-instance basis if required. All data generated from this work will be stored on the UBDC data system, with copies of the output files from the research also made available to the researcher. This process allows for archiving and also for the cross-sharing and re-analysis of results between research groups, which is a powerful tool in its own right.
Users will be able to use virtual machines (VMs) running one of three operating systems: Linux, Mac OS X or Windows. This will allow researchers to work on a cluster environment through a familiar desktop environment suited to their level of computing expertise.
Security mechanisms for user access will be built into the process for gaining access to the UBDC data system, prior to the use of the service. Additionally, the VM environments are heavily monitored to ensure that data security and integrity are maintained at all times.
The ultimate aim of the system is to place data tools close to the data, in digital environments that all users are comfortable using.
A massive task, but it is big data, after all.