First Impressions of a New Start
My first task was to install the OpenSAFELY pipeline and its small number of dependencies. The documentation was clear, well-presented and approachable, and I copied, pasted and executed the necessary commands from the OpenSAFELY website without incident. Special mention is due to the hover-over definitions and the copy buttons on code snippets! I had my own version of the Getting Started project up and running within a few hours. (In the interests of full disclosure, I am comfortable working on the command line, and this was a fresh install on a laptop with full installation privileges.)
By the end of Week 1, I was working on a new project with another data scientist looking at prescribing behaviours during the Covid-19 pandemic. All OpenSAFELY projects are built on top of a study definition, a flexible framework written in Python that allows the user to define the patient cohort and data required to generate the necessary research dataset. This detailed walkthrough is especially helpful when getting to grips with this framework, and the OpenSAFELY GitHub acts as a growing resource of example implementations that can help guide users when developing their own projects. This study definition in particular covers a broad range of primary care data.
The process of developing the study definition is iterative: cohort characteristics and data variables can be added over time, and the data generation rerun after each change. When an aspect of patient selection needed to be revised in my project, it was straightforward to amend the existing study definition and rerun the data generation. The idea of representing the data as a set of instructions rather than the data themselves is very appealing to me, particularly as the decision process for the set of instructions can be documented along with the analysis code via git commits.
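To make the "instructions rather than data" idea concrete, here is a toy sketch in plain Python. It is not the OpenSAFELY study definition API: the names CohortDefinition and build_dataset, and the dummy patient records, are all hypothetical and exist only to illustrate how a cohort can be revised by editing the definition and rerunning it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class CohortDefinition:
    """Hypothetical stand-in for a study definition: instructions, not data."""
    include: Callable[[Record], bool]  # who belongs in the cohort
    variables: Dict[str, Callable[[Record], Any]] = field(default_factory=dict)


def build_dataset(definition: CohortDefinition, records: List[Record]) -> List[Record]:
    """Apply the instructions to the raw records to generate a dataset."""
    return [
        {name: extract(record) for name, extract in definition.variables.items()}
        for record in records
        if definition.include(record)
    ]


# Made-up records, purely for illustration.
DUMMY_PATIENTS = [
    {"id": 1, "age": 34, "registered": True, "prescriptions": 2},
    {"id": 2, "age": 71, "registered": True, "prescriptions": 5},
    {"id": 3, "age": 52, "registered": False, "prescriptions": 1},
]

# First pass: all registered patients, with age as the only variable.
first_pass = CohortDefinition(
    include=lambda r: r["registered"],
    variables={"age": lambda r: r["age"]},
)

# Revision: tighten patient selection and add a variable, then rerun.
revised = CohortDefinition(
    include=lambda r: r["registered"] and r["age"] >= 50,
    variables={
        "age": lambda r: r["age"],
        "prescription_count": lambda r: r["prescriptions"],
    },
)

first_dataset = build_dataset(first_pass, DUMMY_PATIENTS)
revised_dataset = build_dataset(revised, DUMMY_PATIENTS)
```

Because each definition is ordinary code, the change from first_pass to revised is exactly the kind of edit that would show up as a diff in a git commit, alongside the reasoning for it in the commit message.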
The DataLab has high aspirations for its products and is committed to keeping those products open source. In the very short time that I’ve been here I can see why there is good reason to believe that these aspirations will be achieved. The right people with the right skills are in place. The pipeline has been designed to be robust and flexible. I’m very excited about what I’m going to learn and what I’ll be able to contribute while working at The DataLab.