Reverse-engineering the wheel: Validating replicated algorithm implementations for health geography
Obtaining enough location and movement data, of good enough quality, to perform meaningful studies on human movement in time and space used to be a major challenge for health researchers. Those days are gone. Collecting large amounts of high-quality location and movement data is now possible thanks to powerful portable devices such as smartphones.
Using computer algorithms to analyze how people move through space and time allows researchers to understand patterns they otherwise wouldn’t be able to observe. Examples of such patterns are travel path complexity, stationary activity time, and accessibility to public services.
It is one thing to have the tools to collect rich location and movement data, but it’s another to be able to analyze and interpret them. For example, our INTERACT project currently has 5 billion Global Positioning System (GPS) location points from over 1,000 participants. These data cannot be easily interpreted.
With 78% of Canadians using a smartphone, these devices have led to a boom in new research on how environments shape our health behaviours. To truly harness the power of these smart technologies, researchers must develop and use open-source computer algorithms.
Together, researchers have developed numerous computer algorithms for analyzing how people move through space and time, but the implementations of these data-interpreting tools are often not publicly available. This is why open-source computer algorithms are so important. They allow researchers to build on each other’s work and improve our understanding of human behaviour.
Reliable open-source code requires a good software engineer to write and test the computer algorithm in such a way that correctness is evident when examining the algorithm’s performance. In the context of evaluating how people move through space and time, doing so requires test data that are easy to interpret. While location and movement data may be increasingly easy to collect, they are very difficult to make public because of their sensitive nature.
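One way to obtain test data that is easy to interpret and carries no privacy risk is to generate a synthetic track whose correct answer is known by construction. The sketch below is only an illustration, not the INTERACT implementation: the `Fix` record, the planar coordinates in metres, the 0.5 m/s speed threshold, and the simple definition of stationary time are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Fix:
    """One synthetic location fix (hypothetical record layout)."""
    t: float  # seconds since the start of the track
    x: float  # metres east, in an assumed planar coordinate system
    y: float  # metres north


def stationary_time(fixes, speed_threshold=0.5):
    """Sum the duration of intervals between consecutive fixes whose
    average speed is below speed_threshold (metres per second)."""
    total = 0.0
    for a, b in zip(fixes, fixes[1:]):
        dt = b.t - a.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = hypot(b.x - a.x, b.y - a.y) / dt
        if speed < speed_threshold:
            total += dt
    return total


def synthetic_track():
    """60 s at rest, 60 s walking east at 1.5 m/s, then 120 s at rest.
    Fixes are 10 s apart, so the expected stationary time is 180 s."""
    fixes = [Fix(t, 0.0, 0.0) for t in range(0, 61, 10)]
    fixes += [Fix(t, 1.5 * (t - 60), 0.0) for t in range(70, 121, 10)]
    fixes += [Fix(t, 90.0, 0.0) for t in range(130, 241, 10)]
    return fixes


if __name__ == "__main__":
    print(stationary_time(synthetic_track()))
```

Because the track is built by hand, anyone reading the test can verify the expected value without access to participant data.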
So how can we move past these obstacles? It will require a major change in academic behaviour and the way we share our work.
1. Make the code behind academic papers open source
This solution addresses the problem at its core. Different research teams would be guaranteed to work with the same tools when attempting to solve complex problems about how people move through space and time. Researchers from different institutions could build on existing publications. The challenge with this proposed solution is that not everything can be open sourced. Open sourcing code requires test data, which is difficult to share. As well, a major academic goal is contributing to industry. Researchers must balance competing interests when open sourcing their algorithms.
2. Dedicate a section of every study to describing, in detail, the implementation of the software used
While information is typically provided about the algorithms used to conduct research, in some cases it is not enough to adequately recreate the algorithm. If not all academics are able (or willing) to publish their software, it would be a great service to the research community to thoroughly describe the steps involved in building the algorithms. This could be done in separate online appendices or in supplemental material if there is insufficient space in published papers.
3. Publish test data
Ultimately, if the source code cannot be released, the authors of academic studies have the option of publishing test data to minimize the likelihood that others create a buggy replica of the original work. Depending on the quality and the amount of such data, this option has the potential to be more impactful than the second proposed solution. This approach must respect the ethical relationship between the researchers and the study participants.
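To illustrate how published test data could be used, the sketch below checks a replicated implementation against a set of reference cases and reports every disagreement. Everything in it is hypothetical: the `check_against_reference` helper, the `path_length` function standing in for a replicated algorithm, and the three reference cases are invented for the example and do not come from any published study.

```python
import math


def check_against_reference(implementation, reference_cases, rel_tol=1e-9):
    """Run implementation on each published case and return a list of
    (case name, expected, actual) for the cases that disagree."""
    failures = []
    for name, inputs, expected in reference_cases:
        actual = implementation(*inputs)
        if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=1e-12):
            failures.append((name, expected, actual))
    return failures


def path_length(points):
    """Stand-in for a replicated algorithm: total length of a planar path."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Hypothetical reference cases: (name, input arguments, expected output).
REFERENCE_CASES = [
    ("single point", ([(0, 0)],), 0.0),
    ("3-4-5 segment", ([(0, 0), (3, 4)],), 5.0),
    ("unit square loop", ([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],), 4.0),
]

if __name__ == "__main__":
    for name, expected, actual in check_against_reference(path_length, REFERENCE_CASES):
        print(f"{name}: expected {expected}, got {actual}")
```

An empty list of failures does not prove the replica is correct, but each additional published case narrows the room for undetected differences.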
Open-source software used in academic studies has multiple benefits. When building upon or simply using the work of other academics, there is a clear benefit in terms of reducing time and errors in computer algorithms. The replicability and transparency of a study also increase. Nevertheless, researchers must correctly identify the instances where open sourcing is the right approach.
The proposed solutions have the potential to save much of the time and money now spent implementing the same program many times over, while also making the jobs of other academics easier, more effective, and more interesting. It is safe to say that no researcher wants to spend time recreating someone else’s work when they could be exploring new ideas.
Antoniu Vadan is an undergraduate student at the University of Saskatchewan, Canada, who worked on implementing a joint Potential Path Area algorithm using discretized space. His work can be found at https://github.com/TeamINTERACT