Reverse-engineering the wheel: Validating replicated algorithm implementations for health geography

Obtaining enough location and movement data, of good enough quality, to perform meaningful studies on human movement in time and space used to be a major challenge for health researchers. Those days are gone. Collecting large amounts of high-quality location and movement data is now possible thanks to powerful wearable devices such as smartphones.

Using computer algorithms to analyze how people move through space and time allows researchers to understand patterns they otherwise wouldn’t be able to observe. Examples of such patterns are travel path complexity, stationary activity time, and accessibility to public services.
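To make one of these patterns concrete, "stationary activity time" can be estimated with only a few lines of code. The sketch below is a minimal, hypothetical approach, not a published method: the 50-metre threshold and the use of the haversine distance between consecutive GPS fixes are illustrative choices.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in metres."""
    R = 6371000  # mean Earth radius in metres
    a = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2))
         * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * R * asin(sqrt(a))

def stationary_time(points, radius_m=50):
    """Seconds spent stationary: consecutive fixes closer than radius_m.

    points: list of (timestamp_seconds, lat, lon), sorted by time.
    """
    total = 0
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        if haversine_m(la0, lo0, la1, lo1) < radius_m:
            total += t1 - t0
    return total

# Three fixes a minute apart: the first two at the same spot,
# the third several kilometres north, so only one stationary minute.
points = [(0, 52.13, -106.67), (60, 52.13, -106.67), (120, 52.20, -106.67)]
print(stationary_time(points))  # 60
```

Real implementations differ in exactly these details (threshold, distance metric, handling of GPS noise), which is why sharing the implementation matters as much as describing the idea.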

It is one thing to have the tools to collect rich location and movement data, but it’s another to be able to analyze and interpret it. For example, our INTERACT project currently holds 5 billion Global Positioning System (GPS) location points from over 1,000 participants. Data at that scale cannot be meaningfully interpreted by hand.

Considering 78% of Canadians use a smartphone, these devices have led to a boom in new research on how environments shape our health behaviours. To truly harness the power of these smart technologies, researchers must develop and use open-source computer algorithms.

Together, researchers have developed numerous computer algorithms describing how people move through space and time, but the implementations of these data-interpreting tools are often not publicly available. This is why open-source computer algorithms are so important. They allow researchers to build on each other’s work and improve our understanding of human behaviour.

Reliable open-source code requires a good software engineer to write and test the computer algorithm in such a way that its correctness is evident from its behaviour. In the context of evaluating how people move through space and time, doing so requires test data that is easy to interpret. And while location and movement data may be increasingly easy to collect, they are very difficult to make public because of the sensitive nature of location data.
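Synthetic test data with an answer known by construction is one way around this tension: anyone can inspect it, and no participant’s privacy is at risk. A minimal sketch, in which the coordinates and the metres-per-degree constant are illustrative assumptions:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two fixes, in metres."""
    R = 6371000  # mean Earth radius in metres
    a = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2))
         * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * R * asin(sqrt(a))

def path_length_m(track):
    """Total length of a track given as (lat, lon) fixes."""
    return sum(haversine_m(a[0], a[1], b[0], b[1])
               for a, b in zip(track, track[1:]))

# Synthetic track: 11 fixes heading due north, roughly one metre apart.
# One metre of latitude is about 1/111,320 of a degree, so the correct
# answer (about 10 m) is known by construction; no participant data needed.
METRE_LAT = 1 / 111_320
track = [(52.0 + i * METRE_LAT, -106.0) for i in range(11)]
assert abs(path_length_m(track) - 10.0) < 0.05
```

A test like this can ship alongside open-source code without ever touching real location records.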

So how can we move past these obstacles? It will require a major change in academic behaviour and the way we share our work.

1. Make the code behind academic papers open source

This solution addresses the problem at its root. Different research teams would be guaranteed to work with the same tools when tackling problems as complex as how people move through space and time, and researchers from different institutions could build directly on existing publications. The challenge with this proposed solution is that not everything can be open sourced. Open-sourcing code requires test data, which is difficult to share. As well, a major academic goal is contributing to industry, so researchers must balance competing interests when open sourcing their algorithms.

2. Dedicate a section of every study towards describing, in detail, the implementation of used software

While papers typically provide some information about the algorithms used to conduct the research, in some cases it is not enough to adequately recreate them. If not all academics are able (or willing) to publish their software, it would be a great service to the research community to thoroughly describe the steps involved in building the algorithms. This could be done in separate online appendices or in supplemental material if there is insufficient space in the published paper.

3. Publish test data

Ultimately, if the source code cannot be released, the authors of academic studies have the option of publishing test data, which minimizes the likelihood of others creating a buggy replica of the original work. Depending on its quality and quantity, this option has the potential to be more impactful than the second proposed solution. The approach must, of course, respect the ethical relationship between researchers and study participants.
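Concretely, a published test set could be as small as an input file of GPS fixes plus the output the original implementation produces on it; a replica passes when its output matches. The sketch below is purely illustrative: the CSV layout, the `count_stops` metric, and its threshold are assumptions for the example, not a format any study actually ships.

```python
import csv
import io

# Stand-ins for files the original authors might publish alongside a paper.
published_input = io.StringIO(
    "timestamp,lat,lon\n"
    "0,52.1300,-106.6700\n"
    "60,52.1300,-106.6700\n"
    "120,52.1300,-106.6700\n"
    "180,52.2000,-106.6700\n"
)
published_expected = {"stop_count": 1}

def count_stops(rows, min_fixes=3):
    """Toy replica: a 'stop' is a run of at least min_fixes identical fixes."""
    stops, run = 0, 1
    for prev, cur in zip(rows, rows[1:]):
        if (prev["lat"], prev["lon"]) == (cur["lat"], cur["lon"]):
            run += 1
        else:
            stops += run >= min_fixes
            run = 1
    return stops + (run >= min_fixes)

rows = list(csv.DictReader(published_input))
assert count_stops(rows) == published_expected["stop_count"]
```

If the replica disagrees with the published expected output, the discrepancy is caught before the replica is ever used on real participant data.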

Open-source software used in academic studies has multiple benefits. Building upon, or simply reusing, the work of other academics reduces both the time spent and the errors introduced when implementing computer algorithms, and it increases the replicability and transparency of a study. Nevertheless, researchers must correctly identify the instances where open sourcing is the right approach.

The proposed solutions have the potential to save much of the time and money currently invested in implementing the same program many times over, while also making the jobs of other academics easier, more effective, and more interesting. It is safe to say that no researcher wants to spend time recreating someone else’s work when they could be exploring new ideas.

Antoniu Vadan is an undergraduate student at the University of Saskatchewan, Canada, who worked on implementing a joint Potential Path Area algorithm using discretized space. His work can be found at https://github.com/TeamINTERACT.


CIHR-funded research team harnessing big data to deliver public health intelligence on the influence of urban form on health, well-being, and equity.
