Identifying Impact of Software Dependencies on Replicability of Biomedical Workflows

T. Miksa, A. Rauber, E. Mina:
"Identifying Impact of Software Dependencies on Replicability of Biomedical Workflows";
Journal of Biomedical Informatics, 64C (2016), pp. 232-254.

Additional Information


Abstract:


Complex data-driven experiments form the basis of biomedical research. Recent findings warn that the context in which software is run, that is, the infrastructure and the third-party dependencies, can have a crucial impact on the final results delivered by a computational experiment. This implies that, in order to replicate a result, not only must the same data be used, but the experiment must also be run on an equivalent software stack.

In this paper we present the VFramework, which enables assessing the replicability of workflows. It identifies whether any differences in software dependencies exist between two executions of the same workflow and whether they have an impact on the produced results. We also conduct a case study in which we investigate the impact of software dependencies on the replicability of Taverna workflows used in biomedical research on Huntington's disease. We re-execute the analysed workflows in environments differing in operating system distribution and configuration.

The results show that the VFramework can be used to identify the impact of software dependencies on the replicability of biomedical workflows. Furthermore, we observe that although the workflows are executed in a controlled environment, they still depend on specific tools installed in that environment. The context model used by the VFramework addresses the deficiencies of provenance traces and also documents such tools. Based on our findings, we define guidelines for workflow owners that enable them to improve the replicability of their workflows.