Reproducibility as a service: can the cloud make it real?

Research published in academic journals is trustworthy. Or is it? This question is being asked more and more these days. While few doubt the integrity of the researchers, many in the scientific community are concerned about the inability to reproduce experiments. If the work is not reproducible, how can its reliability be judged?


Kenji Takeda, Solutions Architect and Technical Manager, Microsoft Research, writes:

Making research reproducible is far from trivial. It involves sharing not just the results, methods, and data, but also the implied knowledge of the original researcher, without which it’s difficult to independently reproduce the complete results. The sharing of results and methods has become established practice over the last 350 years through academic publication. Data sharing is coming of age, and huge efforts by the research and wider community on open data are starting to bear fruit. But that still leaves a big problem: how to share the original researcher’s implied knowledge. Fortunately, the possibility of attaining completely reproducible computational experiments is drawing closer, as we’ve seen this week (December 8–11, 2014) during the Recomputability 2014 workshop at the 7th IEEE/ACM International Conference on Utility and Cloud Computing in London.

During the workshop, Tom Crick of Cardiff Metropolitan University, Benjamin Hall of the University of Cambridge, Samin Ishtiaq of Microsoft Research Cambridge, and I presented new ways of thinking about reproducibility. We’ve been considering what it takes to make reproducible research not only possible but attractive to researchers. Much of the current work around data sharing and reproducibility focuses on the person trying to reproduce the work. As such, it often fails to take into account the intense pressures under which the original researcher labors: the stress of having to produce and publish results. We therefore propose building systems that make researchers more productive during their day-to-day work: automated systems that help make their work reproducible. In computational domains, we researchers have many cloud-based tools that make our lives easier, such as GitHub, Visual Studio Online, figshare, Office 365, and OneDrive. We use these tools every day, but usually in a disconnected way. We envisage a new world of “reproducibility as a service,” wherein these disparate services are brought together to make it much easier for researchers to think, develop, test, and publish their computational work wherever they are. Read more about our thoughts in our workshop paper, “Share and Enjoy: Publishing Useful and Usable Scientific Models.”
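To make this concrete, here is a minimal sketch of the kind of automation we have in mind: a small script that records the exact code version, software environment, and input data behind a computational result, producing a bundle that could then be pushed to services such as GitHub or figshare. It is purely illustrative; the file names and helper functions are hypothetical and not part of any of the services mentioned above.

```python
# Hypothetical sketch: assembling a "reproducibility manifest" that links
# the code version, software environment, and input data behind a result.
# The paths and helper names are illustrative only.
import hashlib
import json
import pathlib
import subprocess
import sys


def sha256(path: pathlib.Path) -> str:
    """Content hash of a data file, so others can confirm they have the same input."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(data_files: list) -> dict:
    return {
        # Exact code version, assuming the experiment lives in a git repository.
        "code_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip(),
        # Exact software environment.
        "python": sys.version,
        "packages": subprocess.check_output(
            [sys.executable, "-m", "pip", "freeze"], text=True).splitlines(),
        # Exact inputs, identified by content hash.
        "data": {f: sha256(pathlib.Path(f)) for f in data_files},
    }


if __name__ == "__main__":
    # "inputs/simulation_params.csv" is an illustrative file name.
    manifest = build_manifest(["inputs/simulation_params.csv"])
    pathlib.Path("manifest.json").write_text(json.dumps(manifest, indent=2))
```

A service running such a step automatically on every commit or publication would give the researcher reproducibility as a by-product of normal work, rather than an extra chore.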

"This is an exciting area of research and one that could have a profound impact on the way that computational science is performed. By rethinking how we develop, use, benchmark, and share algorithms, software, and models, alongside the development of integrated and automated e-infrastructure to support recomputability and reproducibility, we will be able to improve the efficiency of scientific exploration as well as promoting open and verifiable scientific research," says Crick.

We are also excited to be working with Ian Gent and his team at the University of St Andrews on Recomputation.org, whose work on using virtual machines in the cloud to freeze, and later unfreeze, computational experiments is very promising. It’s like creating an exact copy of Michael Faraday’s lab and then being able to reproduce and extend his experiments. During the summer, we had a fantastic time exploring what’s possible; read more about our experiences trying to reproduce several experiments. These efforts provide a great starting point, although even with access to detailed lab notebooks, it is very difficult to know every detail of the original experimenter’s experience and implied knowledge. So this approach is not the whole solution, but it is certainly a move in the right direction.
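As a rough illustration of the “freeze” idea, the sketch below pins an experiment’s exact Python environment into a container build recipe so the computation can be rebuilt and re-run later. It is a deliberately simplified, container-based analogue of the virtual-machine snapshots used by Recomputation.org, not their actual tooling, and the run.py and requirements-frozen.txt names are hypothetical.

```python
# Hypothetical sketch: "freezing" the current Python environment into a
# Dockerfile so the experiment can be rebuilt and re-run later.
# This is a simplified analogue of snapshotting a whole virtual machine.
import pathlib
import subprocess
import sys

# Pin every installed package at its exact version.
frozen = subprocess.check_output([sys.executable, "-m", "pip", "freeze"], text=True)
pathlib.Path("requirements-frozen.txt").write_text(frozen)

# Emit a build recipe that recreates this environment and re-runs the experiment.
dockerfile = f"""\
FROM python:{sys.version_info.major}.{sys.version_info.minor}-slim
WORKDIR /experiment
COPY . .
RUN pip install --no-cache-dir -r requirements-frozen.txt
CMD ["python", "run.py"]
"""
pathlib.Path("Dockerfile").write_text(dockerfile)
```

Rebuilding the resulting image later “unfreezes” the environment, although, as noted above, it still cannot capture everything the original researcher knew.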

All of these steps toward reproducibility are becoming more realizable due to rapid developments in the cloud, and we were excited to share our own experiences with the research community during the conference. Ant Rowstron of Microsoft Research described the need to think at rack scale—that is to say, to think in terms of racks, rather than individual servers, as the building blocks in our data centers. Ross Smith from Skype and Lori Ada Kilty of Microsoft Research showed how crowdsourcing and gamification can improve online conferencing systems, such as Lync and Skype. And I was delighted to discuss where we are heading with hyper-scale cloud computing, from efficiency, consistency, and productivity points of view. To round it off, we had fun with dozens of researchers as they learned about Microsoft Azure and Azure Machine Learning and explored how these platforms can help with their work.

All in all, it was an amazing week in London. We were thrilled to share how cloud computing is making a difference for researchers today—and the promise it holds for the future. If you’re interested in how cloud computing can help with your research, we invite you to learn more through our Azure for Research program, which offers a host of tools and tips, not to mention substantial grants of cloud resources for qualified projects.
