Capturing Distributed Provenance Metadata from Cloud-Based Scientific Workflows

  • Sergio Manuel Serra da Cruz UFRJ
  • Carlos Eduardo Paulino UFRJ
  • Daniel de Oliveira UFRJ
  • Maria Luiza Machado Campos UFRJ
  • Marta Mattoso UFRJ
Keywords: Provenance, Scientific Workflows, Cloud Computing, Metadata


Scientific workflows are abstractions used to model scientific experiments. Running these experiments often requires high-performance computing environments such as clusters and grids, and the scientific community is now beginning to adopt cloud computing as well. However, support for collecting and recording retrospective workflow provenance in cloud environments is still incipient. This paper presents an approach to capturing distributed provenance metadata from cloud-based scientific workflows. The approach was implemented as an evolution of the Matrioshka architecture, refactored for cloud environments. Preliminary results show that provenance metadata captured from the virtual components running in the cloud can help scientists manage and reproduce their large-scale in silico experiments.