Multiscaling a Graph-based Dataspace
Keywords:
multiscale, dataspace, data integration
Abstract
Biologists increasingly need a unified view to understand and discover relationships among data elements scattered across data sources with different levels of heterogeneity.
Existing approaches usually adopt ad hoc, heavyweight integration strategies that require a costly upfront effort: a monolithic chain of steps tailored to specific formats and schemas, with little or no reuse.
Building on previous work on data integration for data analysis, this work discusses the conception of a multiscale-based dataspace architecture called LinkedScales. It starts from the notion of integration-scales within a dataspace and defines a systematic, progressive integration process via graph-based transformations over a graph database. LinkedScales aims to provide a homogeneous view of heterogeneous sources, allowing systems to reach and produce different integration levels on demand, going from raw representations (lower scales) towards ontology-like structures (higher scales). Although the proposed framework can be extended to several scenarios, this work focuses on the biology domain, addressing the organism-centric analysis scenario. This paper details inner aspects of the architecture and its transformation process, and introduces the Multiscale Transformation Graph, which tracks the transformation process among scales, enabling traceability. The results obtained indicate the viability of the framework and its implementation for integrating relevant resources in the organism-centric domain.