A Framework for ETL Systems Development
Keywords: data integration, data warehouse, ETL application, software framework
Abstract: Many commercial Extract-Transform-Load (ETL) tools exist, but most do not offer an integrated platform for modeling processes and extending functionality. This drawback complicates customization and integration with other applications; consequently, many companies develop their ETL systems in-house. A possible solution is a framework that provides extensibility and reusability and identifies hotspots. Although most academic ETL work addresses conceptual frameworks, application-oriented tools and the modeling of ETL processes, it does not include a programming framework that emphasizes flexibility, extensibility and reusability in ETL systems. We present FramETL, a novel framework for developing ETL systems that offers a programmable and integrated process modeling environment and allows the creation of customized ETL transformations. To illustrate the applicability of FramETL in different domains, we extended it to facilitate the ETL processes of data warehouses modeled as star schemas, and we also used it to define data integration processes in a cost accounting application. The evaluation results showed that FramETL is a programmable and extensible framework that offers a platform for designing distinct ETL applications, in which processes are modeled and specialized in an integrated environment. Moreover, our programming interface, which conforms to the fundamentals of ETL formulations, allowed common transformations to be reused in the implementation of our example applications, and enabled the documentation of the flexibility points that facilitated the creation of customized transformations for different ETL application domains.
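The idea of a programming framework with reusable transformations and documented flexibility points (hotspots) can be illustrated with a minimal sketch. It is not the FramETL API: the names `Transformation`, `Filter`, `SurrogateKey` and `Pipeline` are hypothetical and chosen only for illustration. The sketch assumes that a fixed pipeline runner plays the role of the stable part of the framework, while an abstract transformation class is the hotspot that each application domain specializes.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class Transformation(ABC):
    """Hypothetical hotspot: each ETL application subclasses this."""

    @abstractmethod
    def apply(self, rows: Iterable[Row]) -> List[Row]:
        ...


class Filter(Transformation):
    """Common, reusable transformation: keep rows that satisfy a predicate."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate

    def apply(self, rows: Iterable[Row]) -> List[Row]:
        return [row for row in rows if self.predicate(row)]


class SurrogateKey(Transformation):
    """Hypothetical domain-specific transformation for a star-schema
    dimension: assign a sequential surrogate key to each row."""

    def __init__(self, key_name: str, start: int = 1) -> None:
        self.key_name = key_name
        self.start = start

    def apply(self, rows: Iterable[Row]) -> List[Row]:
        return [
            {self.key_name: key, **row}
            for key, row in enumerate(rows, start=self.start)
        ]


class Pipeline:
    """Stable part of the sketch: runs transformations in order."""

    def __init__(self, *steps: Transformation) -> None:
        self.steps = list(steps)

    def run(self, rows: Iterable[Row]) -> List[Row]:
        data = list(rows)
        for step in self.steps:
            data = step.apply(data)
        return data


# Illustrative source data (made up for this example).
source = [
    {"product": "pen", "price": 2.0},
    {"product": "ink", "price": -1.0},
    {"product": "pad", "price": 5.0},
]

pipeline = Pipeline(
    Filter(lambda row: row["price"] > 0),  # reused common transformation
    SurrogateKey("product_sk"),            # customized transformation
)
result = pipeline.run(source)
```

In this sketch, a different application domain would reuse `Pipeline` and `Filter` unchanged and add only its own `Transformation` subclasses, which is the kind of reuse and specialization the abstract describes.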