https://periodicos.ufmg.br/index.php/jidm/issue/feed Journal of Information and Data Management 2021-02-14T13:06:24-03:00 Angelo Brayner brayner@dc.ufc.br Open Journal Systems <p><span style="font-size: 11px; line-height: 19px;">JIDM is an electronic journal that is published three times a year. Its focus is on information and data management in large repositories and document collections. Specifically, it relates to different areas of Computer Science, including databases, information retrieval, digital libraries, knowledge discovery, data mining, geographic information systems, among others.</span></p> <p><span style="font-size: small;"><span style="font-size: 11px; line-height: 19px;">For information on submission and policies, check the <strong><a href="/index.php/jidm/about" target="_self">About</a></strong> menu.</span></span></p> <p><span style="font-size: small;"><span style="font-size: 11px; line-height: 19px;">For information on the Editorial Board, check the <strong><a href="/index.php/jidm/about/displayMembership/5">People</a>&nbsp;</strong>menu.</span></span></p> https://periodicos.ufmg.br/index.php/jidm/article/view/29454 Editorial 2021-02-14T13:06:24-03:00 Angelo Brayner brayner@dc.ufc.br Maristela Holanda maristela.holanda@gmail.com Carina Friedrich Dorneles dorneles@gmail.com 2020-12-30T00:00:00-03:00 Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/29453 Frontmatter 2021-02-14T11:25:56-03:00 Angelo Brayner brayner@dc.ufc.br Maristela Holanda maristela.holanda@gmail.com 2020-12-30T00:00:00-03:00 Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24215 Polyflow: a Polystore-compliant Mechanism to Provide Interoperability to Heterogeneous Provenance Graphs 2020-10-07T09:32:24-03:00 Yan Ferreira ymendesf@gmail.com Daniel Oliveira danielcmo@ic.uff.br Victor Ströele victor.stroele@ice.ufjf.br <p>Many scientific
experiments are modeled as workflows. Data from a workflow is captured by Workflow Management Systems (WfMSs). Each WfMS has its own format to represent provenance (metadata that describes the history of the generated data), and stores it at a different granularity in the form of a graph. Provenance allows scientists to analyze and evaluate results produced by a workflow. However, a challenge arises in more complex scenarios in which the scientist needs to analyze provenance graphs generated by multiple WfMSs and workflows. To solve this problem, we propose a tool called Polyflow, which is based on the concept of Polystore systems and integrates several databases of heterogeneous origin by adopting a global ProvONE schema. Polyflow allows scientists to query multiple provenance graphs in an integrated way. We evaluate Polyflow with experts using provenance data collected from real phylogenetic data analysis workflows.</p> 2020-12-30T00:00:00-03:00 Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24843 An Experimental Analysis of the Use of Different Storage Technologies on a Relational DBMS 2020-10-14T08:15:17-03:00 Francisco D. B. S. Praciano daniel.praciano@lsbd.ufc.br Italo C. Abreu italo.abreu@lsbd.ufc.br Javam C. Machado javam.machado@lsbd.ufc.br <p>Most traditional Database Management Systems (DBMSs) are built on the premise that data is stored on magnetic disks such as hard disk drives (HDDs). Recently, several alternatives to HDDs have emerged, such as solid state drives (SSDs) based on non-volatile memory (NVM) technology such as 3D XPoint, and new generations of dynamic random access memory (DRAM). The different characteristics of these devices may impact the performance of DBMSs.
In this work, we analyze the performance of a DBMS that stores its databases in four different ways: on HDD, on NVM-based SSD, in DRAM, and in a hybrid configuration that uses the three storage devices together. To do this, we use two workloads, analytical and transactional, and we observe both throughput and latency. We then discuss the reasons behind the results obtained for each type of storage. We also show that query processing can benefit from the different characteristics of each storage device to perform faster queries and, finally, we analyze the benefits of using a hybrid storage system.</p> 2020-12-30T00:00:00-03:00 Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24300 Efficient Processing of Analytical Queries Extended with Similarity Search Predicates over Images in Spark 2020-10-04T13:58:02-03:00 Cristina Dutra de Aguiar Ciferri cdac@uol.com.br Guilherme Muzzi da Rocha guilherme.muzzi.rocha@usp.br <p class="p1">An image data warehouse extends a conventional data warehouse to also manipulate images represented by feature vectors and attributes for similarity search. A challenge that arises is the efficient processing of analytical queries extended with a similarity search predicate. These queries have a high computational cost since they require the processing of costly star join operations and distance calculations in the same setting. We consider applications that manage huge volumes of data, where the use of parallel and distributed data processing frameworks is needed. In this article, we introduce two methods to efficiently solve this challenge in Spark. BrOmnImg is based on the integration of the broadcast join and the Omni techniques for the processing of the star join operation and the distance calculations, respectively.
BrOmnImg<span class="s1">CF </span>extends BrOmnImg by using the conventional predicate to further reduce the number of distance calculations. Compared with the closest method available in the literature, BrOmnImg reduced the time spent on query processing by up to about 65%. Compared with BrOmnImg, BrOmnImg<span class="s1">CF </span>improved the performance by up to about 54%.</p> 2020-12-30T00:00:00-03:00 Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24213 Mining Temporal Exception Rules from Multivariate Time Series Using a new Support Measure 2020-10-14T08:22:52-03:00 Thabata Amaral thabataamaral23@gmail.com Elaine Parros Machado de Sousa parros@icmc.usp.br <p>Association rule mining is a common task for discovering useful and comprehensive relationships among frequent and infrequent data. Frequent patterns describe a usual behavior, while infrequent ones represent uncommon knowledge. Our interest lies in finding exception rules, a class of infrequent patterns that may have critical effects as a consequence. Existing approaches for exception rule mining usually handle “itemsets databases”, where transactions are organized with no temporal information. However, temporality may be inherent to some real contexts and should be considered to improve the semantic quality of results. Moreover, these approaches implement a non-discriminatory support measure to estimate the relevance of an item, and thus interpret as patterns a large volume of data that may be merely occasional. Aiming to overcome these drawbacks, we propose TRiER (TempoRal Exception Ruler), an efficient method for mining temporal exception rules that not only discovers exceptional behaviors and their causative agents, but also identifies how long consequences take to appear. We also present a new support measure to manipulate time series.
This measure considers the context in which a pattern occurs, thus incorporating more semantics into the results obtained. We performed an extensive experimental analysis on real multivariate time series to verify the practical applicability of TRiER. Our results show that TRiER has lower computational cost and is more scalable than existing approaches while finding a succinct and relevant set of patterns.</p> 2020-12-30T00:00:00-03:00 Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24223 SAVIME: An Array DBMS for Simulation Analysis and ML Models Prediction 2020-10-14T08:20:16-03:00 Anderson Chaves da Silva achaves@lncc.br Hermano Lourenço Souza Lustosa hermano@lncc.br Daniel Nascimento Ramos da Silva dramos@lncc.br Fábio André Machado Porto fporto@lncc.br Patrick Valduriez patrick.valduriez@inria.fr <p>Limitations in current DBMSs prevent their wide adoption in scientific applications. To let such applications benefit from DBMS support, enabling declarative data analysis and visualization over scientific data, we present an in-memory array DBMS called SAVIME. In this work, we describe SAVIME along with its data model. Our preliminary evaluation shows how SAVIME, by using a simple storage definition language (SDL), can outperform the state-of-the-art array database system, SciDB, during the process of data ingestion. We also show that it is possible to use SAVIME as a storage alternative for a numerical solver without affecting its scalability, making it useful for modern ML-based applications.</p> 2021-02-14T00:00:00-03:00 Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24227 Overcoming Bias in Community Detection Evaluation 2020-10-20T13:26:17-03:00 Jeancarlo Campos Leão jeancarlo.leao@ifnmg.edu.br Alberto H. F. Laender laender@dcc.ufmg.br Pedro O. S.
Vaz de Melo olmo@dcc.ufmg.br <p>Community detection is a key task to further understand the function and the structure of complex networks. Therefore, a strategy used to assess this task must be able to avoid biased and incorrect results that might invalidate further analyses or applications that rely on such communities. Two widely used strategies to assess this task are generally known as structural and functional. The structural strategy basically consists in detecting and assessing such communities by using multiple methods and structural metrics. On the other hand, the functional strategy might be used when ground truth data are available to assess the detected communities. However, the evaluation of communities based on such strategies is usually done in experimental configurations that are largely susceptible to biases, a situation that is inherent to algorithms, metrics and network data used in this task. Furthermore, such strategies are not systematically combined in a way that allows for the identification and mitigation of bias in their algorithms, metrics or network data to converge into more consistent results. In this context, the main contribution of this article is an approach that supports a robust quality evaluation when detecting communities in real-world networks. In our approach, we measure the quality of a community by applying the structural and functional strategies, and the combination of both, to obtain different pieces of evidence. Then, we consider the divergences and the consensus among the pieces of evidence to identify and overcome possible sources of bias in community detection algorithms, evaluation metrics, and network data. Experiments conducted with several real and synthetic networks provided results that show the effectiveness of our approach to obtain more consistent conclusions about the quality of the detected communities.</p> 2020-12-30T00:00:00-03:00 Copyright (c) 2021 Journal of Information and Data Management