Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm <p><span style="font-size: 11px; line-height: 19px;">JIDM is an electronic journal that is published three times a year. Its focus is on information and data management in large repositories and document collections. Specifically, it relates to different areas of Computer Science, including databases, information retrieval, digital libraries, knowledge discovery, data mining, and geographic information systems, among others.</span></p> <p><span style="font-size: small;"><span style="font-size: 11px; line-height: 19px;">For information on submission and policies, check the <strong><a href="/index.php/jidm/about" target="_self">About</a></strong> menu.</span></span></p> <p><span style="font-size: small;"><span style="font-size: 11px; line-height: 19px;">For information on the Editorial Board, check the <strong><a href="/index.php/jidm/about/displayMembership/5">People</a>&nbsp;</strong>menu.</span></span></p> en-US brayner@dc.ufc.br (Angelo Brayner) mholanda@unb.br (Maristela Holanda) Wed, 30 Dec 2020 00:00:00 -0300 OJS 3.3.0.14 http://blogs.law.harvard.edu/tech/rss 60 Editorial https://periodicos.ufmg.br/index.php/jidm/article/view/29454 Angelo Brayner; Maristela Holanda; Carina Friedrich Dorneles Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/29454 Wed, 30 Dec 2020 00:00:00 -0300 Frontmatter https://periodicos.ufmg.br/index.php/jidm/article/view/29453 Angelo Brayner; Maristela Holanda Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/29453 Wed, 30 Dec 2020 00:00:00 -0300 Polyflow: a Polystore-compliant Mechanism to Provide Interoperability to Heterogeneous Provenance Graphs https://periodicos.ufmg.br/index.php/jidm/article/view/24215 <p>Many scientific experiments are modeled as workflows. 
Data from a workflow is captured by Workflow Management Systems (WfMSs). Each WfMS has its own format to represent provenance (metadata that describes the history of the generated data), and stores it at a different granularity in the form of a graph. Provenance allows scientists to analyze and evaluate results produced by a workflow. However, a challenge arises in more complex scenarios in which the scientist needs to analyze provenance graphs generated by multiple WfMSs and workflows. To solve this problem, we propose a tool called Polyflow, which is based on the concept of Polystore systems and can integrate several databases of heterogeneous origin by adopting a global ProvONE schema. Polyflow allows scientists to query multiple provenance graphs in an integrated way. We evaluate Polyflow with experts using provenance data collected from real phylogenetic data analysis workflows.</p> Yan Mendes, Daniel de Oliveira, Victor Ströele Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24215 Wed, 30 Dec 2020 00:00:00 -0300 An Experimental Analysis of the Use of Different Storage Technologies on a Relational DBMS https://periodicos.ufmg.br/index.php/jidm/article/view/24843 <p>Most traditional Database Management Systems (DBMSs) are built on the premise that data is stored on magnetic disks such as hard disk drives (HDDs). Recently, several alternatives to HDDs have emerged, such as solid state drives (SSDs) based on non-volatile memory (NVM) technology such as 3D XPoint, and the new generations of dynamic random access memories (DRAM). The different characteristics of these devices may impact the performance of DBMSs. In this work, we analyze the performance of a DBMS that stores its databases in four different ways: on HDD, on NVM SSD, in DRAM, and in a hybrid configuration that uses the three storage devices together. 
To do this, we use two workloads, analytical and transactional, and we observe both throughput and latency. We then discuss the reasons behind the results obtained for each type of storage. We also show that query processing can benefit from the different characteristics of each storage device to perform faster queries and, finally, we analyze the benefits of using a hybrid storage system.</p> Francisco D. B. S. Praciano, Italo C. Abreu, Javam C. Machado Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24843 Wed, 30 Dec 2020 00:00:00 -0300 Efficient Processing of Analytical Queries Extended with Similarity Search Predicates over Images in Spark https://periodicos.ufmg.br/index.php/jidm/article/view/24300 <p class="p1">An image data warehouse extends a conventional data warehouse to also manipulate images represented by feature vectors and attributes for similarity search. A challenge that arises is the efficient processing of analytical queries extended with a similarity search predicate. These queries have a high computational cost since they require the processing of costly star join operations and distance calculations in the same setting. We consider applications that manage huge volumes of data, where the use of parallel and distributed data processing frameworks is needed. In this article, we introduce two methods to efficiently solve this challenge in Spark. BrOmnImg is based on the integration of the broadcast join and the Omni techniques for the processing of the star join operation and the distance calculations, respectively. BrOmnImg<span class="s1">CF </span>extends BrOmnImg by using the conventional predicate to further reduce the number of distance calculations. Compared with the closest method available in the literature, BrOmnImg reduced the time spent on query processing by up to about 65%. 
Compared with BrOmnImg, BrOmnImg<span class="s1">CF </span>improved the performance by up to about 54%.</p> Cristina Dutra de Aguiar Ciferri, Guilherme Muzzi da Rocha Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24300 Wed, 30 Dec 2020 00:00:00 -0300 Mining Temporal Exception Rules from Multivariate Time Series Using a new Support Measure https://periodicos.ufmg.br/index.php/jidm/article/view/24213 <p>Association rule mining is a common task for discovering useful and comprehensive relationships among frequent and infrequent data. Frequent patterns describe a usual behavior, while infrequent ones represent uncommon knowledge. Our interest lies in finding exception rules, a class of infrequent patterns that may have critical effects as a consequence. Existing approaches for exception rule mining usually handle “itemsets databases”, where transactions are organized with no temporal information. However, temporality may be inherent to some real contexts and should be considered to improve the semantic quality of results. Moreover, these approaches implement a non-discriminatory support measure to estimate the relevance of an item, thus interpreting a large volume of data that may be merely occasional as patterns. Aiming to overcome these drawbacks, we propose TRiER (TempoRal Exception Ruler), an efficient method for mining temporal exception rules that not only discovers exceptional behaviors and their causative agents, but also identifies how long consequences take to appear. We also present a new support measure to manipulate time series. This measure considers the context in which a pattern occurs, thus incorporating more semantics into the results obtained. We performed an extensive experimental analysis on real multivariate time series to verify the practical applicability of TRiER. 
Our results show that TRiER has a lower computational cost and is more scalable than existing approaches while finding a succinct and relevant set of patterns.</p> Thabata Amaral, Elaine Parros Machado de Sousa Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24213 Wed, 30 Dec 2020 00:00:00 -0300 SAVIME: An Array DBMS for Simulation Analysis and ML Models Prediction https://periodicos.ufmg.br/index.php/jidm/article/view/24223 <p>Limitations in current DBMSs prevent their wide adoption in scientific applications. To let such applications benefit from DBMS support, enabling declarative data analysis and visualization over scientific data, we present an in-memory array DBMS called SAVIME. In this work, we describe SAVIME along with its data model. Our preliminary evaluation shows how SAVIME, by using a simple storage definition language (SDL), can outperform the state-of-the-art array database system, SciDB, during data ingestion. We also show that it is possible to use SAVIME as a storage alternative for a numerical solver without affecting its scalability, making it useful for modern ML-based applications.</p> Anderson Chaves da Silva, Hermano Lourenço Souza Lustosa, Daniel Nascimento Ramos da Silva, Fábio André Machado Porto, Patrick Valduriez Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24223 Sun, 14 Feb 2021 00:00:00 -0300 Overcoming Bias in Community Detection Evaluation https://periodicos.ufmg.br/index.php/jidm/article/view/24227 <p>Community detection is a key task to further understand the function and the structure of complex networks. Therefore, a strategy used to assess this task must be able to avoid biased and incorrect results that might invalidate further analyses or applications that rely on such communities. 
Two widely used strategies to assess this task are generally known as structural and functional. The structural strategy basically consists in detecting and assessing such communities by using multiple methods and structural metrics. On the other hand, the functional strategy might be used when ground truth data are available to assess the detected communities. However, the evaluation of communities based on such strategies is usually done in experimental configurations that are largely susceptible to biases, a situation that is inherent to algorithms, metrics and network data used in this task. Furthermore, such strategies are not systematically combined in a way that allows for the identification and mitigation of bias in their algorithms, metrics or network data to converge into more consistent results. In this context, the main contribution of this article is an approach that supports a robust quality evaluation when detecting communities in real-world networks. In our approach, we measure the quality of a community by applying the structural and functional strategies, and the combination of both, to obtain different pieces of evidence. Then, we consider the divergences and the consensus among the pieces of evidence to identify and overcome possible sources of bias in community detection algorithms, evaluation metrics, and network data. Experiments conducted with several real and synthetic networks provided results that show the effectiveness of our approach to obtain more consistent conclusions about the quality of the detected communities.</p> Jeancarlo Campos Leão, Alberto H. F. Laender, Pedro O. S. Vaz de Melo Copyright (c) 2021 Journal of Information and Data Management https://periodicos.ufmg.br/index.php/jidm/article/view/24227 Wed, 30 Dec 2020 00:00:00 -0300