Time Series Indexing Taking Advantage of the Generalized Suffix Tree

  • Daniel Yoshinobu Takada Chino Universidade de São Paulo
  • Felipe Alves da Louza Universidade de São Paulo
  • Agma Juci Machado Traina Universidade de São Paulo
  • Cristina Dutra de Aguiar Ciferri Universidade de São Paulo
  • Caetano Traina Junior Universidade de São Paulo
Keywords: generalized suffix tree, indexing, similarity search, time series

Abstract

A time series is a collection of observations made sequentially over time. Time series appear in several application areas such as finance, marketing, agriculture, weather, industrial and scientific data gathering. Similarity searching on time series databases is an important tool to extract knowledge from them. In this article, we propose Telesto, a novel indexing approach aimed at performing similarity search over time series, which is based on discretized time series and generalized suffix trees. Initially, Telesto discretizes time series and represents them as strings, using as a basis the Symbolic Aggregate Approximation (SAX) technique. Thereafter, these strings are indexed using a generalized suffix tree. To provide both range and nearest neighbor query operations among discretized time series, Telesto extends the suffix tree substring search algorithm by calculating distances between the discretized time series. Performance tests showed that Telesto is scalable in response to increasing sizes of databases and queries, in addition to be very efficient in similarity queries over large real-world time series databases. 

Published
2012-09-20