Time Series Indexing Taking Advantage of the Generalized Suffix Tree
Keywords:
generalized suffix tree, indexing, similarity search, time seriesAbstract
A time series is a collection of observations made sequentially over time. Time series appear in several application areas such as finance, marketing, agriculture, weather, industrial and scientific data gathering. Similarity searching on time series databases is an important tool to extract knowledge from them. In this article, we propose Telesto, a novel indexing approach aimed at performing similarity search over time series, which is based on discretized time series and generalized suffix trees. Initially, Telesto discretizes time series and represents them as strings, using as a basis the Symbolic Aggregate Approximation (SAX) technique. Thereafter, these strings are indexed using a generalized suffix tree. To provide both range and nearest neighbor query operations among discretized time series, Telesto extends the suffix tree substring search algorithm by calculating distances between the discretized time series. Performance tests showed that Telesto is scalable in response to increasing sizes of databases and queries, in addition to be very efficient in similarity queries over large real-world time series databases.