
ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees

Karima Echihabi, Theophanis Tsandilas, Anna Gogolou, Anastasia Bezerianos, Themis Palpanas
Dec 2022
Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate ProS, a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We develop our method for k-NN queries and demonstrate how it can be applied with the two most popular distance measures, namely Euclidean and Dynamic Time Warping (DTW). We provide both initial and progressive estimates of the final answer that improve during the similarity search, as well as suitable stopping criteria for the progressive queries. Moreover, we describe how this method can be used to develop a progressive algorithm for data series classification (based on a k-NN classifier), and we additionally propose a method designed specifically for the classification task. Experiments with several diverse synthetic and real datasets demonstrate that our prediction methods constitute the first practical solutions to the problem, significantly outperforming competing approaches. This paper was published in the VLDB Journal (2022).
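
To make the notion of a progressive k-NN answer concrete, the following is a minimal sketch, not the ProS method itself. It assumes a plain sequential scan over an in-memory collection with Euclidean distance, and it reports the best-so-far k neighbors after every chunk of series. It contains no index, no learned model, and none of the probabilistic quality estimates or stopping criteria described in the abstract. The function name `progressive_knn` and the parameters `k` and `chunk_size` are hypothetical and chosen for illustration only.

```python
import heapq
import math
import random


def progressive_knn(query, series, k=1, chunk_size=100):
    """Yield (series_seen, best_so_far) after each chunk of the scan.

    best_so_far is a list of (distance, index) pairs sorted by distance.
    Illustrative only: a brute-force scan, not the ProS algorithm.
    """
    # Min-heap over negated distances, so the root is the worst of the
    # k best candidates found so far.
    heap = []
    for start in range(0, len(series), chunk_size):
        end = min(start + chunk_size, len(series))
        for idx in range(start, end):
            item = (-math.dist(query, series[idx]), idx)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item[0] > heap[0][0]:
                # Strictly closer than the current k-th neighbor.
                heapq.heapreplace(heap, item)
        # Intermediate (progressive) answer after this chunk.
        yield end, sorted((-neg_d, idx) for neg_d, idx in heap)


# Synthetic data: 500 random series of length 16 (assumed for the demo).
rng = random.Random(0)
collection = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
query = [rng.gauss(0, 1) for _ in range(16)]

answers = list(progressive_knn(query, collection, k=3, chunk_size=100))
for seen, best in answers:
    print(seen, [round(d, 3) for d, _ in best])
```

Each intermediate answer can only improve or stay the same, and the last one is the exact k-NN result. What a sketch like this cannot tell the user is how close an intermediate answer is to the final one; estimating that is the problem the abstract says ProS addresses.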