research-article

Distributed Geometric Query Monitoring Using Prediction Models

Published: 26 May 2014

Abstract

Many modern streaming applications, such as online analysis of financial, network, sensor, and other forms of data, are inherently distributed in nature. An important query type at the focal point of such application scenarios is the actuation query, where the proper action is dictated by a trigger condition placed upon the current value of a monitored function. Recent work [Sharfman et al. 2006, 2007b, 2008] studies the problem of tracking sophisticated (nonlinear) functions in a distributed manner. The main concept behind the geometric monitoring approach proposed there is for each distributed site to perform the function monitoring over an appropriate subset of the input domain. In the current work, we examine whether the distributed monitoring mechanism can become more efficient, in terms of the number of communicated messages, by extending the geometric monitoring framework to utilize prediction models. We first describe a number of local estimators (predictors) that are useful for the applications we consider and that have already proven particularly useful in past work. We then demonstrate the feasibility of incorporating predictors in the geometric monitoring framework and show that prediction-based geometric monitoring in fact generalizes the original geometric monitoring framework. We propose a large variety of prediction-based monitoring models for the distributed threshold monitoring of complex functions. Our extensive experimentation with a variety of real datasets, functions, and parameter settings indicates that our approaches can provide significant communication savings, ranging from a factor of two up to three orders of magnitude, compared to the transmission cost of the original monitoring framework.
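As a rough illustration of the idea summarized in the abstract, the following Python sketch shows how a single site might test its local constraint with and without a predictor. It is not the authors' implementation. It assumes the ball-shaped local constraint commonly associated with the geometric approach of Sharfman et al. 2006, a simple linear-growth predictor, and the monitored function f(x) = ||x||^2 (chosen only because its extremes over a ball have a closed form). All function and variable names are hypothetical.

```python
import math

def drift_vector(e_pred, v_now, v_pred):
    """u = e_pred + (v_now - v_pred): predicted estimate plus local deviation."""
    return [ep + (vn - vp) for ep, vn, vp in zip(e_pred, v_now, v_pred)]

def ball(e_pred, u):
    """Ball whose diameter is the segment between e_pred and the drift vector u."""
    center = [(a + b) / 2 for a, b in zip(e_pred, u)]
    return center, math.dist(e_pred, u) / 2

def sq_norm_range(center, radius):
    """Exact (min, max) of the assumed function f(x) = ||x||^2 over the ball."""
    c = math.hypot(*center)
    return max(0.0, c - radius) ** 2, (c + radius) ** 2

def must_communicate(e_pred, v_now, v_pred, threshold):
    """True if f could cross the threshold somewhere inside the site's ball."""
    u = drift_vector(e_pred, v_now, v_pred)
    lo, hi = sq_norm_range(*ball(e_pred, u))
    return lo <= threshold <= hi

def linear_predictor(v_sync, velocity, dt):
    """Assumed predictor: the local vector grows linearly since the last sync."""
    return [v + dt * s for v, s in zip(v_sync, velocity)]

# Hypothetical scenario: one site, two time steps after the last synchronization.
e_sync = [1.0, 1.0]      # global estimate agreed at the last sync
v_sync = [1.0, 1.0]      # this site's local vector at the last sync
velocity = [0.5, 0.0]    # per-step trend shared with the coordinator (assumed)
v_now = [2.1, 1.0]       # this site's current local vector
threshold = 5.5

# Without prediction: the "predictor" is static, so the reference point stays at e_sync.
static_alarm = must_communicate(e_sync, v_now, v_sync, threshold)

# With prediction: the reference point follows the predicted trend.
v_pred = linear_predictor(v_sync, velocity, dt=2)
e_pred = v_pred          # assumes all sites drift alike, so the average prediction matches
predicted_alarm = must_communicate(e_pred, v_now, v_pred, threshold)

print(static_alarm, predicted_alarm)
```

In this scenario the static check raises an alarm while the prediction-based check does not, because the predictor absorbs most of the drift and the resulting ball is much smaller. Setting the velocity to zero makes the two checks coincide, which mirrors the abstract's claim that prediction-based monitoring generalizes the original framework.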

References

  1. B. Babcock and C. Olston. 2003. Distributed top-k monitoring. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). 28--39.
  2. S. Burdakis and A. Deligiannakis. 2012. Detecting outliers in sensor networks using the geometric approach. In Proceedings of the 28th International Conference on Data Engineering (ICDE'12). 1108--1119.
  3. G. Cormode and M. Garofalakis. 2005. Sketching streams through the net: Distributed approximate query tracking. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). 13--24.
  4. G. Cormode and M. Garofalakis. 2007. Streaming in a connected world: Querying and tracking distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). 1178--1181.
  5. G. Cormode and M. Garofalakis. 2008. Approximate continuous querying over distributed streams. ACM Trans. Database Syst. 33, 2.
  6. G. Cormode, M. Garofalakis, S. Muthukrishnan, and R. Rastogi. 2005. Holistic aggregates in a networked world: Distributed tracking of approximate quantiles. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'05). 25--36.
  7. G. Cormode, S. Muthukrishnan, and K. Yi. 2011. Algorithms for distributed functional monitoring. ACM Trans. Algor. 7, 21:1--21:20.
  8. G. Cormode, S. Muthukrishnan, and W. Zhuang. 2007. Conquering the divide: Continuous clustering of distributed data streams. In Proceedings of the 23rd International Conference on Data Engineering (ICDE'07). 1036--1045.
  9. A. Das, S. Ganguly, M. Garofalakis, and R. Rastogi. 2004. Distributed set-expression cardinality estimation. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB'04). Vol. 30. 312--323.
  10. A. Deligiannakis, Y. Kotidis, and N. Roussopoulos. 2004. Compressing historical information in sensor networks. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'04). 527--538.
  11. A. Deligiannakis, Y. Kotidis, and N. Roussopoulos. 2007. Dissemination of compressed historical information in sensor networks. The VLDB J. 16, 4, 439--461.
  12. M. Garofalakis, J. Gehrke, and R. Rastogi. 2002. Querying and mining data streams: You only get one look a tutorial. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'02). 635.
  13. M. Garofalakis, D. Keren, and V. Samoladas. 2013. Sketch-based geometric monitoring of distributed stream queries. Proc. VLDB Endow. 6, 10, 937--948.
  14. N. Giatrakos, A. Deligiannakis, M. Garofalakis, I. Sharfman, and A. Schuster. 2012. Prediction-based geometric monitoring over distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'12). 265--276.
  15. N. Giatrakos, Y. Kotidis, A. Deligiannakis, V. Vassalos, and Y. Theodoridis. 2013. In-network approximate computation of outliers with quality guarantees. Inf. Syst. 38, 8, 1285--1308.
  16. R. Gupta, K. Ramamritham, and M. Mohania. 2013. Ratio threshold queries over distributed data sources. Proc. VLDB Endow. 6, 8, 565--576.
  17. L. Huang, M. Garofalakis, J. Hellerstein, A. Joseph, and N. Taft. 2006. Toward sophisticated detection with distributed triggers. In Proceedings of the SIGCOMM Workshop on Mining Network Data (MineNet'06). 311--316.
  18. L. Huang, X. Nguyen, M. Garofalakis, and J. M. Hellerstein. 2007. Communication-efficient online detection of network-wide anomalies. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM'07). 134--142.
  19. A. Jain, J. M. Hellerstein, S. Ratnasamy, and D. Wetherall. 2004. A wakeup call for internet monitoring systems: The case for distributed triggers. In Proceedings of the Hot Topics in Networks Workshops (HotNets'04).
  20. R. Keralapura, G. Cormode, and J. Ramamirtham. 2006. Communication-efficient distributed monitoring of thresholded counts. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). 289--300.
  21. D. Keren, I. Sharfman, A. Schuster, and A. Livne. 2012. Shape sensitive geometric monitoring. IEEE Trans. Knowl. Data Engin. 24, 8, 1520--1535.
  22. D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. 2004. RCV1: A new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361--397.
  23. Z. Liu, B. Radunovic, and M. Vojnovic. 2012. Continuous distributed counting for non-monotonic streams. In Proceedings of the 31st Symposium on Principles of Database Systems (PODS'12). 307--318.
  24. S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. 2005. TinyDB: An acquisitional query processing system for sensor networks. ACM Trans. Database Syst. 30, 122--173.
  25. C. Olston, J. Jiang, and J. Widom. 2003. Adaptive filters for continuous queries over distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). 563--574.
  26. G. Sagy, D. Keren, I. Sharfman, and A. Schuster. 2010. Distributed threshold querying of general functions by a difference of monotonic representation. Proc. VLDB Endow. 4, 46--57.
  27. G. Sagy, I. Sharfman, D. Keren, and A. Schuster. 2011. Top-k vectorial aggregation queries in a distributed environment. J. Parallel Distrib. Comput. 71, 2, 302--315.
  28. I. Sharfman, A. Schuster, and D. Keren. 2006. A geometric approach to monitoring threshold functions over distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). 310--312.
  29. I. Sharfman, A. Schuster, and D. Keren. 2007a. Aggregate threshold queries in sensor networks. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'07). 1--10.
  30. I. Sharfman, A. Schuster, and D. Keren. 2007b. A geometric approach to monitoring threshold functions over distributed data streams. ACM Trans. Database Syst. 32, 4.
  31. I. Sharfman, A. Schuster, and D. Keren. 2008. Shape sensitive geometric monitoring. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'08). 301--310.
  32. K. Yi and Q. Zhang. 2013. Optimal tracking of distributed heavy hitters and quantiles. Algorithmica 65, 1, 206--223.
  33. Q. Zhang, J. Liu, and W. Wang. 2008. Approximate clustering on distributed data streams. In Proceedings of the 24th International Conference on Data Engineering (ICDE'08). 1131--1139.


        • Published in

          ACM Transactions on Database Systems, Volume 39, Issue 2
          May 2014
          336 pages
          ISSN: 0362-5915
          EISSN: 1557-4644
          DOI: 10.1145/2627748

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 26 May 2014
          • Accepted: 1 March 2014
          • Revised: 1 January 2014
          • Received: 1 August 2013


          Qualifiers

          • research-article
          • Research
          • Refereed
