Abstract
Many modern streaming applications, such as online analysis of financial, network, sensor, and other forms of data, are inherently distributed in nature. An important class of queries in such application scenarios is actuation queries, where an appropriate action is taken when the current value of a monitored function satisfies a trigger condition. Recent work [Sharfman et al. 2006, 2007b, 2008] studies the problem of tracking sophisticated (nonlinear) functions in a distributed manner. The main idea behind the geometric monitoring approach proposed there is that each distributed site monitors the function over an appropriate subset of the input domain. In the current work, we examine whether the distributed monitoring mechanism can become more efficient, in terms of the number of communicated messages, by extending the geometric monitoring framework to utilize prediction models. We first describe a number of local estimators (predictors) that suit the applications we consider and that have already proven particularly useful in past work. We then demonstrate the feasibility of incorporating predictors in the geometric monitoring framework and show that prediction-based geometric monitoring in fact generalizes the original geometric monitoring framework. We propose a wide variety of prediction-based monitoring models for the distributed threshold monitoring of complex functions. Our extensive experimentation with a variety of real datasets, functions, and parameter settings indicates that our approaches can provide significant communication savings, ranging from a factor of two up to three orders of magnitude, compared to the transmission cost of the original monitoring framework.
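To make the geometric monitoring idea and its prediction-based extension concrete, the following is a minimal Python sketch of one site's local check, under stated assumptions. It follows the drift-vector and bounding-ball construction of the geometric approach as we understand it: each site forms a drift vector u_i = e^p + (v_i - v_i^p), where v_i^p is its predicted local vector and e^p is the average of all sites' predictions, and stays silent while the ball whose diameter runs from e^p to u_i does not cross the threshold. The monitored function f(x) = ||x||^2, the velocity-based predictor, and every function name (`local_check`, `linear_growth_prediction`, `sq_norm_ball_is_monochromatic`, `mean_vector`) are hypothetical choices for illustration, not the paper's definitions. Setting the velocity to zero yields a static estimate, which mirrors how prediction-based monitoring generalizes the original framework.

```python
import math
from typing import List, Sequence

Vector = List[float]


def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x - y for x, y in zip(a, b)]


def scale(a: Sequence[float], s: float) -> Vector:
    return [s * x for x in a]


def norm(a: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in a))


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    # Coordinate-wise average: the (predicted) global estimate.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def linear_growth_prediction(v_sync, velocity, dt) -> Vector:
    # Hypothetical velocity-based predictor: v_sync + velocity * dt.
    # With velocity = 0 it degenerates to a static (last-synced) estimate.
    return add(v_sync, scale(velocity, dt))


def sq_norm_ball_is_monochromatic(center, radius, threshold) -> bool:
    # Assumed monitored function f(x) = ||x||^2. Over a ball its range is
    # [max(0, ||c|| - r)^2, (||c|| + r)^2], so this check is exact.
    c = norm(center)
    f_max = (c + radius) ** 2
    f_min = max(0.0, c - radius) ** 2
    return f_max <= threshold or f_min > threshold


def local_check(e_pred, v_local, v_local_pred, threshold) -> bool:
    """True if the site may stay silent (its ball does not cross the threshold)."""
    # Drift vector u_i = e^p + (v_i - v_i^p). Because e^p is the mean of the
    # predictions, the mean of all u_i equals the true global average.
    drift = add(e_pred, sub(v_local, v_local_pred))
    # Ball whose diameter is the segment from e^p to u_i.
    center = scale(add(e_pred, drift), 0.5)
    radius = norm(sub(drift, e_pred)) / 2.0
    return sq_norm_ball_is_monochromatic(center, radius, threshold)


if __name__ == "__main__":
    sync = [[2.0, 0.0], [4.0, 0.0]]      # local vectors at the last synchronization
    current = [[2.0, 3.1], [4.0, 2.9]]   # local vectors dt = 3 time units later
    T = 20.0                             # alarm when ||global average||^2 > 20

    pred = [linear_growth_prediction(v, [0.0, 1.0], 3.0) for v in sync]
    e_p = mean_vector(pred)
    print([local_check(e_p, v, vp, T) for v, vp in zip(current, pred)])  # [True, True]

    static = [linear_growth_prediction(v, [0.0, 0.0], 3.0) for v in sync]
    e = mean_vector(static)
    print([local_check(e, v, vp, T) for v, vp in zip(current, static)])  # [False, False]
```

In this toy setting the true global average is [3, 3], so f = 18 stays below the threshold. The static estimate produces large balls that cross it and force both sites to report, while the velocity predictor keeps the balls small and both sites silent. This is only an illustration of the mechanism, not a reproduction of the reported experiments.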
- B. Babcock and C. Olston. 2003. Distributed top-k monitoring. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). 28--39.
- S. Burdakis and A. Deligiannakis. 2012. Detecting outliers in sensor networks using the geometric approach. In Proceedings of the 28th International Conference on Data Engineering (ICDE'12). 1108--1119.
- G. Cormode and M. Garofalakis. 2005. Sketching streams through the net: Distributed approximate query tracking. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). 13--24.
- G. Cormode and M. Garofalakis. 2007. Streaming in a connected world: Querying and tracking distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). 1178--1181.
- G. Cormode and M. Garofalakis. 2008. Approximate continuous querying over distributed streams. ACM Trans. Database Syst. 33, 2.
- G. Cormode, M. Garofalakis, S. Muthukrishnan, and R. Rastogi. 2005. Holistic aggregates in a networked world: Distributed tracking of approximate quantiles. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'05). 25--36.
- G. Cormode, S. Muthukrishnan, and K. Yi. 2011. Algorithms for distributed functional monitoring. ACM Trans. Algor. 7, 21:1--21:20.
- G. Cormode, S. Muthukrishnan, and W. Zhuang. 2007. Conquering the divide: Continuous clustering of distributed data streams. In Proceedings of the 23rd International Conference on Data Engineering (ICDE'07). 1036--1045.
- A. Das, S. Ganguly, M. Garofalakis, and R. Rastogi. 2004. Distributed set-expression cardinality estimation. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB'04). Vol. 30. 312--323.
- A. Deligiannakis, Y. Kotidis, and N. Roussopoulos. 2004. Compressing historical information in sensor networks. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'04). 527--538.
- A. Deligiannakis, Y. Kotidis, and N. Roussopoulos. 2007. Dissemination of compressed historical information in sensor networks. The VLDB J. 16, 4, 439--461.
- M. Garofalakis, J. Gehrke, and R. Rastogi. 2002. Querying and mining data streams: You only get one look (a tutorial). In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'02). 635.
- M. Garofalakis, D. Keren, and V. Samoladas. 2013. Sketch-based geometric monitoring of distributed stream queries. Proc. VLDB Endow. 6, 10, 937--948.
- N. Giatrakos, A. Deligiannakis, M. Garofalakis, I. Sharfman, and A. Schuster. 2012. Prediction-based geometric monitoring over distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'12). 265--276.
- N. Giatrakos, Y. Kotidis, A. Deligiannakis, V. Vassalos, and Y. Theodoridis. 2013. In-network approximate computation of outliers with quality guarantees. Inf. Syst. 38, 8, 1285--1308.
- R. Gupta, K. Ramamritham, and M. Mohania. 2013. Ratio threshold queries over distributed data sources. Proc. VLDB Endow. 6, 8, 565--576.
- L. Huang, M. Garofalakis, J. Hellerstein, A. Joseph, and N. Taft. 2006. Toward sophisticated detection with distributed triggers. In Proceedings of the SIGCOMM Workshop on Mining Network Data (MineNet'06). 311--316.
- L. Huang, X. Nguyen, M. Garofalakis, and J. M. Hellerstein. 2007. Communication-efficient online detection of network-wide anomalies. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM'07). 134--142.
- A. Jain, J. M. Hellerstein, S. Ratnasamy, and D. Wetherall. 2004. A wakeup call for internet monitoring systems: The case for distributed triggers. In Proceedings of the Hot Topics in Networks Workshops (HotNets'04).
- R. Keralapura, G. Cormode, and J. Ramamirtham. 2006. Communication-efficient distributed monitoring of thresholded counts. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). 289--300.
- D. Keren, I. Sharfman, A. Schuster, and A. Livne. 2012. Shape sensitive geometric monitoring. IEEE Trans. Knowl. Data Engin. 24, 8, 1520--1535.
- D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. 2004. RCV1: A new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361--397.
- Z. Liu, B. Radunovic, and M. Vojnovic. 2012. Continuous distributed counting for non-monotonic streams. In Proceedings of the 31st Symposium on Principles of Database Systems (PODS'12). 307--318.
- S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. 2005. TinyDB: An acquisitional query processing system for sensor networks. ACM Trans. Database Syst. 30, 122--173.
- C. Olston, J. Jiang, and J. Widom. 2003. Adaptive filters for continuous queries over distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). 563--574.
- G. Sagy, D. Keren, I. Sharfman, and A. Schuster. 2010. Distributed threshold querying of general functions by a difference of monotonic representation. Proc. VLDB Endow. 4, 46--57.
- G. Sagy, I. Sharfman, D. Keren, and A. Schuster. 2011. Top-k vectorial aggregation queries in a distributed environment. J. Parallel Distrib. Comput. 71, 2, 302--315.
- I. Sharfman, A. Schuster, and D. Keren. 2006. A geometric approach to monitoring threshold functions over distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). 310--312.
- I. Sharfman, A. Schuster, and D. Keren. 2007a. Aggregate threshold queries in sensor networks. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'07). 1--10.
- I. Sharfman, A. Schuster, and D. Keren. 2007b. A geometric approach to monitoring threshold functions over distributed data streams. ACM Trans. Database Syst. 32, 4.
- I. Sharfman, A. Schuster, and D. Keren. 2008. Shape sensitive geometric monitoring. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'08). 301--310.
- K. Yi and Q. Zhang. 2013. Optimal tracking of distributed heavy hitters and quantiles. Algorithmica 65, 1, 206--223.
- Q. Zhang, J. Liu, and W. Wang. 2008. Approximate clustering on distributed data streams. In Proceedings of the 24th International Conference on Data Engineering (ICDE'08). 1131--1139.
Index Terms
- Distributed Geometric Query Monitoring Using Prediction Models