
Fast Distributed Transactions and Strongly Consistent Replication for OLTP Database Systems

Published: 26 May 2014

Abstract

As more data management software is designed for deployment in public and private clouds, or on a cluster of commodity servers, new distributed storage systems increasingly achieve high data access throughput via partitioning and replication. In order to achieve high scalability, however, today's systems generally reduce transactional support, disallowing single transactions from spanning multiple partitions.

This article describes Calvin, a practical transaction scheduling and data replication layer that uses a deterministic ordering guarantee to significantly reduce the normally prohibitive contention costs associated with distributed transactions. This allows near-linear scalability on a cluster of commodity machines, without eliminating traditional transactional guarantees, introducing a single point of failure, or requiring application developers to reason about data partitioning. By replicating transaction inputs instead of transactional actions, Calvin is able to support multiple consistency levels—including Paxos-based strong consistency across geographically distant replicas—at no cost to transactional throughput.
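The idea of replicating transaction inputs rather than their effects can be illustrated with a minimal sketch. This is not Calvin's implementation: `TransferRequest`, `apply_transfer`, and `run_replica` are hypothetical names, and a plain Python list stands in for the agreed-upon global order of inputs. The sketch only shows that when the transaction logic is deterministic, replicas that execute the same ordered inputs end in the same state without exchanging any results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRequest:
    """Hypothetical transaction input: move `amount` from `src` to `dst`."""
    src: str
    dst: str
    amount: int


def apply_transfer(state, req):
    """Deterministic logic: the outcome depends only on the state and the input."""
    if state.get(req.src, 0) >= req.amount:
        state[req.src] = state[req.src] - req.amount
        state[req.dst] = state.get(req.dst, 0) + req.amount
    # An underfunded transfer is a no-op, and it is a no-op on every replica.


def run_replica(initial_state, ordered_inputs):
    """Execute the agreed-upon input log, in order, against a private copy of the state."""
    state = dict(initial_state)
    for req in ordered_inputs:
        apply_transfer(state, req)
    return state


initial = {"alice": 100, "bob": 20, "carol": 0}

# Assumption: every replica has already agreed on this order of inputs.
log = [
    TransferRequest("alice", "bob", 70),
    TransferRequest("bob", "carol", 90),
    TransferRequest("alice", "carol", 50),
]

replica_a = run_replica(initial, log)
replica_b = run_replica(initial, log)

# The same inputs in a different order can give a different state,
# which is why the order has to be fixed before execution.
reordered = run_replica(initial, list(reversed(log)))
```

Here `replica_a` and `replica_b` are equal, while `reordered` differs from both. Only the ordered inputs had to be shared between replicas; each one derived the resulting writes on its own.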

Furthermore, Calvin introduces a set of tools that will allow application developers to gain the full performance benefit of Calvin's server-side transaction scheduling mechanisms without introducing the additional code complexity and inconvenience normally associated with using DBMS stored procedures in place of ad hoc client-side transactions.



              • Published in

                ACM Transactions on Database Systems, Volume 39, Issue 2
                May 2014
                336 pages
                ISSN: 0362-5915
                EISSN: 1557-4644
                DOI: 10.1145/2627748

                Copyright © 2014 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 26 May 2014
                • Accepted: 1 December 2013
                • Revised: 1 October 2013
                • Received: 1 October 2012


                Qualifiers

                • research-article
                • Research
                • Refereed
