Abstract
As more data management software is designed for deployment in public and private clouds, or on a cluster of commodity servers, new distributed storage systems increasingly achieve high data access throughput via partitioning and replication. In order to achieve high scalability, however, today's systems generally reduce transactional support, disallowing single transactions from spanning multiple partitions.
This article describes Calvin, a practical transaction scheduling and data replication layer that uses a deterministic ordering guarantee to significantly reduce the normally prohibitive contention costs associated with distributed transactions. This allows near-linear scalability on a cluster of commodity machines, without eliminating traditional transactional guarantees, introducing a single point of failure, or requiring application developers to reason about data partitioning. By replicating transaction inputs instead of transactional actions, Calvin is able to support multiple consistency levels—including Paxos-based strong consistency across geographically distant replicas—at no cost to transactional throughput.
Furthermore, Calvin introduces a set of tools that will allow application developers to gain the full performance benefit of Calvin's server-side transaction scheduling mechanisms without introducing the additional code complexity and inconvenience normally associated with using DBMS stored procedures in place of ad hoc client-side transactions.
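The idea of replicating transaction inputs rather than transactional actions can be illustrated with a small sketch. This is not Calvin's implementation; it is a toy model under stated assumptions: each replica is an in-memory dictionary, the agreed global order is a plain Python list, and the names `transfer`, `deposit`, `PROCEDURES`, and `replay` are hypothetical. It shows only that replicas which execute the same ordered inputs through deterministic logic arrive at the same state without exchanging their writes.

```python
from typing import Callable, Dict, List, Tuple

Store = Dict[str, int]
TxnInput = Tuple[str, tuple]  # (procedure name, arguments); hypothetical format


def deposit(store: Store, account: str, amount: int) -> None:
    """Add funds to an account, creating it if needed."""
    store[account] = store.get(account, 0) + amount


def transfer(store: Store, src: str, dst: str, amount: int) -> None:
    """Move funds if the source can cover them; otherwise do nothing.

    The outcome depends only on the current store and the arguments,
    so every replica that runs it at the same log position agrees.
    """
    if store.get(src, 0) >= amount:
        store[src] -= amount
        store[dst] = store.get(dst, 0) + amount


# Hypothetical registry of server-side transaction logic.
PROCEDURES: Dict[str, Callable[..., None]] = {
    "deposit": deposit,
    "transfer": transfer,
}


def replay(log: List[TxnInput]) -> Store:
    """Execute an ordered log of transaction inputs against an empty store."""
    store: Store = {}
    for name, args in log:
        PROCEDURES[name](store, *args)
    return store


# The only thing shipped to replicas is this ordered list of inputs.
log: List[TxnInput] = [
    ("deposit", ("alice", 100)),
    ("transfer", ("alice", "bob", 30)),
    ("transfer", ("bob", "carol", 50)),  # no-op: bob holds only 30
    ("deposit", ("bob", 25)),
]

replica_a = replay(log)
replica_b = replay(log)
print(replica_a == replica_b)  # True: same inputs, same order, same state
```

Replaying the same inputs in a different order can produce a different final state, which is why, in this toy model, agreement on the order has to come before execution.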
Index Terms
- Fast Distributed Transactions and Strongly Consistent Replication for OLTP Database Systems