Spanner: Google's Globally-Distributed Database

Spanner provides externally consistent [16] reads and writes, and globally-consistent reads across the database at a timestamp. These features enable Spanner to support consistent backups, consistent MapReduce executions [12], and atomic schema updates, all at global scale, and even in the presence of ongoing transactions.

These features are enabled by the fact that Spanner assigns globally-meaningful commit timestamps to transactions, even though transactions may be distributed. The timestamps reflect serialization order. In addition, the serialization order satisfies external consistency (or equivalently, linearizability [20]): if a transaction T1 commits before another transaction T2 starts, then T1's commit timestamp is smaller than T2's. Spanner is the first system to provide such guarantees at global scale.

The key enabler of these properties is a new TrueTime API and its implementation. The API directly exposes clock uncertainty, and the guarantees on Spanner's timestamps depend on the bounds that the implementation provides. If the uncertainty is large, Spanner slows down to wait out that uncertainty. Google's cluster-management software provides an implementation of the TrueTime API. This implementation keeps uncertainty small (generally less than 10ms) by using multiple modern clock references (GPS and atomic clocks).
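The commit-wait behavior just described can be made concrete with a short sketch. The Go listing below is illustrative only, not Spanner's implementation: it assumes a hypothetical uncertainty-aware clock (now, returning an interval [Earliest, Latest] that bounds the true time, with an assumed bound epsilon) and shows why waiting out the uncertainty forces a later transaction to receive a strictly larger timestamp.

    // Sketch of the commit-wait rule described above: choose a commit
    // timestamp no smaller than the latest possible current time, then
    // wait until that timestamp is guaranteed to be in the past. The
    // clock interface here is a hypothetical stand-in, not Spanner's API.
    package main

    import (
        "fmt"
        "time"
    )

    // Interval is a clock reading with explicit uncertainty: the true
    // time lies somewhere between Earliest and Latest.
    type Interval struct {
        Earliest, Latest time.Time
    }

    // now simulates an uncertainty-aware clock with a fixed bound epsilon.
    // Real bounds would be derived from GPS and atomic-clock references.
    func now(epsilon time.Duration) Interval {
        t := time.Now()
        return Interval{Earliest: t.Add(-epsilon), Latest: t.Add(epsilon)}
    }

    // commit picks s = now().Latest as the commit timestamp and then
    // waits until now().Earliest > s, so any transaction that starts
    // after commit returns must be assigned a strictly larger timestamp.
    func commit(epsilon time.Duration) time.Time {
        s := now(epsilon).Latest
        for !now(epsilon).Earliest.After(s) {
            time.Sleep(time.Millisecond) // wait out the uncertainty
        }
        return s
    }

    func main() {
        const epsilon = 5 * time.Millisecond // assumed uncertainty bound
        t1 := commit(epsilon)
        t2 := commit(epsilon) // starts only after t1 has committed
        fmt.Println("t1 < t2:", t1.Before(t2))
    }

Because the wait is proportional to the uncertainty bound, keeping that bound small, as the TrueTime implementation does with GPS and atomic clocks, directly limits the latency added to commits.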
Section 2 describes the structure of Spanner's implementation, its feature set, and the engineering decisions that went into their design. Section 3 describes our new TrueTime API and sketches its implementation. Section 4 describes how Spanner uses TrueTime to implement externally-consistent distributed transactions, lock-free read-only transactions, and atomic schema updates. Section 5 provides some benchmarks on Spanner's performance and TrueTime behavior, and discusses the experiences of F1. Sections 6, 7, and 8 describe related and future work, and summarize our conclusions.

2 Implementation

This section describes the structure of and rationale underlying Spanner's implementation. It then describes the directory abstraction, which is used to manage replication and locality, and is the unit of data movement. Finally, it describes our data model, why Spanner looks like a relational database instead of a key-value store, and how applications can control data locality.

A Spanner deployment is called a universe. Given that Spanner manages data globally, there will be only a handful of running universes. We currently run a test/playground universe, a development/production universe, and a production-only universe.

Spanner is organized as a set of zones, where each zone is the rough analog of a deployment of Bigtable servers [9]. Zones are the unit of administrative deployment. The set of zones is also the set of locations across which data can be replicated. Zones can be added to or removed from a running system as new datacenters are brought into service and old ones are turned off, respectively. Zones are also the unit of physical isolation: there may be one or more zones in a datacenter, for example, if different applications' data must be partitioned across different sets of servers in the same datacenter.

Figure 1: Spanner server organization.

Figure 1 illustrates the servers in a Spanner universe. A zone has one zonemaster and between one hundred and several thousand spanservers. The former assigns data to spanservers; the latter serve data to clients. The per-zone location proxies are used by clients to locate the spanservers assigned to serve their data. The universe master and the placement driver are currently singletons. The universe master is primarily a console that displays status information about all the zones for interactive debugging. The placement driver handles automated movement of data across zones on the timescale of minutes. The placement driver periodically communicates with the spanservers to find data that needs to be moved, either to meet updated replication constraints or to balance load. For space reasons, we will only describe the spanserver in any detail.

2.1 Spanserver Software Stack

This section focuses on the spanserver implementation to illustrate how replication and distributed transactions have been layered onto our Bigtable-based implementation. The software stack is shown in Figure 2. At the bottom, each spanserver is responsible for between 100 and 1000 instances of a data structure called a tablet. A tablet is similar to Bigtable's tablet abstraction, in that it implements a bag of the following mappings:

(key:string, timestamp:int64) → string

Unlike Bigtable, Spanner assigns timestamps to data, which is an important way in which Spanner is more like a multi-version database than a key-value store.
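To illustrate the mapping above, the sketch below implements a toy multi-version store; the tablet type and its write/read methods are invented names for illustration and are not Spanner's tablet code. Each key holds a list of timestamped versions, and a read at timestamp ts returns the latest version at or below ts.

    // Minimal sketch of a multi-version (key, timestamp) -> value mapping
    // in the spirit of the tablet abstraction above. The tablet type and
    // its methods are invented for illustration, not Spanner code.
    package main

    import (
        "fmt"
        "sort"
    )

    type version struct {
        ts    int64  // commit timestamp of this version
        value string // value written at that timestamp
    }

    // tablet maps each key to its versions, kept sorted by timestamp.
    type tablet struct {
        data map[string][]version
    }

    func newTablet() *tablet {
        return &tablet{data: make(map[string][]version)}
    }

    // write records value for key at timestamp ts.
    func (t *tablet) write(key string, ts int64, value string) {
        vs := append(t.data[key], version{ts: ts, value: value})
        sort.Slice(vs, func(i, j int) bool { return vs[i].ts < vs[j].ts })
        t.data[key] = vs
    }

    // read returns the value of key with the largest timestamp <= ts,
    // i.e. the version visible to a reader at timestamp ts.
    func (t *tablet) read(key string, ts int64) (string, bool) {
        vs := t.data[key]
        i := sort.Search(len(vs), func(i int) bool { return vs[i].ts > ts })
        if i == 0 {
            return "", false // no version existed at or before ts
        }
        return vs[i-1].value, true
    }

    func main() {
        tb := newTablet()
        tb.write("row1", 10, "v1")
        tb.write("row1", 20, "v2")
        v, ok := tb.read("row1", 15)
        fmt.Println(v, ok) // "v1 true": the latest version at or below ts=15
    }

Keeping every timestamped version of a key, rather than only the latest value, is what gives the structure its multi-version character, in contrast to a plain key-value store.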
