Google Globally-Distributed Database


a Bigtable tablet: the former is not necessarily a single lexicographically contiguous partition of the row space. Instead, a Spanner tablet is a container that may encapsulate multiple partitions of the row space. We made this decision so that it would be possible to colocate multiple directories that are frequently accessed together.

Movedir is the background task used to move directories between Paxos groups [14]. Movedir is also used to add replicas to or remove replicas from Paxos groups [25], because Spanner does not yet support in-Paxos configuration changes. Movedir is not implemented as a single transaction, so as to avoid blocking ongoing reads and writes on a bulky data move. Instead, movedir registers the fact that it is starting to move data and moves the data in the background. When it has moved all but a nominal amount of the data, it uses a transaction to atomically move that nominal amount and update the metadata for the two Paxos groups.

A directory is also the smallest unit whose geographic-replication properties (or placement, for short) can be specified by an application. The design of our placement-specification language separates responsibilities for managing replication configurations. Administrators control two dimensions: the number and types of replicas, and the geographic placement of those replicas. They create a menu of named options in these two dimensions (e.g., North America, replicated 5 ways with 1 witness). An application controls how data is replicated, by tagging each database and/or individual directories with a combination of those options. For example, an application might store each end-user’s data in its own directory, which would enable user A’s data to have three replicas in Europe, and user B’s data to have five replicas in North America.

For expository clarity we have over-simplified. In fact, Spanner will shard a directory into multiple fragments if it grows too large.
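
The movedir procedure described above (register the move, copy in the background, then finish with one short transaction) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Spanner's implementation: `PaxosGroup`, `Metadata`, the `nominal` threshold, and the use of a lock to stand in for a transaction are all hypothetical, and the sketch moves a whole directory rather than individual fragments.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class PaxosGroup:
    """Hypothetical stand-in for a Paxos group: a name plus the rows it holds."""
    name: str
    rows: dict = field(default_factory=dict)  # (directory id, row id) -> value


@dataclass
class Metadata:
    """Hypothetical directory-to-group map plus a record of moves in progress."""
    owner: dict = field(default_factory=dict)   # directory id -> group name
    moving: set = field(default_factory=set)    # directory ids being moved
    lock: threading.Lock = field(default_factory=threading.Lock)


def movedir(directory, src, dst, meta, nominal=2):
    """Move every row of `directory` from src to dst in three steps."""
    # Step 1: register that the move is starting (not part of a transaction).
    meta.moving.add(directory)

    def remaining():
        # Keys of this directory that are still stored in the source group.
        return sorted(k for k in src.rows if k[0] == directory)

    # Step 2: background copy, one row at a time, until only a nominal
    # number of rows is left behind in the source group.
    while len(remaining()) > nominal:
        key = remaining()[0]
        dst.rows[key] = src.rows.pop(key)

    # Step 3: one short "transaction" (modelled here by a lock) moves the
    # last few rows and updates the ownership metadata together.
    with meta.lock:
        for key in remaining():
            dst.rows[key] = src.rows.pop(key)
        meta.owner[directory] = dst.name
        meta.moving.discard(directory)


# Usage: move directory "u1" from group g1 to group g2.
src = PaxosGroup("g1", {("u1", i): f"v{i}" for i in range(5)})
src.rows[("u2", 0)] = "other"
dst = PaxosGroup("g2")
meta = Metadata(owner={"u1": "g1", "u2": "g1"})
movedir("u1", src, dst, meta)
```

Only the final step holds the lock, which mirrors the stated goal of not blocking ongoing reads and writes during the bulk of the move. How reads and writes are routed while a move is in progress is not covered by this sketch.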
Fragments may be served from different Paxos groups (and therefore different servers). Movedir actually moves fragments, and not whole directories, between groups.

2.3 Data Model

Spanner exposes the following set of data features to applications: a data model based on schematized semi-relational tables, a query language, and general-purpose transactions. The move towards supporting these features was driven by many factors. The need to support schematized semi-relational tables and synchronous replication is supported by the popularity of Megastore [5]. At least 300 applications within Google use Megastore (despite its relatively low performance) because its data model is simpler to manage than Bigtable’s, and because of its support for synchronous replication across datacenters. (Bigtable only supports eventually-consistent replication across datacenters.) Examples of well-known Google applications that use Megastore are Gmail, Picasa, Calendar, Android Market, and AppEngine. The need to support a SQL-like query language in Spanner was also clear, given the popularity of Dremel [28] as an interactive data-analysis tool. Finally, the lack of cross-row transactions in Bigtable led to frequent complaints; Percolator [32] was in part built to address this failing. Some authors have claimed that general two-phase commit is too expensive to support, because of the performance or availability problems that it brings [9, 10, 19]. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. Running two-phase commit over Paxos mitigates the availability problems.

The application data model is layered on top of the directory-bucketed key-value mappings supported by the implementation. An application creates one or more databases in a universe. Each database can contain an unlimited number of schematized tables.
Tables look like relational-database tables, with rows, columns, and versioned values. We will not go into detail about the query language for Spanner. It looks like SQL with some extensions to support protocol-buffer-valued fields.

Spanner’s data model is not purely relational, in that rows must have names. More precisely, every table is required to have an ordered set of one or more primary-key columns. This requirement is where Spanner still looks like a key-value store: the primary keys form the name for a row, and each table defines a mapping from the primary-key columns to the non-primary-key columns. A row has existence only if some value (even if it is NULL) is defined for the row’s keys. Imposing this structure is useful because it lets applications control data locality through their choices of keys.

Figure 4 contains an example Spanner schema for storing photo metadata on a per-user, per-album basis. The schema language is similar to Megastore’s, with the additional requirement that every Spanner database must be partitioned by clients into one or more hierarchies of tables. Client applications declare the hierarchies in database schemas via the INTERLEAVE IN declarations. The table at the top of a hierarchy is a directory table. Each row in a directory table with key K, together with all of the rows in descendant tables that start with K in lexicographic order, forms a directory. ON DELETE CASCADE says that deleting a row in the directory table deletes any associated child rows. The figure also illustrates the interleaved layout for the example database: for

Published in the Proceedings of OSDI 2012
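
The directory rule stated above (a directory-table row with key K, plus every descendant row whose key starts with K) can be illustrated with a small Python sketch. It is an assumption-laden model rather than Spanner's storage format: the table names `Users` and `Albums`, the column values, and the helper functions are hypothetical, and primary keys are modelled as Python tuples so that tuple ordering plays the role of lexicographic key order.

```python
from itertools import groupby

# Hypothetical rows: (table name, primary-key tuple, non-key columns).
# Users is the directory table; Albums is interleaved in Users, so every
# Albums key begins with the key of its parent Users row.
rows = [
    ("Albums", (2, 1), {"name": "Pets"}),
    ("Users",  (1,),   {"email": "a@example.com"}),
    ("Albums", (1, 2), {"name": "Trips"}),
    ("Users",  (2,),   {"email": "b@example.com"}),
    ("Albums", (1, 1), {"name": "Family"}),
]


def interleave(rows):
    """Order rows by primary key so each child row directly follows its parent."""
    return sorted(rows, key=lambda r: r[1])


def directories(rows):
    """Group the interleaved rows by directory key K (the first key column)."""
    ordered = interleave(rows)
    return {
        k: [(table, pk) for table, pk, _ in group]
        for k, group in groupby(ordered, key=lambda r: r[1][0])
    }


def delete_cascade(rows, k):
    """Model ON DELETE CASCADE: drop the directory row K and all its children."""
    return [r for r in rows if r[1][0] != k]


for table, pk, _ in interleave(rows):
    print(table, pk)
```

In this model the interleaved order is `Users (1,)`, `Albums (1, 1)`, `Albums (1, 2)`, `Users (2,)`, `Albums (2, 1)`, so each directory occupies one contiguous key range, which is what lets an application control data locality through its choice of keys.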
