Google Globally-Distributed Database

PDF Publication Title:

Google Globally-Distributed Database ( google-globally-distributed-database )

Previous Page View | Next Page View | Return to Search List

Text from PDF Page: 007

when all locks have been acquired, but before any locks have been released. For a given transaction, Spanner as- signs it the timestamp that Paxos assigns to the Paxos write that represents the transaction commit. Spanner depends on the following monotonicity in- variant: within each Paxos group, Spanner assigns times- tamps to Paxos writes in monotonically increasing or- der, even across leaders. A single leader replica can triv- ially assign timestamps in monotonically increasing or- der. This invariant is enforced across leaders by making use of the disjointness invariant: a leader must only as- sign timestamps within the interval of its leader lease. maximum timestamp at which a replica is up-to-date. A replica can satisfy a read at a timestamp t if t <= tsafe. Define tsafe = min(tPaxos, tTM ), where each Paxos safe safe state machine has a safe time tPaxos and each transac- safe tion manager has a safe time tTM . tPaxos is simpler: it safe safe is the timestamp of the highest-applied Paxos write. Be- cause timestamps increase monotonically and writes are applied in order, writes will no longer occur at or below tPaxos with respect to Paxos. safe tTM is ∞ at a replica if there are zero prepared (but safe not committed) transactions—that is, transactions in be- tween the two phases of two-phase commit. (For a par- ticipant slave, tTM actually refers to the replica’s leader’s safe transaction manager, whose state the slave can infer through metadata passed on Paxos writes.) If there are any such transactions, then the state affected by those transactions is indeterminate: a participant replica does not know yet whether such transactions will commit. As we discuss in Section 4.2.1, the commit protocol ensures that every participant knows a lower bound on a pre- pared transaction’s timestamp. Every participant leader (for a group g) for a transaction Ti assigns a prepare timestamp sprepare to its prepare record. The coordinator i,g Note that whenever a timestamp s is assigned, s max is Spanner also enforces the following external- advanced to s to preserve disjointness. consistency invariant: if the start of a transaction T 2 occurs after the commit of a transaction T1, then the commit timestamp of T2 must be greater than the commit timestamp of T1. Define the start and commit events for a transaction T by estart and ecommit; and iii the commit timestamp of a transaction Ti by si. The invariant becomes t (ecommit) < t (estart) ⇒ s < s . write Ti to be ei . pared at g, tTM = mini(sprepare) − 1 over all transactions safe i,g prepared at g. 4.1.4 Assigning Timestamps to RO Transactions A read-only transaction executes in two phases: assign a timestamp sread [8], and then execute the transaction’s reads as snapshot reads at sread. The snapshot reads can execute at any replicas that are sufficiently up-to-date. The simple assignment of sread = TT.now().latest, at any time after a transaction starts, preserves external con- sistency by an argument analogous to that presented for writes in Section 4.1.2. However, such a timestamp may require the execution of the data reads at sread to block if tsafe has not advanced sufficiently. (In addition, note that choosing a value of sread may also advance smax to preserve disjointness.) To reduce the chances of block- ing, Spanner should assign the oldest timestamp that pre- serves external consistency. Section 4.2.2 explains how such a timestamp can be chosen. 4.2 Details This section explains some of the practical details of read-write transactions and read-only transactions elided earlier, as well as the implementation of a special trans- action type used to implement atomic schema changes. 4.1.3 Serving Reads at a Timestamp server i,g for every replica in a group g, over all transactions Ti pre- scribed in Section 4.2.1. Proof: s1 < tabs(ecommit) 1 tabs(ecommit) < tabs(estart) 12 tabs(estart) ≤ tabs(eserver) 22 tabs(eserver) ≤ s2 2 s1 < s2 (commit wait) (assumption) (causality) (start) (transitivity) abs1 abs2 12 The protocol for executing transactions and assigning timestamps obeys two rules, which together guarantee this invariant, as shown below. Define the arrival event of the commit request at the coordinator leader for a Start The coordinator leader for a write Ti assigns a commit timestamp si no less than the value of TT.now().latest, computed after eserver. Note that the i participant leaders do not matter here; Section 4.2.1 de- scribes how they are involved in the implementation of the next rule. Commit Wait The coordinator leader ensures that clients cannot see any data committed by Ti until TT.after(si) is true. Commit wait ensures that si is less than the absolute commit time of Ti, or si < t (ecommit). The implementation of commit wait is de- abs i leader ensures that the transaction’s commit timestamp si >= sprepare over all participant groups g. Therefore, The monotonicity invariant described in Section 4.1.2 al- lows Spanner to correctly determine whether a replica’s state is sufficiently up-to-date to satisfy a read. Every replica tracks a value called safe time tsafe which is the Published in the Proceedings of OSDI 2012 7

PDF Image | Google Globally-Distributed Database

PDF Search Title:

Google Globally-Distributed Database

Original File Name Searched:

spanner-osdi2012.pdf

DIY PDF Search: Google It | Yahoo | Bing

Cruise Ship Reviews | Luxury Resort | Jet | Yacht | and Travel Tech More Info

Cruising Review Topics and Articles More Info

Software based on Filemaker for the travel industry More Info

The Burgenstock Resort: Reviews on CruisingReview website... More Info

Resort Reviews: World Class resorts... More Info

The Riffelalp Resort: Reviews on CruisingReview website... More Info

CONTACT TEL: 608-238-6001 Email: greg@cruisingreview.com (Standard Web Page)