Spanner: Google's Globally-Distributed Database

                        latency (ms)                                      throughput (Kops/sec)
replicas   write       read-only transaction   snapshot read    write      read-only transaction   snapshot read
1D         9.4±.6      —                       —                4.0±.3     —                        —
1          14.4±1.0    1.4±.1                  1.3±.1           4.1±.05    10.9±.4                  13.5±.1
3          13.9±.6     1.3±.1                  1.2±.1           2.2±.5     13.8±3.2                 38.5±.3
5          14.4±.4     1.4±.05                 1.3±.04          2.8±.3     25.3±5.2                 50.0±1.1

Table 3: Operation microbenchmarks. Mean and standard deviation over 10 runs. 1D means one replica with commit wait disabled.

4.2.4 Refinements

t_safe^TM as defined above has a weakness, in that a single prepared transaction prevents t_safe from advancing. As a result, no reads can occur at later timestamps, even if the reads do not conflict with the transaction. Such false conflicts can be removed by augmenting t_safe^TM with a fine-grained mapping from key ranges to prepared-transaction timestamps. This information can be stored in the lock table, which already maps key ranges to lock metadata. When a read arrives, it only needs to be checked against the fine-grained safe time for the key ranges with which the read conflicts (a sketch of this bookkeeping appears at the end of this subsection).

LastTS() as defined above has a similar weakness: if a transaction has just committed, a non-conflicting read-only transaction must still be assigned s_read so as to follow that transaction. As a result, the execution of the read could be delayed. This weakness can be remedied similarly by augmenting LastTS() with a fine-grained mapping from key ranges to commit timestamps in the lock table. (We have not yet implemented this optimization.) When a read-only transaction arrives, its timestamp can be assigned by taking the maximum value of LastTS() for the key ranges with which the transaction conflicts, unless there is a conflicting prepared transaction (which can be determined from the fine-grained safe time).

t_safe^Paxos as defined above has a weakness in that it cannot advance in the absence of Paxos writes. That is, a snapshot read at t cannot execute at Paxos groups whose last write happened before t. Spanner addresses this problem by taking advantage of the disjointness of leader-lease intervals. Each Paxos leader advances t_safe^Paxos by keeping a threshold above which future writes' timestamps will occur: it maintains a mapping MinNextTS(n) from Paxos sequence number n to the minimum timestamp that may be assigned to Paxos sequence number n + 1. A replica can advance t_safe^Paxos to MinNextTS(n) − 1 when it has applied through n.

A single leader can enforce its MinNextTS() promises easily. Because the timestamps promised by MinNextTS() lie within a leader's lease, the disjointness invariant enforces MinNextTS() promises across leaders. If a leader wishes to advance MinNextTS() beyond the end of its leader lease, it must first extend its lease. Note that s_max is always advanced to the highest value in MinNextTS() to preserve disjointness.

A leader by default advances MinNextTS() values every 8 seconds. Thus, in the absence of prepared transactions, healthy slaves in an idle Paxos group can serve reads at timestamps greater than 8 seconds old in the worst case. A leader may also advance MinNextTS() values on demand from slaves.
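Both lock-table refinements amount to keeping per-key-range timestamps next to the lock metadata. The sketch below illustrates that idea only; the names (LockTable, KeyRange, RangeInfo, FineGrainedSafeTime, AssignReadOnlyTimestamp) are hypothetical rather than Spanner's actual interfaces, and the flat vector of ranges with a linear scan stands in for whatever interval structure a real lock table would use.

#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Timestamp = std::int64_t;  // a TrueTime-derived timestamp

struct KeyRange {
  std::string begin, end;  // half-open range [begin, end)
};

inline bool Overlaps(const KeyRange& a, const KeyRange& b) {
  return a.begin < b.end && b.begin < a.end;
}

// Per-range state kept in the lock table alongside the lock metadata.
struct RangeInfo {
  std::optional<Timestamp> prepared_ts;  // timestamp of a prepared, uncommitted txn
  Timestamp last_commit_ts = 0;          // fine-grained LastTS() for this range
};

class LockTable {
 public:
  // A participant leader records the prepare timestamp for the written range.
  void OnPrepare(const KeyRange& range, Timestamp prepare_ts) {
    ranges_.push_back({range, RangeInfo{prepare_ts, 0}});
  }

  // On commit, clear the prepared entry and remember the commit timestamp.
  void OnCommit(const KeyRange& range, Timestamp commit_ts) {
    for (auto& [r, info] : ranges_) {
      if (Overlaps(r, range)) {
        info.prepared_ts.reset();
        info.last_commit_ts = std::max(info.last_commit_ts, commit_ts);
      }
    }
  }

  // A read is held back only by prepared transactions whose key ranges
  // conflict with the ranges the read touches.
  Timestamp FineGrainedSafeTime(const std::vector<KeyRange>& read_ranges,
                                Timestamp coarse_safe_time) const {
    Timestamp safe = coarse_safe_time;
    for (const auto& [r, info] : ranges_) {
      if (!info.prepared_ts) continue;
      for (const auto& q : read_ranges) {
        if (Overlaps(r, q)) safe = std::min(safe, *info.prepared_ts - 1);
      }
    }
    return safe;
  }

  // Read-only transaction timestamp: the maximum fine-grained LastTS() over
  // the conflicting ranges (assuming no conflicting prepared transaction).
  Timestamp AssignReadOnlyTimestamp(const std::vector<KeyRange>& read_ranges) const {
    Timestamp s_read = 0;
    for (const auto& [r, info] : ranges_) {
      for (const auto& q : read_ranges) {
        if (Overlaps(r, q)) s_read = std::max(s_read, info.last_commit_ts);
      }
    }
    return s_read;
  }

 private:
  std::vector<std::pair<KeyRange, RangeInfo>> ranges_;
};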
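The MinNextTS() mechanism can likewise be sketched in a few lines. This is only an illustration of the two rules stated above, namely that promises must stay within the leader's lease (with s_max tracking the highest promise) and that a replica applied through sequence number n may advance t_safe^Paxos to MinNextTS(n) − 1; the class and function names (PaxosLeader, AdvanceMinNextTS, PaxosSafeTime) are hypothetical.

#include <algorithm>
#include <cstdint>
#include <map>

using Timestamp = std::int64_t;

// Leader-side bookkeeping for MinNextTS() promises.
class PaxosLeader {
 public:
  explicit PaxosLeader(Timestamp lease_end) : lease_end_(lease_end) {}

  // Refresh the promise for the timestamp of the write at sequence n + 1.
  // By default this runs roughly every 8 seconds, and also on demand from slaves.
  void AdvanceMinNextTS(std::int64_t n, Timestamp now) {
    // Promises must lie within the leader's lease; to promise beyond the lease
    // the leader would first have to extend it (not shown).
    Timestamp promise = std::min(now, lease_end_);
    min_next_ts_[n] = std::max(min_next_ts_[n], promise);
    // s_max always tracks the highest promised value to preserve disjointness.
    s_max_ = std::max(s_max_, min_next_ts_[n]);
  }

  // Minimum timestamp that may be assigned to Paxos sequence number n + 1.
  Timestamp MinNextTS(std::int64_t n) const {
    auto it = min_next_ts_.find(n);
    return it == min_next_ts_.end() ? 0 : it->second;
  }

  Timestamp s_max() const { return s_max_; }

 private:
  Timestamp lease_end_;
  Timestamp s_max_ = 0;
  std::map<std::int64_t, Timestamp> min_next_ts_;  // sequence number n -> promise
};

// Replica-side rule: having applied all Paxos writes through n, a replica may
// serve snapshot reads up to MinNextTS(n) - 1 even if no further writes arrive.
Timestamp PaxosSafeTime(const PaxosLeader& leader, std::int64_t applied_through_n,
                        Timestamp last_applied_write_ts) {
  return std::max(last_applied_write_ts, leader.MinNextTS(applied_through_n) - 1);
}

Because the promise is refreshed about every 8 seconds, a healthy slave in an idle group serves reads that are at most about 8 seconds stale, matching the bound above.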
5 Evaluation

We first measure Spanner's performance with respect to replication, transactions, and availability. We then provide some data on TrueTime behavior, and a case study of our first client, F1.

5.1 Microbenchmarks

Table 3 presents some microbenchmarks for Spanner. These measurements were taken on timeshared machines: each spanserver ran on scheduling units of 4GB RAM and 4 cores (AMD Barcelona 2200MHz). Clients were run on separate machines. Each zone contained one spanserver. Clients and zones were placed in a set of datacenters with network distance of less than 1ms. (Such a layout should be commonplace: most applications do not need to distribute all of their data worldwide.) The test database was created with 50 Paxos groups with 2500 directories. Operations were standalone reads and writes of 4KB. All reads were served out of memory after a compaction, so that we are only measuring the overhead of Spanner's call stack. In addition, one unmeasured round of reads was done first to warm any location caches.

For the latency experiments, clients issued sufficiently few operations so as to avoid queuing at the servers. From the 1-replica experiments, commit wait is about 5ms, and Paxos latency is about 9ms. As the number of replicas increases, the latency stays roughly constant with less standard deviation because Paxos executes in parallel at a group's replicas. As the number of replicas increases, the latency to achieve a quorum becomes less sensitive to slowness at one slave replica.

For the throughput experiments, clients issued sufficiently many operations so as to saturate the servers' CPUs.
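The 5 ms and 9 ms figures can be read off the write-latency column of Table 3 (this arithmetic is an inference from the table, not a separately reported measurement): with commit wait disabled, a write pays only the replication cost, so

  Paxos latency ≈ 9.4 ms (the 1D write latency)
  commit wait  ≈ 14.4 ms − 9.4 ms ≈ 5 ms (the 1-replica write latency minus the 1D write latency)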
