Google Globally-Distributed Database

    CREATE TABLE Users {
      uid INT64 NOT NULL, email STRING
    } PRIMARY KEY (uid), DIRECTORY;

    CREATE TABLE Albums {
      uid INT64 NOT NULL, aid INT64 NOT NULL,
      name STRING
    } PRIMARY KEY (uid, aid),
      INTERLEAVE IN PARENT Users ON DELETE CASCADE;

Figure 4: Example Spanner schema for photo metadata, and the interleaving implied by INTERLEAVE IN.

example, Albums(2,1) represents the row from the Albums table for user id 2, album id 1. This interleaving of tables to form directories is significant because it allows clients to describe the locality relationships that exist between multiple tables, which is necessary for good performance in a sharded, distributed database. Without it, Spanner would not know the most important locality relationships.

Denote the absolute time of an event e by the function t_abs(e). In more formal terms, TrueTime guarantees that for an invocation tt = TT.now(), tt.earliest ≤ t_abs(e_now) ≤ tt.latest, where e_now is the invocation event.

The underlying time references used by TrueTime are GPS and atomic clocks. TrueTime uses two forms of time reference because they have different failure modes. GPS reference-source vulnerabilities include antenna and receiver failures, local radio interference, correlated failures (e.g., design faults such as incorrect leap-second handling and spoofing), and GPS system outages. Atomic clocks can fail in ways uncorrelated to GPS and each other, and over long periods of time can drift significantly due to frequency error.

TrueTime is implemented by a set of time master machines per datacenter and a timeslave daemon per machine. The majority of masters have GPS receivers with dedicated antennas; these masters are separated physically to reduce the effects of antenna failures, radio interference, and spoofing. The remaining masters (which we refer to as Armageddon masters) are equipped with atomic clocks.
An atomic clock is not that expensive: the cost of an Armageddon master is of the same order as that of a GPS master. All masters' time references are regularly compared against each other. Each master also cross-checks the rate at which its reference advances time against its own local clock, and evicts itself if there is substantial divergence. Between synchronizations, Armageddon masters advertise a slowly increasing time uncertainty that is derived from conservatively applied worst-case clock drift. GPS masters advertise uncertainty that is typically close to zero.

Every daemon polls a variety of masters [29] to reduce vulnerability to errors from any one master. Some are GPS masters chosen from nearby datacenters; the rest are GPS masters from farther datacenters, as well as some Armageddon masters. Daemons apply a variant of Marzullo's algorithm [27] to detect and reject liars, and synchronize the local machine clocks to the non-liars. To protect against broken local clocks, machines that exhibit frequency excursions larger than the worst-case bound derived from component specifications and operating environment are evicted.

Between synchronizations, a daemon advertises a slowly increasing time uncertainty. ε is derived from conservatively applied worst-case local clock drift. ε also depends on time-master uncertainty and communication delay to the time masters. In our production environment, ε is typically a sawtooth function of time, varying from about 1 to 7 ms over each poll interval. The average error bound ε̄ is therefore 4 ms most of the time. The daemon's poll interval is currently 30 seconds, and the current applied drift rate is set at 200 microseconds/second, which together account

3 TrueTime

    Method         Returns
    TT.now()       TTinterval: [earliest, latest]
    TT.after(t)    true if t has definitely passed
    TT.before(t)   true if t has definitely not arrived

Table 1: TrueTime API. The argument t is of type TTstamp.
This section describes the TrueTime API and sketches its implementation. We leave most of the details for another paper: our goal is to demonstrate the power of having such an API.

Table 1 lists the methods of the API. TrueTime explicitly represents time as a TTinterval, which is an interval with bounded time uncertainty (unlike standard time interfaces that give clients no notion of uncertainty). The endpoints of a TTinterval are of type TTstamp. The TT.now() method returns a TTinterval that is guaranteed to contain the absolute time during which TT.now() was invoked. The time epoch is analogous to UNIX time with leap-second smearing. Define the instantaneous error bound as ε, which is half of the interval's width, and the average error bound as ε̄. The TT.after() and TT.before() methods are convenience wrappers around TT.now().

Published in the Proceedings of OSDI 2012
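The relationship between the three methods in Table 1 can be illustrated with a small Python sketch. This is a toy model and not Google's implementation: the class name TrueTimeSketch, the fixed epsilon_s parameter, and the injectable clock argument are all assumptions made for illustration, and TTstamp values are represented here as plain floating-point seconds. Real TrueTime derives a time-varying ε from master uncertainty, communication delay, and clock drift rather than using a constant.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """Closed interval [earliest, latest]; endpoints stand in for TTstamp."""
    earliest: float
    latest: float


class TrueTimeSketch:
    """Toy stand-in for the TrueTime API (hypothetical, for illustration)."""

    def __init__(self, epsilon_s=0.004, clock=time.time):
        # Assumption: a constant error bound in seconds. The real bound
        # varies over time; a constant keeps the sketch simple.
        self.epsilon_s = epsilon_s
        # Assumption: the local clock is injectable so the sketch is testable.
        self.clock = clock

    def now(self):
        """Return an interval of half-width epsilon around the local clock."""
        t = self.clock()
        return TTInterval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, t):
        """True if t has definitely passed: t is below the whole interval."""
        return t < self.now().earliest

    def before(self, t):
        """True if t has definitely not arrived: t is above the whole interval."""
        return t > self.now().latest


if __name__ == "__main__":
    tt = TrueTimeSketch(epsilon_s=0.004, clock=lambda: 100.0)
    print(tt.now())
    print(tt.after(99.99), tt.before(100.01), tt.after(100.0))
```

Note that a timestamp falling inside the current interval is neither definitely passed nor definitely not arrived, so in this sketch both after() and before() return False for it.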
