One of Oracles most critical and important functions is how it keeps track of events. Even within a single processor, simple database system, keeping track of events is just as important as recording and maintaining the proper order of events in a distributed system. Recording the when in when did event X take place? can be problematic, so a process that is complete (in the mathematical sense) is essential. In terms of an algorithm, does an algorithm result in the same/consistent answer each and every time the same circumstances or conditions exist?
Lets take an astrological event as an example to help frame what when means. Suppose you were born in 1970. Twenty years later you see a star go supernova. In one sense or frame of reference, the event occurred in 1990. In another frame of reference, the event, given that the star was 20 light-years away, took place the same year our notional observer was born. Depending on the context, the event time stamps of 1970 and 1990 are equally valid.
In an example closer to home, how would you keep track of when database events (both data manipulation and definition) take place? Within a subset of the database, using your schema and transactions as an example, keeping track of your ordered set of operations is simpler to maintain than when comparing or ordering your transactions with respect to someone elses. Your ordered set is only partially ordered. Clearly, the ordering of events is essential for many reasons, perhaps the most important being for recovery. Did your commit take place before mine, or did mine occur before yours but was recorded later?
How do you order events?
If you were to describe the situation of recording a series of ordered events, perhaps a title of Time, Clocks and the Ordering of Events in a Distributed System would aptly describe the situation. One way to keep track of events is to record their timestamps, but this can introduce another factor, that of maintaining the accuracy and synchronization of distributed clocks. But what type of clock are we referring to? No doubt, the clock you have in mind is a physical clock.
What if the clock were logical instead of physical? This is exactly the point raised by Dr. Leslie Lamport in his 1978 paper titled Time, Clocks and the Ordering of Events in a Distributed System. In such a system, a clock condition can be described as:
For any events a, b, if a occurs before b, then C(a) < C(b).
The clock time of event a, however the clock is defined, is before that of event b. In a computing system with multiple processes, the clock condition is met when the two conditions stated below hold.
C1. If a and b are events in process Pi, and a comes before b, the Ci(a) < Ci(b).
C2. If a is the sending of a message by process Pi and b is the receipt of that message by process Pj, then Ci(a) < Cj(b).
Now that we have some conditions, the next item to consider is their implementation. One rule (applies chiefly to condition one) is that process Pi increments Ci between any two successive events. A second rule implements the sending of a timestamp along with a message. Upon receipt of a message from another process, the receiving process sets its time to be greater than or equal to its present value and greater than the timestamp of the message.
Dr. Lamports paper presents an algorithm which orders events totally. The context of totally is across the entire system, not just partially ordered within one process out of many. If our system of interest is a database with multiple users, all of whom are vying for the same resource, then the algorithm must satisfy three conditions.
A process which has been granted the resource must release it before it can be granted to another process.
Different requests for the resource must be granted in the order in which they are made.
If every process which is granted the resource eventually releases it, then every request is eventually granted.
The algorithms five steps can be summarized as follows.
To request a resource, a process sends a time stamped message to all other processes.
A receiving process places the request resource message in its own request queue and sends a time stamped acknowledgement back to the sender.
To release a resource, the owning process sends a similar time stamped message (releases resource) to all other processes.
Upon receipt of a release resource message, the request resource message is removed from the request queue.
A requesting process is granted the resource when its time stamped request is ordered before any other requests and when it has received an acknowledgement message time stamped later than the initial request.
The relevance of Lamports algorithm to Oracle
By now the time stamping of messages, use of clocks, and ordering of events must sound very much like a stamp that defines a committed version of a database at a point in time. In other words, the implementation of Dr. Lamports algorithm within Oracle is what we know as an SCN generation scheme. A search for Lamport SCN generation in Oracles documentation will show several references (primarily in RAC) to this scheme, but none of them really defines what the scheme is or how it works.
The astute reader may have noticed I did not say this is the scheme (as in the one and only scheme). Oracle uses several schemes in addition to the Lamport SCN generation scheme. Peek inside your alert log and you will see something along the lines of what is shown below.
broadcast on commit scheme to generate SCNs
Picked latch-free SCN scheme 2
Picked latch-free SCN scheme 3
MetaLink Note 260304.1 defines what broadcast on commit is or does: At every commit, the most recent local SCN will be broadcasted. Read consistency is fully guaranteed for any kind of workload, since every change in the local SCN is immediately propagated. Aside from this description, there is a general lack of public information about how these schemes work.
In addition to the alert log, a hidden parameter will also reveal the current SCN generation scheme. Is the generation scheme modifiable? A query like the one below shows the answer.
SELECT X.KSPPINM NAME, DECODE(BITAND(KSPPIFLG/256, 1), 1, 'TRUE', 'FALSE') SESMOD, DECODE( BITAND(KSPPIFLG/65536, 3), 1, 'IMMEDIATE', 2, 'DEFERRED', 3, 'IMMEDIATE', 'FALSE' ) SYSMOD, KSPPDESC DESCRIPTION FROM SYS.X_$KSPPI X WHERE X.INST_ID = USERENV('INSTANCE') AND TRANSLATE(KSPPINM,'_','#') LIKE '#%' AND X.KSPPINM like '%scn_scheme%'; NAME SESMOD SYSMOD DESCRIPTION ------------ ------- ---------- ------------ _scn_scheme FALSE FALSE SCN scheme
This query results in no rows returned if using 10g Release 2 (the parameter is not present), which goes to the familiar bromide of dont count on being able to use hidden parameters.
The time stamping and ordering of events within Oracle is of prime importance and goes to the heart of how transaction control and serialization are even possible. The total ordering of disparate events, whether within a simple system or a distributed one, is not just essential, but mandatory. Take the date of Dr. Lamports paper into consideration and then recall when Oracles version 2 was released (no version 1, long marketing spin story documented elsewhere). The tie-in is not hard to miss. There was some very clever thought that went into the design of Oracle, and at the same time, some very clever adaptation of other peoples work, the algorithm stated in Dr. Lamports paper being one such example.
Lamport, Leslie, Time, Clocks
and the Ordering of Events in a Distributed System.
Communications of the ACM 21, 7 (July 1978), 558-565. Reprinted in several collections, including Distributed Computing: Concepts and Implementations, McEntire et al., ed. IEEE Press, 1984.
See www.lamport.org for other related articles.