Where Did Oracle Come From?

Wednesday Jul 13th 2005 by Steve Callan

Often times, we take for granted the functions an RDBMS performs. But where, exactly, did the concept of the relational model come from? Read on to learn the origins of the RDBMS model in general and where and how Oracle came into being.

As an Oracle DBA, you know that Oracle is a relational database management system and that it performs many complex functions well. In the pressure to learn how to be a DBA or what it is you should be doing, it is hard to take the time to sit back and appreciate the origins of the RDBMS model in general and where and how Oracle came into being. It is important to have an appreciation of your profession's history and origins. Often times, we take for granted the functions an RDBMS performs. But where, exactly, did the concept of the relational model come from?

"A Relational Model of Data for Large Shared Data Banks"

A paper written by E.F. Codd (yes, that Codd) 35 years ago set the stage for the development of the relational model. Codd's "A Relational Model of Data for Large Shared Data Banks" has been hailed as a landmark paper. Even his own employer, IBM - an industry leader in several areas at the time, including a leading database system - largely ignored the brilliance of what Dr. Codd proposed in 1970. How relational database management systems operate today - including Oracle - can be traced back to what Dr. Codd proposed back then.

The paper consists of two major sections: a narrative describing the shortcomings of the then current models and systems, and a proposal for how a relational model should work. A key distinction between the model types is that of data independence. Current models then were so inherently tied to programs and physical file structures that minor changes could invalidate the entire system. In the relational model, the separation between data and how an application or computer system works is paramount - the further apart the better.

Limitations of the Old Systems

Network and hierarchal-based systems were hobbled by data dependency. This problem arose in three forms: ordering dependence, indexing dependence and access path dependence. Ordering dependency was common to all of the major systems then in use. Specifically, they all failed "to make a clear distinction between order of presentation on the one hand and stored ordering on the other." In practical terms, once entered, there was no way to change the order.

Indexing dependence involves the relationship between applications and indices (or indexes as the word is now used). If an index were removed, would the application still function? Thirty plus years ago, the answer was mostly "no." Even then, the benefits and limitations of indexes were well known. The major problem for some systems concerned the situation where an index was renamed or removed. If an application was hard coded to use a specific index name, and the index was no longer present, would the application still function?

Codd illustrates the third limitation - access path dependence - with a series of five different parts versus project structures, and goes on to point out how any program will fail on at least three of the arrangements. In concrete terms, if a program were tied to a specific structure or location of data, then changes in the structure or location would most likely cause a failure of the program. This dependency illustrates the need for a program to be "blind" to structure and location.

Enter the Relational View of Data

Consider what you take for granted today - "of course the database is supposed to act this way" - and put yourself in Codd's shoes back in 1970. What seems so obvious and practical today required a paradigm shift back then. IBM had invested tremendous amounts of time and capital on its own non-relational problem-ridden system, and here comes a proposal from an employee at an IBM location (San Jose) that was not even involved with programming. Quite a few years went by before the genius of what Codd proposed was successfully implemented.

What hindered and delayed the acceptance of the relational model was the lack of a structured query language that could take advantage of such a model, and obviously, the lack of a working relational model. Similar situations exist today. For example, scientists can tell you what the limitations are with the current rocket-based space program. At the same time, they can tell you what the star ship Enterprise should be capable of, but how to build it and how to make it work are not known at this time. Like Codd back in 1970, we have the theory, but we do not have the technology.

So what does the relational model do for us, and how is it better than what was being used? The first major difference discussed in the paper has to do with moving from using relations to using relationships. Relations depend on a domain ordering (the same concept seen in algebra classes - domain versus range, function versus relation). Relationships are indifferent to the order of data, so it is sufficient to be able to uniquely identify domains (table columns) without regard to order.

The implementation of relationships requires the introduction of primary and foreign keys as a "preferred way." The preferred way Codd introduced is what is also known as normal form (NF). The concept of what has become known as 1NF, 2NF and so on is well known today and Codd alludes to further normalization of the example he uses to illustrate the employee example.

What is another name for SQL? How about "applied predicate calculus?" As Codd notes, "The adoption of a relational model of data ... permits the development of a universal data sub-language based on an applied predicate calculus." The paper does not go into the language in detail, but Codd does discuss its salient details (the ship should go at least warp factor three...).

SQL statements work with sets of data, either all of the possible values, or a subset. We may fetch a set for query purposes, or hold it for possible changes. Once the set is obtained, we should be able to exploit its relations (manipulate the data). Codd recognized back then that a high-level language would be needed to "provide efficient response and throughput shifts from the individual user to the data system."

When writing a SQL statement, the user is performing operations on relations. Three major relational operators familiar to most users are restriction, projection, and joins. Restrictions are employed by using WHERE clauses, which can further be restricted by ANDing or ORing additional conditions. Projections are exemplified during partial selects out of a table (selecting specific domains, or columns as opposed to select *). Joins are frequently used to compare sets of data (one table against another, and even the same table against itself).

In addition to the topics of strong and weak redundancy (maintaining multiple copies of a column across tables for user convenience), the final major characteristic of the relational model covered in the paper deals with consistency. Although not directly mentioned, consistency relates to referential integrity. Whether or not user omission or commission introduces (or at least attempts to introduce) an inconsistency, the system should be able to perform a check prior to insertion, deletion, or update. This is one of the approaches discussed by Codd and is obviously the one modern relational databases use today.

As you read this historic paper, you can practically see the blueprint for Oracle and other modern relational database systems. The ideal system adheres to 12 rules later, well, codified by Dr. Codd in 1985. Looking back on history, it is not surprising that someone - Lawrence Ellison in this case - would fully realize the importance and relevance of Codd's relational model and seize upon the opportunity to build a better system. By then, SQL was in place and all (not to trivialize the amount of work "all" implies) that was needed was an engine to drive the system.

Who was First to Use the Relational Model?

In the early 1970's, IBM's System R "was intended to provide a high-level, nonnavigational, data-independent interface to many users simultaneously, with high integrity and robustness." By the end of the 1970's, IBM had experimented with "a full-function, multi-user version." One of the main results of that effort was the development of a structured query language, or SQL.

At roughly the same time in the 70's, the product known as Ingres was being developed by scientists at UC Berkeley. Funding for Ingres came from a variety of sources, chief among them being various branches of the military. The Ingres project used its own query language (QUEL), but SQL is what has become the standard.

There were some significant differences between System R and Ingres. The source code for System R was kept private, whilst that of Ingres was freely available. System R was meant to be sold; Ingres was freely distributed to the academic community. Additionally, System R, being produced in the commercial arena, was competing with other resources within IBM. Ingres was new and could go any direction as needed.

System R, did, however, create the opportunity for another system to come along. IBM was sitting atop a gold mine and did not realize it. Lawrence Ellison recognized the potential of what System R had to offer, and that gave rise to the company and relational database system named Oracle.

Oracle's Beginning

In 1977, Lawrence Ellison, along with cofounders Bob Miner and Ed Oates, founded a company named Software Development Laboratories. In 1979, as the renamed Relational Software Inc., the first commercial version was released, running on the Digital PDP-11. Interestingly, the first released version was Version 2 - the "2" conveying the idea that all the bugs had been worked out since Version 1, of which, there was no such version. In 1983, the company was renamed to Oracle (it was a name of a project Ellison had worked on for the CIA). Today, Oracle is the largest relational database system vendor and the world's second largest independent software company.

Originally designed to run on the VMS/VAX computer system, Oracle today can be found on every version of UNIX, most versions of Windows, and a slew of other platforms. In fact, Oracle currently lists support for 77 platforms under its "Certify & Availability" link on its support Web site (MetaLink). The Oracle Web site shows a timeline of significant events and accomplishments (http://www.oracle.com/corporate/history.html).

Starting out as a pure database company, the company's chief strength today is still in the RDBMS market. Oracle also has a successful line of products used for development, presentation and reporting (Forms and Reports); Web services (Application Server); and enterprise management software (Oracle Applications, recently augmented by its acquisition of PeopleSoft).

For an interesting perspective of Oracle's beginnings, read The Difference Between God and Larry Ellison: Inside Oracle Corporation: God Doesn't Think He's Larry Ellison. The book paints a somewhat less than flattering view of Mr. Ellison as a person, but as a shark in corporate waters, he is a very efficient and successful killing machine. How many times can a company get away with promising functionality that does not exist yet, or inflate/manipulate sales figures without being caught by stockholders or regulatory agencies? If the product is that good, it appears the answer to the first question has no limit. The answer to the second question - well, that practice did catch up with Oracle and it nearly sunk the company.

Perhaps the greatest selling strength Oracle possesses today is its support for a wide range of platforms. SQL Server is very good in many ways, and is relatively inexpensive when compared to Oracle. Unfortunately, for SQL Server, it is now and forever tied to the Windows operating system.

In Closing

Virtually all great inventions start with an idea. The music industry's CD-ROM format, for example, started because a music aficionado wanted a better way to hear and preserve what was on vinyl records (do you even know the inventor's name?). Years and years went by before that idea caught on, but when it did, it signaled the demise of the cassette tape.

What are those key moments in time and how can you (or will you) recognize them when something brilliant is proposed? Does the implementation of your idea sit on the shelf for a decade? Does your employer take steps to downplay the significance of your proposal? And how hard would you be willing to work to sell the idea? Your use of Oracle today is due in large part to Dr. Codd's persistence and Lawrence Ellison's vision. What is the next (or current) unappreciated model or IT-related invention waiting to be discovered?

Suggested Readings

Listed below are some references used in this article. If you haven't explored the history of Oracle, these represent a good starting point to begin your investigation.

Codd, E.F. "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM 13.6 (1970): 377-387. 20 May 2005 <http://doi.acm.org/10.1145/362384.362685>

United States. National Academy of Sciences. Funding a Revolution: Government Support for Computing Research. Washington: National Academy Press, 1999. 20 May 2005 <http://www.nap.edu/readingroom/books/far/ch6.html>

Wilson, Mike. The Difference Between God and Larry Ellison: Inside Oracle Corporation: God Doesn't Think He's Larry Ellison. HarperCollins: New York, 1998.

» See All Articles by Columnist Steve Callan

Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved