Part one of our series of excerpts from the Wrox Press Title, 'Expert One on One: Oracle' covers developing successful Oracle applications and the Black Box approach.
Tom Kyte has a simple philosophy: you can treat
Oracle as a black
box and just stick data into it or you can understand how it
works and exploit it as a powerful computing environment. If you
choose the latter course, then you will find that there are few
information management problems that you cannot solve quickly and
Tom has selected the most important features and techniques and
he teaches them in a proof-by-example manner, not only discussing
the features available, but also how to implement software using
them and indicating the potential pitfalls.
What does this book cover?
- Core database structures and utilities
- Performance tuning
- Advanced SQL features
- Extensibility using C, Java and OR features
- Important supplied packages
Who is this book for?
This book offers the knowledge required for both those who build
Oracle database applications, and those who administer the
It is critical that the DBA knows what the developers are trying
to accomplish and the best developers know how to exploit the
DBA's data management strategies. Armed with the fundamental
knowledge of the Oracle environment that Tom describes, you will
be able to build bigger, faster, and more scalable applications.
Knowledge of SQL and PL/SQL is assumed and familiarity with any
3GL language (such as C, Java,
or Visual Basic) would be useful.
Wrox provides on-line discussion for the above title and
surrounding technologies on P2P.
I spend the bulk of my time working with Oracle database software
and, more to the point, with people who use this software. Over
the last twelve years, I've worked on many projects —
successful ones as well as failures, and if I were to encapsulate
my experiences into a few broad statements, they would be:
- An application built around the database — dependent on
the database — will succeed or fail based on how it uses the
- A development team needs at its heart a core of
'database savvy' coders who are responsible for
ensuring the database logic is sound and the system is tuned.
These may seem like surprisingly obvious statements, but in my
experience, I have found that too many people approach the
database as if it were a 'black box' — something
that they don't need to know about. Maybe they have a SQL
generator that will save them from the hardship of having to
learn SQL. Maybe they figure they will just use it like a flat
file and do 'keyed reads'. Whatever they figure, I can
tell you that thinking along these lines is most certainly
misguided; you simply cannot get away with not understanding the
database. This chapter will discuss why you need to know about
the database, specifically why you need to understand:
- The database architecture, how it works, and what it looks
- What concurrency controls are, and what they mean to you.
- How to tune your application from day one.
- How some things are implemented in the database, which is not
necessary the same as how you think they should be implemented.
- What features your database already provides for you and why
it is generally better to use a provided feature then to build
- Why you might want more than a cursory knowledge of SQL.
Now this may seem like a long list of things to learn before you
start, but consider this analogy for a second: if you were
developing a highly scalable, enterprise application on a brand
new operating system (OS), what would be the first thing you
would do? Hopefully, you answered, 'find out how this new OS
works, how things will run on it, and so on'. If you did not, you
Consider, for example, one of the early versions of Windows
(Windows 3.x, say). Now this, like UNIX, was a 'multi-tasking'
operating system. However, it certainly didn't multi-task like
UNIX did — it used a non-preemptive multi-tasking model
(meaning that if the running application didn't give up control,
nothing else could run — including the operating system).
In fact, compared to UNIX, Windows 3.x was not really a multi-
tasking OS at all. Developers had to understand exactly how the
Windows 'multi-tasking' feature was implemented in order to
develop effectively. If you sit down to develop an application
that will run natively on an OS, then understanding that OS is
What is true of applications running natively on operating
systems is true of applications that will run on a database:
understanding that database is crucial to your success. If you do
not understand what your particular database does or how it does
it, your application will fail. If you assume that because your
application ran fine on SQL Server, it will necessarily run fine
on Oracle then, again, your application is likely to fail.
Before we begin, I feel it is only fair that you understand my
approach to development. I tend to take a database-centric
approach to problems. If I can do it in the database, I will.
There are a couple of reasons for this — the first and
foremost being that I know that if I build functionality in the
database, I can deploy it anywhere. I am not aware of a server
operating system on which Oracle is not available — from
Windows to dozens of UNIX systems to the OS/390 mainframe, the
same exact Oracle software and options are available. I
frequently build and test solutions on my laptop, running
Oracle8i on Windows NT. I deploy them on a variety of UNIX
servers running the same database software. When I have to
implement a feature outside of the database, I find it extremely
hard to deploy that feature anywhere I want. One of the main
features that makes Java appealing to many people — the
fact that their programs are always compiled in the same virtual
environment, the Java Virtual Machine (JVM), and so are highly
portable — is the exact same feature that make the database
appealing to me. The database is my Virtual Machine. It is my
'virtual operating system'.
My approach is to do everything I can in the database. If my
requirements go beyond what the database environment can offer, I
do it in Java outside of the database. In this way, almost every
operating system intricacy will be hidden from me. I still have
to understand how my 'virtual machines' work (Oracle, and
occasionally a JVM) — you need to know the tools you are
using — but they, in turn, worry about how best to do
things on a given OS for me.
Thus, simply knowing the intricacies of this one 'virtual OS'
allows you to build applications that will perform and scale well
on many operating systems. I do not intend to imply that you can
be totally ignorant of your underlying OS — just that as a
software developer building database applications you can be
fairly well insulated from it, and you will not have to deal with
many of its nuances. Your DBA, responsible for running the Oracle
software, will be infinitely more in tune with the OS (if he or
she is not, please get a new DBA!). If you develop client-server
software and the bulk of your code is outside of the database and
outside of a VM (Java Virtual Machines perhaps being the most
popular VM), you will have to be concerned about your OS once
I have a pretty simple mantra when it comes to developing
- You should do it in a single SQL statement if at all
- If you cannot do it in a single SQL Statement, then do it in
- If you cannot do it in PL/SQL, try a Java Stored Procedure.
- If you cannot do it in Java, do it in a C external procedure.
- If you cannot do it in a C external routine, you might want
to seriously think about why it is you need to do it...
Throughout this book, you will see the above philosophy
implemented. We'll use PL/SQL and Object Types in PL/SQL to do
things that SQL itself cannot do. PL/SQL has been around for a
very long time, over thirteen years of tuning has gone into it,
and you will find no other language so tightly coupled with SQL,
nor any as optimized to interact with SQL. When PL/SQL runs out
of steam — for example, when we want to access the network,
send e-mails' and so on — we'll use Java. Occasionally,
we'll do something in C, but typically only when C is the only
choice, or when the raw speed offered by C is required. In many
cases today this last reason goes away with native compilation of
Java — the ability to convert your Java bytecode into
operating system specific object code on your platform. This lets
Java run just as fast as C.
I have an idea, borne out by first-hand experience, as to why
database-backed software development efforts so frequently fail.
Let me be clear that I'm including here those projects that may
not be documented as failures, but take much longer to roll out
and deploy than originally planned because of the need to perform
a major 're-write', 're-architecture', or 'tuning' effort.
Personally, I call these delayed projects 'failures': more often
than not they could have been completed on schedule (or even
The single most common reason for failure is a lack of practical
knowledge of the database — a basic lack of understanding
of the fundamental tool that is being used. The 'blackbox'
approach involves a conscious decision to protect the developers
from the database. They are actually encouraged not to learn
anything about it! In many cases, they are prevented from
exploiting it. The reasons for this approach appear to be FUD-
related (Fear, Uncertainty, and Doubt). They have heard that
databases are 'hard', that SQL, transactions and data integrity
are 'hard'. The solution — don't make anyone do anything
'hard'. They treat the database as a black box and have some
software tool generate all of the code. They try to insulate
themselves with many layers of protection so that they do not
have to touch this 'hard' database.
This is an approach to database development that I've never been
able to understand. One of the reasons I have difficulty
understanding this approach is that, for me, learning Java and C
was a lot harder then learning the concepts behind the database.
I'm now pretty good at Java and C but it took a lot more hands-on
experience for me to become competent using them than it did to
become competent using the database. With the database, you need
to be aware of how it works but you don't have to know everything
inside and out. When programming in C or Java, you do need to
know everything inside and out and these are huge languages.
Another reason is that if you are building a database
application, then the most important piece of software is the
database. A successful development team will appreciate this and
will want its people to know about it, to concentrate on it. Many
times I've walked into a project where almost the opposite was
A typical scenario would be as follows:
- The developers were fully trained in the GUI tool or the
language they were using to build the front end (such as Java).
In many cases, they had had weeks if not months of training in
- The team had zero hours of Oracle training and zero hours of
Oracle experience. Most had no database experience whatsoever.
- They had massive performance problems, data integrity
problems, hanging issues and the like (but very pretty screens).
As a result of the inevitable performance problems, I would be
called in to help solve the difficulties. I can recall one
particular occasion when I could not fully remember the syntax of
a new command that we needed to use. I asked for the SQL
Reference manual, and I was handed an Oracle 6.0 document. The
development was taking place on version 7.3, five years after the
release of version.6.0! It was all they had to work with, but
this did not seem to concern them at all. Never mind the fact
that the tool they really needed to know about for tracing and
tuning didn't really exist back then. Never mind the fact that
features such as triggers, stored procedures, and many hundreds
of others, had been added in the five years since the
documentation to which they had access was written. It was very
easy to determine why they needed help— fixing their
problems was another issue all together.
The idea that developers building a database application should
be shielded from the database is amazing to me but still the
attitude persists. Many people still take the attitude that
developers should be shielded from the database, they cannot take
the time to get trained in the database — basically, they
should not have to know anything about the database. Why? Well,
more than once I've heard '... but Oracle is the most scalable
database in the world, my people don't have to learn about it,
it'll just do that'. It is true; Oracle is the most scalable
database in the world. However, I can write bad code that does
not scale in Oracle easier then I can write good, scaleable code
in Oracle. You can replace Oracle with any technology and the
same will be true. This is a fact — it is easier to write
applications that perform poorly than it is to write applications
that perform well. It is sometimes too easy to build a single-
user system in the world's most scalable database if you don't
know what you are doing. The database is a tool and the improper
use of any tool can lead to disaster. Would you take a nutcracker
and smash walnuts with it as if it were a hammer? You could but
it would not be a proper use of that tool and the result would be
a mess. Similar effects can be achieved by remaining ignorant of
I was recently working on a project where the system architects
had designed a very elegant architecture. A web browser client
would talk over HTTP to an application server running Java Server
Pages (JSP). The application logic would be 100 percent generated
by a tool and implemented as EJBs (using container managed
persistence) and would be physically located on another
application server. The database would hold tables and indexes
and nothing else.
So, we start with a technically complex architecture: we have
four entities that must talk to each other in order to get the
job done: web browser to a JSP in the Application Server to an
EJB to the database. It would take technically competent people
to develop, test, tune, and deploy this application. I was asked
to help benchmark this application post-development. The first
thing I wanted to know about was their approach to the database:
- What did they feel would be the major choke points, areas of
- What did they view as the major obstacles to overcome?
They had no idea. When asked, 'OK, when we need to tune a
generated query, who can help me rewrite the code in the EJB?'
The answer was, 'Oh, you cannot tune that code, you have to do it
all in the database'. The application was to remain untouched. At
that point, I was ready to walk away from the project — it
was already clear to me that there was no way this application
- The application was built without a single consideration for
scaling the database level.
- The application itself could not be tuned or touched.
- Experience shows that 80 to 90 percent of all tuning is done
at the application level, not at the database level.
- The developers had no idea what the beans did in the database
or where to look for potential problems.
That was shown to be the case in the first hour of testing. As it
turns out, the first thing the application did was a:
select * from t for update;
What this did was to force a serialization of all work. The model
implemented in the database was such that before any significant
work could proceed, you had to lock an extremely scarce resource.
That immediately turned this application into a very large single
user system. The developers did not believe me (in another
database, employing a shared read lock, the observed behavior was
different). After spending ten minutes with a tool called TKPROF
(you'll hear a lot more about this in Tuning Strategies and
Tools, Chapter 10) I was able to show them that, yes, in fact
this was the SQL executed by the application (they had no idea
— they had never seen the SQL). Not only was it the SQL
executed by the application but by using two SQL*PLUS sessions I
was able to show them that session two will wait for session one
to completely finish its work before proceeding.
So, instead of spending a week benchmarking the application, I
spent the time teaching them about tuning, database locking,
concurrency control mechanisms, how it worked in Oracle versus
Informix versus SQL Server versus DB2 and so on (it is different
in each case). What I had to understand first, though, was the
reason for the
SELECT FOR UPDATE. It turned out the developers
wanted a repeatable read.
Repeatable read is a database term that says if I
read a row once in my transaction and I read the row again later
in the same transaction, the row will not have changed —
the read is repeatable.
Why did they want this? They had heard it was a 'good thing'. OK,
fair enough, you want repeatable read. The way to do that in
Oracle is to set the isolation level to serializable (which not
only gives you a repeatable read for any row of data, it gives
you a repeatable read for a query — if you execute the same
query two times in a transaction, you'll get the same results).
To get a repeatable read in Oracle, you do not want to use SELECT
FOR UPDATE, which you only do when you want to physically
serialize access to data. Unfortunately, the tool they utilized
did not know about that — it was developed primarily for
use with another database where this was the way to get a
So, what we had to do in this case, in order to achieve
serializable transactions, was to create a logon trigger in the
database that altered the session for these applications and set
the isolation level to serializable. We went back to the tool
they were using and turned off the switches for repeatable reads
and re-ran the application. Now, with the FOR UPDATE clause
removed, we got some actual concurrent work done in the database.
That was hardly the end of the problems on this project. We had
to figure out:
- How to tune SQL without changing the SQL (that's hard, we'll
look at some methods in Chapter 11 on Optimizer Plan Stability).
- How to measure performance.
- How to see where the bottlenecks were.
- How and what to index. And so on.
At the end of the week the developers, who had been insulated
from the database, were amazed at what the database could
actually provide for them, how easy it was to get that
information and, most importantly, how big a difference it
could make to the performance of their application. We didn't do
the benchmark that week (they had some reworking to do!) but in
the end they were successful — just behind schedule by a
couple of weeks.
This is not a criticism of tools or technologies like EJBs and
container managed persistence. This is a criticism of purposely
remaining ignorant of the database and how it works and how to
use it. The technologies used in this case worked well —
after the developers got some insight into the database itself.
The bottom line is that the database is typically the cornerstone
of your application. If it does not work well, nothing else
really matters. If you have a black box and it does not work well
— what are you going to do about it? About the only thing
you can do is look at it and wonder why it is not doing so well.
You cannot fix it, you cannot tune it, you quite simply do not
understand how it works — and you made the decision to be
in this position. The alternative is the approach that I
advocate: understand your database, know how it works, know what
it can do for you, and use it to its fullest potential.
How (and how not) to Develop Database Applications
That's enough hypothesizing, for now at least. In the remainder
of this chapter, I will take a more empirical approach,
discussing why knowledge of the database and its workings will
definitely go a long way towards a successful implementation
(without having to write the application twice!). Some problems
are simple to fix as long as you understand how to find them.
Others require drastic rewrites. One of the goals of this book is
to help you avoid the problems in the first place.
In the following sections, I discuss certain core
Oracle features without delving into exactly what these features
are and all of the ramifications of using them. For example, I
discuss just one of the implications of using Multi-Threaded
Server (MTS) architecture— a mode in which you can (and
sometimes have to) configure Oracle in order to support multiple
database connections. I will not, however, go fully into what MTS
is, how it works and so on. Those facts are covered in detail in
the Oracle Server Concepts Manual (with more information to be
found in the Net8 Administrators Guide).