Expert One-on-One: Oracle: Pt. 1

Monday Sep 10th 2001 by DatabaseJournal.com Staff

Part one of our series of excerpts from the Wrox Press Title, 'Expert One on One: Oracle' covers developing successful Oracle applications and the Black Box approach.

Book overview

Tom Kyte has a simple philosophy: you can treat Oracle as a black box and just stick data into it or you can understand how it works and exploit it as a powerful computing environment. If you choose the latter course, then you will find that there are few information management problems that you cannot solve quickly and elegantly.

Tom has selected the most important features and techniques and he teaches them in a proof-by-example manner, not only discussing the features available, but also how to implement software using them and indicating the potential pitfalls.

What does this book cover?

  • Core database structures and utilities
  • Performance tuning
  • Advanced SQL features
  • Extensibility using C, Java and OR features
  • Security
  • Important supplied packages

Who is this book for?

This book offers the knowledge required for both those who build Oracle database applications, and those who administer the database.

It is critical that the DBA knows what the developers are trying to accomplish and the best developers know how to exploit the DBA's data management strategies. Armed with the fundamental knowledge of the Oracle environment that Tom describes, you will be able to build bigger, faster, and more scalable applications.

Knowledge of SQL and PL/SQL is assumed and familiarity with any 3GL language (such as C, Java, or Visual Basic) would be useful.

Wrox provides on-line discussion for the above title and surrounding technologies on P2P.

Title: Expert One on One: Oracle
ISBN: 1861004826
US Price: $ 59.99
Canadian Price: C$ 89.95
UK Price: # 47.99
Publication Date: June 2001
Pages: 1250
Wrox Press Limited, US and UK.

I spend the bulk of my time working with Oracle database software and, more to the point, with people who use this software. Over the last twelve years, I've worked on many projects — successful ones as well as failures, and if I were to encapsulate my experiences into a few broad statements, they would be:

  • An application built around the database — dependent on the database — will succeed or fail based on how it uses the database.
  • A development team needs at its heart a core of 'database savvy' coders who are responsible for ensuring the database logic is sound and the system is tuned.

These may seem like surprisingly obvious statements, but in my experience, I have found that too many people approach the database as if it were a 'black box' — something that they don't need to know about. Maybe they have a SQL generator that will save them from the hardship of having to learn SQL. Maybe they figure they will just use it like a flat file and do 'keyed reads'. Whatever they figure, I can tell you that thinking along these lines is most certainly misguided; you simply cannot get away with not understanding the database. This chapter will discuss why you need to know about the database, specifically why you need to understand:

  • The database architecture, how it works, and what it looks like.
  • What concurrency controls are, and what they mean to you.
  • How to tune your application from day one.
  • How some things are implemented in the database, which is not necessary the same as how you think they should be implemented.
  • What features your database already provides for you and why it is generally better to use a provided feature then to build your own.
  • Why you might want more than a cursory knowledge of SQL.

Now this may seem like a long list of things to learn before you start, but consider this analogy for a second: if you were developing a highly scalable, enterprise application on a brand new operating system (OS), what would be the first thing you would do? Hopefully, you answered, 'find out how this new OS works, how things will run on it, and so on'. If you did not, you would fail.

Consider, for example, one of the early versions of Windows (Windows 3.x, say). Now this, like UNIX, was a 'multi-tasking' operating system. However, it certainly didn't multi-task like UNIX did — it used a non-preemptive multi-tasking model (meaning that if the running application didn't give up control, nothing else could run — including the operating system). In fact, compared to UNIX, Windows 3.x was not really a multi- tasking OS at all. Developers had to understand exactly how the Windows 'multi-tasking' feature was implemented in order to develop effectively. If you sit down to develop an application that will run natively on an OS, then understanding that OS is very important.

What is true of applications running natively on operating systems is true of applications that will run on a database: understanding that database is crucial to your success. If you do not understand what your particular database does or how it does it, your application will fail. If you assume that because your application ran fine on SQL Server, it will necessarily run fine on Oracle then, again, your application is likely to fail.

My Approach

Before we begin, I feel it is only fair that you understand my approach to development. I tend to take a database-centric approach to problems. If I can do it in the database, I will. There are a couple of reasons for this — the first and foremost being that I know that if I build functionality in the database, I can deploy it anywhere. I am not aware of a server operating system on which Oracle is not available — from Windows to dozens of UNIX systems to the OS/390 mainframe, the same exact Oracle software and options are available. I frequently build and test solutions on my laptop, running Oracle8i on Windows NT. I deploy them on a variety of UNIX servers running the same database software. When I have to implement a feature outside of the database, I find it extremely hard to deploy that feature anywhere I want. One of the main features that makes Java appealing to many people — the fact that their programs are always compiled in the same virtual environment, the Java Virtual Machine (JVM), and so are highly portable — is the exact same feature that make the database appealing to me. The database is my Virtual Machine. It is my 'virtual operating system'.

My approach is to do everything I can in the database. If my requirements go beyond what the database environment can offer, I do it in Java outside of the database. In this way, almost every operating system intricacy will be hidden from me. I still have to understand how my 'virtual machines' work (Oracle, and occasionally a JVM) — you need to know the tools you are using — but they, in turn, worry about how best to do things on a given OS for me.

Thus, simply knowing the intricacies of this one 'virtual OS' allows you to build applications that will perform and scale well on many operating systems. I do not intend to imply that you can be totally ignorant of your underlying OS — just that as a software developer building database applications you can be fairly well insulated from it, and you will not have to deal with many of its nuances. Your DBA, responsible for running the Oracle software, will be infinitely more in tune with the OS (if he or she is not, please get a new DBA!). If you develop client-server software and the bulk of your code is outside of the database and outside of a VM (Java Virtual Machines perhaps being the most popular VM), you will have to be concerned about your OS once again.

I have a pretty simple mantra when it comes to developing database software:

  • You should do it in a single SQL statement if at all possible.
  • If you cannot do it in a single SQL Statement, then do it in PL/SQL.
  • If you cannot do it in PL/SQL, try a Java Stored Procedure.
  • If you cannot do it in Java, do it in a C external procedure.
  • If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it...

Throughout this book, you will see the above philosophy implemented. We'll use PL/SQL and Object Types in PL/SQL to do things that SQL itself cannot do. PL/SQL has been around for a very long time, over thirteen years of tuning has gone into it, and you will find no other language so tightly coupled with SQL, nor any as optimized to interact with SQL. When PL/SQL runs out of steam — for example, when we want to access the network, send e-mails' and so on — we'll use Java. Occasionally, we'll do something in C, but typically only when C is the only choice, or when the raw speed offered by C is required. In many cases today this last reason goes away with native compilation of Java — the ability to convert your Java bytecode into operating system specific object code on your platform. This lets Java run just as fast as C.

I have an idea, borne out by first-hand experience, as to why database-backed software development efforts so frequently fail. Let me be clear that I'm including here those projects that may not be documented as failures, but take much longer to roll out and deploy than originally planned because of the need to perform a major 're-write', 're-architecture', or  'tuning' effort. Personally, I call these delayed projects 'failures': more often than not they could have been completed on schedule (or even faster).

The single most common reason for failure is a lack of practical knowledge of the database — a basic lack of understanding of the fundamental tool that is being used. The 'blackbox' approach involves a conscious decision to protect the developers from the database. They are actually encouraged not to learn anything about it! In many cases, they are prevented from exploiting it. The reasons for this approach appear to be FUD- related (Fear, Uncertainty, and Doubt). They have heard that databases are 'hard', that SQL, transactions and data integrity are 'hard'. The solution — don't make anyone do anything 'hard'. They treat the database as a black box and have some software tool generate all of the code. They try to insulate themselves with many layers of protection so that they do not have to touch this 'hard' database.

This is an approach to database development that I've never been able to understand. One of the reasons I have difficulty understanding this approach is that, for me, learning Java and C was a lot harder then learning the concepts behind the database. I'm now pretty good at Java and C but it took a lot more hands-on experience for me to become competent using them than it did to become competent using the database. With the database, you need to be aware of how it works but you don't have to know everything inside and out. When programming in C or Java, you do need to know everything inside and out and these are huge languages.

Another reason is that if you are building a database application, then the most important piece of software is the database. A successful development team will appreciate this and will want its people to know about it, to concentrate on it. Many times I've walked into a project where almost the opposite was true.

A typical scenario would be as follows:

  • The developers were fully trained in the GUI tool or the language they were using to build the front end (such as Java). In many cases, they had had weeks if not months of training in it.
  • The team had zero hours of Oracle training and zero hours of Oracle experience. Most had no database experience whatsoever.
  • They had massive performance problems, data integrity problems, hanging issues and the like (but very pretty screens).

As a result of the inevitable performance problems, I would be called in to help solve the difficulties. I can recall one particular occasion when I could not fully remember the syntax of a new command that we needed to use. I asked for the SQL Reference manual, and I was handed an Oracle 6.0 document. The development was taking place on version 7.3, five years after the release of version.6.0! It was all they had to work with, but this did not seem to concern them at all. Never mind the fact that the tool they really needed to know about for tracing and tuning didn't really exist back then. Never mind the fact that features such as triggers, stored procedures, and many hundreds of others, had been added in the five years since the documentation to which they had access was written. It was very easy to determine why they needed help— fixing their problems was another issue all together.

The idea that developers building a database application should be shielded from the database is amazing to me but still the attitude persists. Many people still take the attitude that developers should be shielded from the database, they cannot take the time to get trained in the database — basically, they should not have to know anything about the database. Why? Well, more than once I've heard '... but Oracle is the most scalable database in the world, my people don't have to learn about it, it'll just do that'. It is true; Oracle is the most scalable database in the world. However, I can write bad code that does not scale in Oracle easier then I can write good, scaleable code in Oracle. You can replace Oracle with any technology and the same will be true. This is a fact — it is easier to write applications that perform poorly than it is to write applications that perform well. It is sometimes too easy to build a single- user system in the world's most scalable database if you don't know what you are doing. The database is a tool and the improper use of any tool can lead to disaster. Would you take a nutcracker and smash walnuts with it as if it were a hammer? You could but it would not be a proper use of that tool and the result would be a mess. Similar effects can be achieved by remaining ignorant of your database.

I was recently working on a project where the system architects had designed a very elegant architecture. A web browser client would talk over HTTP to an application server running Java Server Pages (JSP). The application logic would be 100 percent generated by a tool and implemented as EJBs (using container managed persistence) and would be physically located on another application server. The database would hold tables and indexes and nothing else.

So, we start with a technically complex architecture: we have four entities that must talk to each other in order to get the job done: web browser to a JSP in the Application Server to an EJB to the database. It would take technically competent people to develop, test, tune, and deploy this application. I was asked to help benchmark this application post-development. The first thing I wanted to know about was their approach to the database:

  • What did they feel would be the major choke points, areas of contention?
  • What did they view as the major obstacles to overcome?

They had no idea. When asked, 'OK, when we need to tune a generated query, who can help me rewrite the code in the EJB?' The answer was, 'Oh, you cannot tune that code, you have to do it all in the database'. The application was to remain untouched. At that point, I was ready to walk away from the project — it was already clear to me that there was no way this application would work:

  • The application was built without a single consideration for scaling the database level.
  • The application itself could not be tuned or touched.
  • Experience shows that 80 to 90 percent of all tuning is done at the application level, not at the database level.
  • The developers had no idea what the beans did in the database or where to look for potential problems.

That was shown to be the case in the first hour of testing. As it turns out, the first thing the application did was a:

select * from t for update;

What this did was to force a serialization of all work. The model implemented in the database was such that before any significant work could proceed, you had to lock an extremely scarce resource. That immediately turned this application into a very large single user system. The developers did not believe me (in another database, employing a shared read lock, the observed behavior was different). After spending ten minutes with a tool called TKPROF (you'll hear a lot more about this in Tuning Strategies and Tools, Chapter 10) I was able to show them that, yes, in fact this was the SQL executed by the application (they had no idea — they had never seen the SQL). Not only was it the SQL executed by the application but by using two SQL*PLUS sessions I was able to show them that session two will wait for session one to completely finish its work before proceeding.

So, instead of spending a week benchmarking the application, I spent the time teaching them about tuning, database locking, concurrency control mechanisms, how it worked in Oracle versus Informix versus SQL Server versus DB2 and so on (it is different in each case). What I had to understand first, though, was the reason for the SELECT FOR UPDATE. It turned out the developers wanted a repeatable read.

Repeatable read is a database term that says if I read a row once in my transaction and I read the row again later in the same transaction, the row will not have changed — the read is repeatable.

Why did they want this? They had heard it was a 'good thing'. OK, fair enough, you want repeatable read. The way to do that in Oracle is to set the isolation level to serializable (which not only gives you a repeatable read for any row of data, it gives you a repeatable read for a query — if you execute the same query two times in a transaction, you'll get the same results). To get a repeatable read in Oracle, you do not want to use SELECT FOR UPDATE, which you only do when you want to physically serialize access to data. Unfortunately, the tool they utilized did not know about that — it was developed primarily for use with another database where this was the way to get a repeatable read.

So, what we had to do in this case, in order to achieve serializable transactions, was to create a logon trigger in the database that altered the session for these applications and set the isolation level to serializable. We went back to the tool they were using and turned off the switches for repeatable reads and re-ran the application. Now, with the FOR UPDATE clause removed, we got some actual concurrent work done in the database.

That was hardly the end of the problems on this project. We had to figure out:

  • How to tune SQL without changing the SQL (that's hard, we'll look at some methods in Chapter 11 on Optimizer Plan Stability).
  • How to measure performance.
  • How to see where the bottlenecks were.
  • How and what to index. And so on.

At the end of the week the developers, who had been insulated from the database, were amazed at what the database could actually provide for them, how easy it was to get that information and, most importantly, how big  a difference it could make to the performance of their application. We didn't do the benchmark that week (they had some reworking to do!) but in the end they were successful — just behind schedule by a couple of weeks.

This is not a criticism of tools or technologies like EJBs and container managed persistence. This is a criticism of purposely remaining ignorant of the database and how it works and how to use it. The technologies used in this case worked well — after the developers got some insight into the database itself.

The bottom line is that the database is typically the cornerstone of your application. If it does not work well, nothing else really matters. If you have a black box and it does not work well — what are you going to do about it? About the only thing you can do is look at it and wonder why it is not doing so well. You cannot fix it, you cannot tune it, you quite simply do not understand how it works — and you made the decision to be in this position. The alternative is the approach that I advocate: understand your database, know how it works, know what it can do for you, and use it to its fullest potential.

How (and how not) to Develop Database Applications

That's enough hypothesizing, for now at least. In the remainder of this chapter, I will take a more empirical approach, discussing why knowledge of the database and its workings will definitely go a long way towards a successful implementation (without having to write the application twice!). Some problems are simple to fix as long as you understand how to find them. Others require drastic rewrites. One of the goals of this book is to help you avoid the problems in the first place.

In the following sections, I discuss certain core Oracle features without delving into exactly what these features are and all of the ramifications of using them. For example, I discuss just one of the implications of using Multi-Threaded Server (MTS) architecture— a mode in which you can (and sometimes have to) configure Oracle in order to support multiple database connections. I will not, however, go fully into what MTS is, how it works and so on. Those facts are covered in detail in the Oracle Server Concepts Manual (with more information to be found in the Net8 Administrators Guide).
Mobile Site | Full Site