As mentioned in Parts 1 and 2, there are several relatively easy steps you can take to improve performance. One of those steps involves using an automated tool to “guide” you in writing SQL statements. There are several vendors who manufacture analysis or performance tuning tools, and in the next two articles, we will look at one of them.
Quest Software
Many DBAs and developers use a tool named Toad, produced by Quest Software. According to a statement at Quest Software’s Web site, the Toad user community numbers around 500,000 users. One of Toad’s features is its ability to optimize SQL queries. In other words, Oracle Corporation does not own the market on tuning advisor type of tools.
Knowing that you have several choices with respect to advisory tools, and understanding what it is they do, what do you use them on if you are not working in a production or development environment? And perhaps just as likely, even if you are in a development environment, you may not have any bulk data to use. Generating bulk or large amounts of data is the focus of this article, and the tool we will look at for this purpose is another Quest Software product: DataFactory® for Oracle.
DataFactory
The purpose of DataFactory is to “quickly create meaningful test data for multiple database platforms.” The platforms include Oracle, DB2, Sybase and any ODBC compliant database. Normally retailing at $595 per server, a free 30-day version is available for download at Quest Software’s Web site.
To obtain the software (current version of 5.5.0), you must register using a “real” email address. Hotmail and Gmail addresses were rejected, but Comcast went through just fine. Once you register, you will receive an email that contains a key to unlock the application and start the 30-day free trial clock.
The installation process is quick and straightforward. If you are running Microsoft AntiSpyware, you may receive one or more errors. Disable the real time protection and attempt to reinstall DataFactory.
Creating Tutorial Objects
An excellent way to get to know the application is to use its tutorial objects. The general process is to:
- Create a project
- Create tables in a schema
- Run a script to load data
Unfortunately, an excellent way to get a refresher on disabling system-named referential constraints is to use the built-in tutorial objects. Using an iterative process, you can disable constraints one at a time until the load script runs through without error. However, while Quest is working on fixing this bug, we can take a short excursion into identifying and disabling a constraint.
After starting DataFactory, you can choose to start the tutorial. The instructions on how to load the tutorial objects (as is all help) are in HTML files.
The instructions from the help system state that the tables are populated. That is not correct. The tables (15 in all, named using a “DF_” prefix) are populated after an additional step.
Prior to creating the tables, you may want to create a separate schema in your database. Using Oracle10g, I created a user/schema owner named quest (granting connect and resource will be sufficient). You will be prompted for a username/password combination and database information.
So, following Tools>Create Tutorial Obects –
– the Tutorial Setup Wizard appears (with its own version of the Oracle logo).
A list of tables appears on the Finish Page.
Upon successful creation, DataFactory tells you so.
The project folder appears in the left frame
Click the Run button on the main menu. The ORA-02291 integrity constraint violated error will appear quite a few times (some tables more than once) because the loaded data in a foreign key-designated column does not correspond with data in a parent table. Almost all of the constraints use the SYS_Cxxxxxx naming structure, meaning they are not explicitly named.
To work around the integrity constraint violation, you can disable the constraint (once you know which table to alter). The query and ALTER TABLE statement below show one method to identify and disable the problem constraint.
SQL> select owner, constraint_name, table_name, column_name 2 from all_cons_columns 3 where constraint_name like '%9814%'; OWNER CONSTRAINT_NAME TABLE_NAME COLUMN_NAME ----- ------------------------------ -------------------- ------------ QUEST SYS_C009814 DF_ORDERS CUSTID SQL> alter table df_orders 2 disable constraint sys_c009814; Table altered.
The Results window showing that your project-named script has completed successfully means we are ready to start looking around at what was created.
Instead of analyzing each table one at a time, use the DBMS_STATS built-in (which Oracle recommends to use for most analyze operations). If you are using Oracle10g, you may want to add a WHERE dropped=’NO’ to prevent dropped tables from appearing in queries on user_tables or tab (as an example).
SQL> execute dbms_stats.gather_schema_stats('QUEST'); PL/SQL procedure successfully completed. SQL> select table_name, num_rows 2 from user_tables 3 where dropped='NO'; TABLE_NAME NUM_ROWS -------------------- ---------- DF_TITLES 100 DF_MOVIE_CUSTOMER 1100 DF_MOVIE_EMPLOYEE 900 DF_DUMMY 1100 DF_AUTHORS_TITLES 1100 DF_MOVIE_RENTAL 700 DF_PRODUCTS 100 DF_MOVIE_TAPE 400 DF_CUSTOMERS 1100 DF_AUTHORS 1100 DF_MOVIE_DISTRICT 1100 DF_ORDERS 101 DF_MOVIE_MOVIE 900 DF_ORDERDETAILS 200 DF_MOVIE_STORE 500 15 rows selected.
Back in the project hierarchy or list of tables, selecting a table in the list will show its columns and their datatypes (you may have to toggle between the Children and Results tabs at the bottom of the application).
Using the DF_MOVIE_CUSTOMER table as an example, how does its data look? The “random characters” option definitely produces exactly that.
More on the Tutorial Tables
Are there any indexes on the foreign key columns?
SQL> select index_name, table_name, column_name, column_position 2 from user_ind_columns; INDEX_NAME TABLE_NAME COLUMN_NAME COLUMN_POSITION ------------ -------------------- -------------------- --------------- SYS_C009823 DF_MOVIE_DISTRICT DISTRICTID 1 SYS_C009827 DF_MOVIE_STORE STOREID 1 SYS_C009830 DF_MOVIE_EMPLOYEE EMPID 1 SYS_C009837 DF_MOVIE_CUSTOMER CUSTID 1 SYS_C009841 DF_MOVIE_MOVIE MOVIEID 1 SYS_C009845 DF_MOVIE_TAPE TAPEID 1 SYS_C009850 DF_MOVIE_RENTAL TAPEID 1 SYS_C009850 DF_MOVIE_RENTAL CUSTID 2 SYS_C009850 DF_MOVIE_RENTAL RENTDATE 3 SYS_C009810 DF_CUSTOMERS CUSTID 1 SYS_C009813 DF_ORDERS ORDERID 1 SYS_C009816 DF_PRODUCTS PRODUCTID 1 SYS_C009819 DF_ORDERDETAILS ORDERID 1 SYS_C009819 DF_ORDERDETAILS PRODUCTID 2 14 rows selected.
What does the output suggest? You can immediately identify the fact that for one, not every table has a primary key. There are 15 tables, but only 14 rows (or 11 distinct tables as some indexes are composites). Why do we know this? Because one of the benefits of creating a primary key is that you get an index for free. If you disabled all of the referential integrity constraints that arise in the load script, what else might you suspect?
Oracle recommends indexing foreign key columns as they are frequently used in joins, and the general rule is to index columns used in WHERE clauses (which is obviously where joins are listed). Given the lack of indexes, your suspicion should be that the “create table” component of the tutorial tables does not index foreign key columns.
The query below shows the table name/column name foreign keys (they all have position 1 meaning only a single column was used).
SQL> select a.constraint_name, b.constraint_type, 2 a.table_name, a.column_name 3 from user_cons_columns a, all_constraints b 4 where a.constraint_name=b.constraint_name 5 and constraint_type = 'R'; CONSTRAINT_NAME C TABLE_NAME COLUMN_NAME ---------------- - -------------------- ------------- SYS_C009831 R DF_MOVIE_EMPLOYEE SUPERVISORID SYS_C009828 R DF_MOVIE_STORE DISTRICTID SYS_C009821 R DF_ORDERDETAILS PRODUCTID SYS_C009820 R DF_ORDERDETAILS ORDERID DFMOVIESTOREFK2 R DF_MOVIE_STORE MANAGERID SYS_C009852 R DF_MOVIE_RENTAL TAPEID SYS_C009851 R DF_MOVIE_RENTAL CUSTID SYS_C009838 R DF_MOVIE_CUSTOMER STOREID SYS_C009814 R DF_ORDERS CUSTID SYS_C009846 R DF_MOVIE_TAPE MOVIEID DFMOVIEEMPFK2 R DF_MOVIE_EMPLOYEE STOREID 11 rows selected.
The end result? Suspicion confirmed; the foreign keys were not indexed.
From a management or maintenance perspective, why are only two of the referential integrity constraints explicitly named while the rest are system named? As it turns out, of the 51 constraints in this schema, these happen to be the only two user named constraints.
In Closing
The key point to take away from this exploration of a tool such as DataFactory is that while the tool (or a script you create) can generate millions of rows of test or sample data, what good is any of it if it misses the boat on referential integrity or other best practices with respect to data modeling? If you are trying to tune queries for an application, the test data needs to reflect how the application uses it. If you are relying on referential integrity, your test data needs to support and honor parent table-child table relationships.
From a design standpoint, two best practices that were violated include failure to index foreign key columns, and failure to explicitly name three major items (primary keys, foreign keys, and indexes). A possible third violation concerns not having a primary key on every table. Does every table in a schema require a primary key? No, but for the most part, every table (to support normalization) should, and if not, you should at least know why. Put another way, to not normalize a table should be a conscious decision, not an oversight.