Unwrapping Oracle's DBMS Packages: Understanding Oracle's Random Number Generator

Thursday Apr 22nd 2004 by Steve Callan

Beginning with later versions of Oracle8, Oracle has provided a means of generating random numbers. This built-in package, DBMS_RANDOM, is fairly simple to use, and can generate random numbers which are generally good enough for the needs of most users. Learn more as Steve Callan unwraps the first DBMS package of this series.

Beginning with later versions of Oracle8, Oracle has provided a means of generating random numbers. This built-in package, DBMS_RANDOM, is fairly simple to use, and can generate random numbers which are generally good enough for the needs of most users. If you need to generate a large amount of data without having to give a lot of thought to how random the data is, then DBMS_RANDOM will suit your needs.

If you need to encrypt sensitive data, then you should use Oracle9i's DBMS_OBFUSCATION_TOOLKIT feature. Oracle tells users "Do not use DBMS_RANDOM as it is unsuitable for cryptographic key generation." Is there something wrong with DBMS_RANDOM? Aren't the numbers returned random enough? Don't you get the same output when given the same input? The answers are a combination of "yes" and "no."

By un-wrapping the package Oracle uses to create the random number generator, we will learn quite a bit about how DBMS_RANDOM works and what its limitations are. Before looking at the package and some examples of how it can be used, the meaning of "random" needs to be clarified. Technically speaking, generating random numbers by a known method removes the potential for true randomness. When generated in this manner, the numbers can be properly described as pseudo-random numbers. However, if the pseudo-random numbers meet several conditions or tests (chiefly, the numbers being independent and identically distributed, or "iid"), then they are considered to be random. Ideally, the distribution of the numbers is uniform over the interval of 0 to 1 (and inclusive of the endpoints).

Knowing the parameters of the distribution helps us in evaluating how random the numbers are. Conversely, observing the numbers and calculating the mean and variance helps identify the distribution. Given that our random numbers are (ideally) uniformly distributed over [0,1], we know that the mean should be 1/2 and the variance should turn out to be 1/12. There are many other tests which can be performed against the generated numbers. Having a mean of 1/2 and a variance of 1/12 are rough indicators of a good uniform distribution, but the real tests are more concerned with uniformity and independence. Your random numbers can have a mean of 1/2, for example, but not be uniformly distributed.
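The mean and variance check described above can be sketched outside the database. The Python below is only an illustration: Python's own generator stands in for DBMS_RANDOM.VALUE, and the sample size and the two-point counter-example are assumptions chosen for the demonstration. It shows a uniform sample landing near 1/2 and 1/12, and a second sample whose mean is also 1/2 even though it is plainly not uniform.

```python
import random

def summarize(values):
    """Return (mean, sample variance) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance

rng = random.Random(123456)  # fixed seed so the run is repeatable

# Stand-in for DBMS_RANDOM.VALUE: uniform draws in [0, 1).
uniform_sample = [rng.random() for _ in range(100_000)]

# Counter-example: only 0.25 or 0.75, each with probability 1/2.
# The mean is still 1/2, but the values are clearly not uniform.
two_point_sample = [rng.choice((0.25, 0.75)) for _ in range(100_000)]

u_mean, u_var = summarize(uniform_sample)
t_mean, t_var = summarize(two_point_sample)

print(f"uniform:   mean={u_mean:.4f} variance={u_var:.4f} (expect 0.5000, {1/12:.4f})")
print(f"two-point: mean={t_mean:.4f} variance={t_var:.4f} (expect 0.5000, 0.0625)")
```

The variance separates the two samples here, but a different non-uniform distribution could match both moments, which is why the mean and variance are only rough indicators.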

The following properties of a good random generator - fast, portable, long enough cycle, replicable results and output being uniformly "iid" - are present with Oracle. In fact, by using the same seed value used in the following examples, you should be able to produce the same results. Oracle's SQL Reference Guide lists four arguments or procedures you can use with DBMS_RANDOM: initialize, seed, random, and terminate. There are several points missing in this documentation. First, the range of numbers is from -2^31 to +2^31, or +/- 2147483648. Second, the number of digits may be as many as ten, not eight. Lastly, there are other undocumented functions. One such function is DBMS_RANDOM.VALUE, and it will return the type of value we are more interested in (a number between zero and one). The other hidden functions you can use return normally distributed numbers and strings of varying length and case.
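The "replicable results" property can be illustrated with any seeded pseudo-random generator. The sketch below uses Python's generator as a stand-in, not Oracle's algorithm; the helper name draw and the second seed are hypothetical choices made for the example.

```python
import random

def draw(seed, count):
    """Return `count` uniform values from a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]

first_run = draw(123456, 5)
second_run = draw(123456, 5)   # same seed, accessed the same way
other_seed = draw(654321, 5)   # different seed

print("same seed, same sequence:", first_run == second_run)
print("different seed, same sequence:", first_run == other_seed)
```

This repeatability is what makes a seeded generator convenient for test data, and it is also one reason such a generator is unsuitable for cryptographic keys.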

Let's look at some output from the DBMS_RANDOM package and see how Oracle's random number generator performs. We will use a 6-digit seed number (123456) and start by generating 1,000 numbers, then increasing by a factor of ten up to ten million. The table name is RAND and has columns named LINE and RNO (for random number).

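DECLARE
  v_rand number;
BEGIN
  -- Seed value stated in the text; this line is absent from the listing, so the exact call is assumed
  DBMS_RANDOM.seed(123456);
  FOR i IN 1..1000 LOOP
    v_rand := DBMS_RANDOM.value;
    INSERT INTO rand VALUES (i, v_rand);
  END LOOP;
END;
/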

PL/SQL procedure successfully completed.

Selecting the first 10 rows shows:

SQL> select * from rand
  2  where line < 11;

      LINE                                      RNO
---------- ----------------------------------------
         1  0.9253168129811330987378779577193159262
         2  0.3703059867076638894717777425502136731
         3  0.8562787602662748879896983860530778367
         4  0.8747769791015347163677476210098089609
         5  0.8538887894283505001033221816233701639
         6  0.0139762421028966557398918466225500621
         7  0.6789827768885798969202524863427842743
         8  0.1219758197605125529485878115247706788
         9  0.6384861881298654042162612548721038633
        10  0.5060415527775185635522779058964300161

10 rows selected.

How did the average and variance "perform"?

SQL> select avg(rno), variance(rno)
  2  from rand;

  AVG(RNO) VARIANCE(RNO)
---------- -------------
.505209167    .081572912

The average and variance we would expect are .50000000 and .08333333. Continuing on with the output from tables with 10,000 to 10,000,000 rows, we will see an improvement in those indicators:

[Table: the values were not preserved. Surviving column headings: # of Rows, Time to generate.]

Up until a million rows, the average and variance both tended to converge to (but not actually reach) their expected values. At the ten million row mark, only the variance improved. Again, the mean and variance are not the true tests of uniformity and independence. Other tests, which include the following - frequency, runs, autocorrelation, gap and poker - could be used to test uniformity and independence. For example, if the numbers were uniformly distributed, we would expect to see the same count of numbers in whatever intervals we were interested in.
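The frequency test mentioned above can be sketched as a chi-square comparison of bucket counts against the count expected under uniformity. This is an illustration only: Python's generator stands in for DBMS_RANDOM.VALUE, and the ten equal-width buckets and the 5% significance level are assumptions picked for the example.

```python
import random

def frequency_test(values, buckets=10):
    """Chi-square statistic for counts of values in equal-width buckets on [0, 1)."""
    counts = [0] * buckets
    for v in values:
        counts[min(int(v * buckets), buckets - 1)] += 1
    expected = len(values) / buckets
    chi_square = sum((c - expected) ** 2 / expected for c in counts)
    return counts, chi_square

rng = random.Random(123456)
sample = [rng.random() for _ in range(100_000)]
counts, chi_square = frequency_test(sample)

# With 10 buckets there are 9 degrees of freedom; the 5% critical value
# of the chi-square distribution is about 16.92.
print(counts)
print(f"chi-square = {chi_square:.2f}; uniformity rejected at 5%: {chi_square > 16.92}")
```

A frequency test only looks at how the values are spread across the interval. Tests such as runs and autocorrelation are still needed to say anything about independence.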

Using RANDOM instead of VALUE in the million row table reflects the transformation of the Uniform(0,1) range of numbers to plus or minus 2147483648. You can see the minimum and maximum numbers are close to 2147483648 and that there is very little repetition of numbers. Out of a million generated numbers, 109 numbers were duplicated (a rate around .01%).

SQL> select min(rno), max(rno), count(distinct(rno))
  2  from rand;

     MIN(RNO)   MAX(RNO) COUNT(DISTINCT(RNO))
------------- ---------- --------------------
  -2147479960 2147480366               999891
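Whether roughly a hundred repeats in a million rows is reasonable can be estimated with a little arithmetic. The sketch below assumes the values are uniform over the 2^32 integers in the range and that RANDOM behaves like a simple scaling of a Uniform(0,1) value; both are assumptions for illustration, since the package's internal code is not reproduced here, and the function names are hypothetical.

```python
import math

RANGE_SIZE = 2 ** 32   # number of integers in [-2**31, 2**31)
DRAWS = 1_000_000

def to_signed_32(u):
    """Map a uniform value u in [0, 1) onto an integer in [-2**31, 2**31)."""
    return math.floor(u * RANGE_SIZE) - 2 ** 31

def expected_distinct(draws, range_size):
    """Expected number of distinct values after `draws` uniform picks."""
    # range_size * (1 - (1 - 1/range_size) ** draws), written with
    # log1p/expm1 so the tiny per-draw probability is not rounded away.
    return -range_size * math.expm1(draws * math.log1p(-1.0 / range_size))

distinct = expected_distinct(DRAWS, RANGE_SIZE)
print(to_signed_32(0.0), to_signed_32(0.5))
print(f"expected distinct values: {distinct:.0f}")
print(f"expected repeats: {DRAWS - distinct:.0f}")
```

Under these assumptions the expected number of repeats is about 116, so the 109 seen in the million row table is in line with what a uniform generator would produce.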

Looking at the scripts behind DBMS_RANDOM shows how the numbers from DBMS_RANDOM.RANDOM are created. You can look at the scripts which create this package, or view the text selected from all_source. Here is the first part of the source:

SQL> select text from all_source where name = 'DBMS_RANDOM';
PACKAGE dbms_random AS

    --  OVERVIEW
    --  This package should be installed as SYS.  It generates a sequence of
    --  random 38-digit Oracle numbers.  The expected length of the sequence
    --  is about power(10,28), which is hopefully long enough.
    --  USAGE
    --  This is a random number generator.  Do not use for cryptography.
    --  For more options the cryptographic toolkit should be used.
    --  By default, the package is initialized with the current user
    --  name, current time down to the second, and the current session.
    --  If this package is seeded twice with the same seed, then accessed
    --  in the same way, it will produce the same results in both cases.

The script to create the DBMS_RANDOM package is dbmsrand.sql, and is located in the ORACLE_HOME\rdbms\admin directory. You can see (at the end of it) how the range of numbers returned is between -power(2,31) and power(2,31). Note that the output can be negative as well as positive. For a million numbers generated with RANDOM, and assuming the numbers are truly random, you could expect the values to cover most of the interval [-2147483648, 2147483648] and to have an average and a sum whose expected values are zero. However, given the magnitude of the upper and lower bounds, a few large values in either direction can "swamp" the results. In the million row table, there are 499817 numbers less than zero, so it would not be surprising to see both the sum and average having values above zero.

The minimum and maximum values returned in an earlier query were -2147479960 and 2147480366. This range missed the lower end by 3688 and the upper end by 3282, or put another way, the million row table covered over 99.999% of the possible range of values.
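The arithmetic behind those figures can be checked directly, along with how large a gap at each end one would expect. The sketch below uses only the numbers quoted above; the expected-gap formula assumes independent uniform draws, which is an idealization rather than a statement about Oracle's generator.

```python
LOWER, UPPER = -2147483648, 2147483648   # bounds quoted in the article
observed_min, observed_max = -2147479960, 2147480366
rows = 1_000_000

gap_low = observed_min - LOWER     # distance from the lower bound
gap_high = UPPER - observed_max    # distance from the upper bound
coverage = (observed_max - observed_min) / (UPPER - LOWER)

# For n independent uniform draws on an interval of width W, the expected
# distance from either endpoint to the nearest draw is W / (n + 1).
expected_gap = (UPPER - LOWER) / (rows + 1)

print(gap_low, gap_high)   # 3688 3282
print(f"coverage = {coverage:.6%}")
print(f"expected gap at each end = {expected_gap:.0f}")
```

The expected gap works out to roughly 4295, so misses of 3688 and 3282 are about what a million uniform draws should leave at each end.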

This is the description of the STRING function found in the DBMS_RANDOM package:

    -- get a random string
    FUNCTION string (opt char, len NUMBER)
          /* "opt" specifies that the returned string may contain:
             'u','U'  :  upper case alpha characters only
             'l','L'  :  lower case alpha characters only
             'a','A'  :  alpha characters only (mixed case)
             'x','X'  :  any alpha-numeric characters (upper)
             'p','P'  :  any printable characters
          */
        RETURN VARCHAR2;  -- string of <len> characters (max 60)

Here is an example using the "P" option:

DECLARE
  v_str varchar2(100);
BEGIN
  FOR i IN 1..10 LOOP
    v_str := DBMS_RANDOM.string('p',20);
    dbms_output.put_line(i||': '||v_str);
  END LOOP;
END;
/

1: wCqsq!`+\PVNXn!uEip,
2: kx5di5yaEC2 =~XQ! NI
3: $Am`fz^wH!VQevIaXlU7
4: Dr,yO0 YoP?I+_mRss]2
5: 6Q3+:[buk/hEs[CTQn;V
6: K~BSaD$Zk(to>iB^Oop<
7: ?i c,c}]O))@r!fxv8f'
8: cWe+x,%DK5pqX<;Xb@21
9: N.{_)[h6")f3HWG8u&)X
10: xnP)FDyVBx*EGbfl3OA{

PL/SQL procedure successfully completed.

No doubt about it, the "P" option returns quite the array of gibberish. The use of only alpha characters in the STRING function can help generate alphabet replacement type of cryptograms. It can also be used to generate passwords. You may want to combine two functions to produce varying case and numbers, and a simple example is shown below:

DECLARE
  v_str varchar2(100);
BEGIN
  FOR i IN 1..10 LOOP
    v_str := DBMS_RANDOM.string('X',6);
    dbms_output.put_line(i||': '||v_str);
  END LOOP;
END;
/

4: U4SX8Q
6: 60AZLI
7: 00HF1C
10: FOLSK8
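The option letters in the STRING description amount to choosing a character pool and drawing from it. The Python below is a rough sketch of that idea, not Oracle's implementation: the pool contents (in particular printable ASCII 32 to 126 for 'P') and the function names random_string and mixed_string are assumptions made for illustration. The second function shows one way to get mixed case plus digits in a single value.

```python
import random
import string

# Character pools keyed by option letter, mirroring the package comment.
POOLS = {
    "U": string.ascii_uppercase,
    "L": string.ascii_lowercase,
    "A": string.ascii_letters,
    "X": string.ascii_uppercase + string.digits,
    "P": "".join(chr(c) for c in range(32, 127)),  # assumed: printable ASCII incl. space
}

def random_string(opt, length, rng=random):
    """Return `length` characters drawn uniformly from the pool for `opt`."""
    pool = POOLS[opt.upper()]
    return "".join(rng.choice(pool) for _ in range(length))

def mixed_string(length, rng=random):
    """Mixed-case letters and digits with at least one of each (length >= 3)."""
    chars = [rng.choice(POOLS["U"]), rng.choice(POOLS["L"]), rng.choice(string.digits)]
    chars += [rng.choice(POOLS["A"] + string.digits) for _ in range(length - 3)]
    rng.shuffle(chars)
    return "".join(chars)

rng = random.Random(123456)
for i in range(1, 4):
    print(f"{i}: {random_string('x', 6, rng)}  {mixed_string(8, rng)}")
```

As with DBMS_RANDOM, a seeded general-purpose generator like this is fine for test data but not for real passwords; in Python the secrets module is the appropriate source for those.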

The other function hidden in the package returns normally distributed (bell curve) random numbers. Sums and averages of many independent observations tend toward a normal distribution even when the individual observations are not normally distributed. The standard Normal distribution has a mean of zero and a variance of one, and those results should be expected when looking at a large sample of random numbers. The million row table created using DBMS_RANDOM.NORMAL shows the following results (table name of NORM, columns named LINE and RNORM):

SQL> select avg(rnorm), variance(rnorm)
  2  from norm;

AVG(RNORM) VARIANCE(RNORM)
---------- ---------------
-.00012847      1.00006502

The minimum and maximum values of -4.9973893 and 4.85893083 (see below) correspond to observations in the far end of each tail, or in other words, extreme values of area (close to zero and close to 100% of the area under the curve).

SQL> select min(rnorm), max(rnorm)
  2  from norm;

MIN(RNORM) MAX(RNORM)
---------- ----------
-4.9973893 4.85893083
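How far out in the tails those extremes sit can be put into numbers. The sketch below is an illustration under assumptions: Python's Gaussian generator stands in for DBMS_RANDOM.NORMAL with a smaller sample than the article's table, and the tail probabilities come from the standard normal distribution via the complementary error function.

```python
import math
import random

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

rng = random.Random(123456)
sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
n = len(sample)
mean = sum(sample) / n
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
print(f"mean={mean:.4f} variance={variance:.4f}")

# How rare are the extremes reported for the million-row NORM table?
# By symmetry, P(Z < -z) equals P(Z > z).
rows = 1_000_000
for z in (4.9973893, 4.85893083):
    p = upper_tail(z)
    print(f"P(Z > {z}) = {p:.2e}; expected count in {rows:,} rows = {rows * p:.2f}")
```

Each tail probability is well under one in a million, so seeing a single value near +/-5 in a million rows is plausible but about as extreme as one should expect.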

In closing, this exploration or unwrapping of the DBMS_RANDOM package should have surfaced some new (but old, really) features of Oracle for you and given you some insight into the nature of the numbers produced by this package. Random numbers are used in many places in science and engineering, and as we will see in a later article, in Oracle databases.
