Storing Numbers As Strings In An Oracle Database

Monday Nov 21st 2016 by David Fitzjarrell
Share:

It’s interesting to note that sometimes VARCHAR2 fields are used to store numbers.  Read on to see why this isn’t the best practice and how to get around the sorting issue if you can’t change the table.

The predominance of business applications designed with no particular database in mind is growing/ With that growth arises an interesting issue: designing tables to contain various types of data in a single column. It may be due to the nature of the data needing to be stored -- telephone numbers, Social Security numbers, ID numbers of various sorts all of which are called numbers but contain non-numeric components making them unsuitable for traditional numeric data types. Sometimes, though, 'convenience' can generate table designs that are flawed by allowing multiple types of data to be stored in a single column, such as character strings and numeric values. Oracle, like any other relational database, allows this and on the face of it this doesn't seem strange. Converting numbers to characters, though, produces results that can be unexpected, especially when sorting those values. Let's examine that situation and see what results it can produce.

Numbers are, well, numbers and sort accordingly. 1 is always 1, and always comes before 2, which comes before 3 and so on. Convert those numbers to characters and an entirely new order is produced, because the character code for '1' isn't the same as the numeric value of 1. A portion of the ASCII table for the character representations of the numbers 0 through 9 follows:

`

BinaryOctDecHexGlyph
196319651967
011 0000 060 48 30 0
011 0001 061 49 31 1
011 0010 062 50 32 2
011 0011 063 51 33 3
011 0100 064 52 34 4
011 0101 065 53 35 5
011 0110 066 54 36 6
011 0111 067 55 37 7
011 1000 070 56 38 8
011 1001 071 57 39 9

With ASCII representation there are only 10 'digits' represented; for multi-digit numbers each digit is replaced with its ASCII code. As such any 'number' represented in ASCII that starts with '1' will all sort together since it's not the display value, but the decimal code that's being sorted on (the value in the third column of the provided table). This results in '10' (ASCII codes 49,48) directly following '1' (ASCII code 49), and '2' (ASCII code 50) not appearing until after all of the 'numbers' starting with '1' have been displayed. To prove this let's look at an example where the lyrics to "Deck The Halls" are loaded into a table, along with their associated line numbers. Two tables will be used, the first where the line_no column is a character string, the second declaring line_no as the traditional NUMBER type. We begin:


SQL>
SQL> --
SQL> -- Create demo table
SQL> --
SQL> -- Make line_no a character string
SQL> --
SQL>
SQL> create table order_tst (line_no varchar2(5), verse varchar2(80));

Table created.

SQL>
SQL> --
SQL> -- Add primary key
SQL> --
SQL>
SQL> alter table order_tst add constraint ord_tst_pk primary key(line_no);

Table altered.

SQL>
SQL> --
SQL> -- Insert data, in order, with line numbers
SQL> --
SQL> -- Unfortunately the line 'numbers' are character
SQL> -- strings
SQL> --
SQL>
SQL> insert all
  2  into order_tst values('1	 ','Deck the halls with boughs of holly,')
  3  into order_tst values('2	 ','Fa la la la la la la la!')
  4  into order_tst values('3	 ','''Tis the season to be jolly,')
  5  into order_tst values('4	 ','Fa la la la la la la la!')
  6  into order_tst values('5	 ','Don we now our gay apparel,')
  7  into order_tst values('6	 ','Fa la la la la la la la!')
  8  into order_tst values('7	 ','Troll the ancient Yuletide carol,')
  9  into order_tst values('8	 ','Fa la la la la la la la!')
 10  into order_tst values('9	 ','************************************')
 11  into order_tst values('10	 ','See the blazing yule before us,')
 12  into order_tst values('11	 ','Fa la la la la la la la!')
 13  into order_tst values('12	 ','Strike the harp and join the chorus,')
 14  into order_tst values('13	 ','Fa la la la la la la la!')
 15  into order_tst values('14	 ','Follow me in merry measure,')
 16  into order_tst values('15	 ','Fa la la la la la la la!')
 17  into order_tst values('16	 ','While I tell of Yuletide treasure,')
 18  into order_tst values('17	 ','Fa la la la la la la la!')
 19  into order_tst values('18	 ','************************************')
 20  into order_tst values('19	 ','Fast away the old year passes,')
 21  into order_tst values('20	 ','Fa la la la la la la la!')
 22  into order_tst values('21	 ','Hail the new, year lads and lasses,')
 23  into order_tst values('22	 ','Fa la la la la la la la!')
 24  into order_tst values('23	 ','Sing we joyous all together!')
 25  into order_tst values('24	 ','Fa la la la la la la la!')
 26  into order_tst values('25	 ','Heedless of the wind and weather,')
 27  into order_tst values('26	 ','Fa la la la la la la la!')
 28  into order_tst values('27	 ','************************************')
 29  select * From dual;

27 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>

Notice that the data was loaded 'in order'; unfortunately the table is a heap table and no order is maintained as the data is inserted. Knowing this an ORDER BY clause is used, but since the line_no values are VARCHAR2 strings they sort in what seems to be a rather strange order:


SQL>
SQL> --
SQL> -- Retrieve data
SQL> --
SQL> -- It doesn't come out as one might expect
SQL> --
SQL> -- Even with an order by
SQL> --
SQL>
SQL> select *
  2  from order_tst
  3  order by 1;

LINE_ VERSE
----- --------------------------------------------------------------------------------
1     Deck the halls with boughs of holly,
10    See the blazing yule before us,
11    Fa la la la la la la la!
12    Strike the harp and join the chorus,
13    Fa la la la la la la la!
14    Follow me in merry measure,
15    Fa la la la la la la la!
16    While I tell of Yuletide treasure,
17    Fa la la la la la la la!
18    ************************************
19    Fast away the old year passes,
2     Fa la la la la la la la!
20    Fa la la la la la la la!
21    Hail the new, year lads and lasses,
22    Fa la la la la la la la!
23    Sing we joyous all together!
24    Fa la la la la la la la!
25    Heedless of the wind and weather,
26    Fa la la la la la la la!
27    ************************************
3     'Tis the season to be jolly,
4     Fa la la la la la la la!
5     Don we now our gay apparel,
6     Fa la la la la la la la!
7     Troll the ancient Yuletide carol,
8     Fa la la la la la la la!
9     ************************************

27 rows selected.

SQL>

More work is required since the string data needs to be converted to numeric data before the ORDER BY will perform as we expect:


SQL>
SQL> --
SQL> -- Retrieve data, and sort by line_no
SQL> -- as a number, not a character string
SQL> --
SQL> -- It is more work to convert numbers
SQL> -- stored as characters back to numbers
SQL> --
SQL>
SQL> select *
  2  from order_tst
  3  order by to_number(line_no);

LINE_ VERSE
----- --------------------------------------------------------------------------------
1     Deck the halls with boughs of holly,
2     Fa la la la la la la la!
3     'Tis the season to be jolly,
4     Fa la la la la la la la!
5     Don we now our gay apparel,
6     Fa la la la la la la la!
7     Troll the ancient Yuletide carol,
8     Fa la la la la la la la!
9     ************************************
10    See the blazing yule before us,
11    Fa la la la la la la la!
12    Strike the harp and join the chorus,
13    Fa la la la la la la la!
14    Follow me in merry measure,
15    Fa la la la la la la la!
16    While I tell of Yuletide treasure,
17    Fa la la la la la la la!
18    ************************************
19    Fast away the old year passes,
20    Fa la la la la la la la!
21    Hail the new, year lads and lasses,
22    Fa la la la la la la la!
23    Sing we joyous all together!
24    Fa la la la la la la la!
25    Heedless of the wind and weather,
26    Fa la la la la la la la!
27    ************************************

27 rows selected.

SQL>

The song lyrics are now in proper sequence, and it took an explicit conversion between string and numeric data to effect that. It is important to note that had there been mixed data, or phone numbers, Social Security numbers or any data that wasn't fully numeric the conversion would have failed with the following error:


ERROR at line 1:
ORA-01722: invalid number

...making it difficult to produce the 'proper' order for the output. That may be an acceptable trade-off to have the 'convenience' of having a 'universal' field to store actual numbers, pseudo-numbers (SSN, ID numbers, telephone numbers) and actual character data (alpha-numeric product 'numbers' and the like). It also can make more work for the database, since character strings will behave differently with indexed access. (Indexing strings goes back to the nature of an ASCII string, which is a concatenation of individual characters and causes Oracle to collate the strings and sort according to the leading character, next character, etc. to put the strings in proper order. Indexing numbers doesn't require such 'gyrations' as each number, regardless of length, is a single entity which can be ordered by magnitude.) Dropping the original table, re-creating it using the proper data type (NUMBER) for the line_no column and loading the modified data the query is run again, absent the ORDER BY since the data was loaded 'in order' and line_no is actually a number field. This fixes the additional work needed to convert strings to numbers but also reveals the inherent 'problem' of data order in a heap table (there isn't any):


SQL>
SQL> --
SQL> -- Drop existing table
SQL> --
SQL>
SQL> drop table order_tst;

Table dropped.

SQL>
SQL> --
SQL> -- Create demo table again
SQL> -- Make line_no a number
SQL> --
SQL>
SQL> create table order_tst (line_no number, verse varchar2(80));

Table created.

SQL>
SQL> --
SQL> -- Add primary key
SQL> --
SQL>
SQL> alter table order_tst add constraint ord_tst_pk primary key(verse, line_no);

Table altered.

SQL>
SQL> --
SQL> -- Insert data, again in order
SQL> --
SQL>
SQL> insert all
  2  into order_tst values(1,'Deck the halls with boughs of holly,')
  3  into order_tst values(2,'Fa la la la la la la la!')
  4  into order_tst values(3,'''Tis the season to be jolly,')
  5  into order_tst values(4,'Fa la la la la la la la!')
  6  into order_tst values(5,'Don we now our gay apparel,')
  7  into order_tst values(6,'Fa la la la la la la la!')
  8  into order_tst values(7,'Troll the ancient Yuletide carol,')
  9  into order_tst values(8,'Fa la la la la la la la!')
 10  into order_tst values(9,'************************************')
 11  into order_tst values(10,'See the blazing yule before us,')
 12  into order_tst values(11,'Fa la la la la la la la!')
 13  into order_tst values(12,'Strike the harp and join the chorus,')
 14  into order_tst values(13,'Fa la la la la la la la!')
 15  into order_tst values(14,'Follow me in merry measure,')
 16  into order_tst values(15,'Fa la la la la la la la!')
 17  into order_tst values(16,'While I tell of Yuletide treasure,')
 18  into order_tst values(17,'Fa la la la la la la la!')
 19  into order_tst values(18,'************************************')
 20  into order_tst values(19,'Fast away the old year passes,')
 21  into order_tst values(20,'Fa la la la la la la la!')
 22  into order_tst values(21,'Hail the new, year lads and lasses,')
 23  into order_tst values(22,'Fa la la la la la la la!')
 24  into order_tst values(23,'Sing we joyous all together!')
 25  into order_tst values(24,'Fa la la la la la la la!')
 26  into order_tst values(25,'Heedless of the wind and weather,')
 27  into order_tst values(26,'Fa la la la la la la la!')
 28  into order_tst values(27,'************************************')
 29  select * From dual;

27 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>

Retrieving the data from the table without an ORDER BY produces interesting results, as the expected 'order' doesn't exist. Oracle places rows where they will fit in the heap, so the fifth row inserted may end up in the 'second' slot in the data block because it fits better than any prior row, behavior which produces the results shown below:


SQL>
SQL> --
SQL> -- Retrieve data
SQL> --
SQL> -- It doesn't come out as one might expect
SQL> --
SQL>
SQL> select *
  2  from order_tst;

   LINE_NO VERSE
---------- --------------------------------------------------------------------------------
         3 'Tis the season to be jolly,
         9 ************************************
        18 ************************************
        27 ************************************
         1 Deck the halls with boughs of holly,
         5 Don we now our gay apparel,
         2 Fa la la la la la la la!
         4 Fa la la la la la la la!
         6 Fa la la la la la la la!
         8 Fa la la la la la la la!
        11 Fa la la la la la la la!
        13 Fa la la la la la la la!
        15 Fa la la la la la la la!
        17 Fa la la la la la la la!
        20 Fa la la la la la la la!
        22 Fa la la la la la la la!
        24 Fa la la la la la la la!
        26 Fa la la la la la la la!
        19 Fast away the old year passes,
        14 Follow me in merry measure,
        21 Hail the new, year lads and lasses,
        25 Heedless of the wind and weather,
        10 See the blazing yule before us,
        23 Sing we joyous all together!
        12 Strike the harp and join the chorus,
         7 Troll the ancient Yuletide carol,
        16 While I tell of Yuletide treasure,

27 rows selected.

SQL>

The ORDER BY is necessary, even for numeric data, but it doesn't require additional thought to produce the expected results:


SQL>
SQL> --
SQL> -- Retrieve data, and sort by line_no
SQL> --
SQL>
SQL> select *
  2  from order_tst
  3  order by 1;

   LINE_NO VERSE
---------- --------------------------------------------------------------------------------
         1 Deck the halls with boughs of holly,
         2 Fa la la la la la la la!
         3 'Tis the season to be jolly,
         4 Fa la la la la la la la!
         5 Don we now our gay apparel,
         6 Fa la la la la la la la!
         7 Troll the ancient Yuletide carol,
         8 Fa la la la la la la la!
         9 ************************************
        10 See the blazing yule before us,
        11 Fa la la la la la la la!
        12 Strike the harp and join the chorus,
        13 Fa la la la la la la la!
        14 Follow me in merry measure,
        15 Fa la la la la la la la!
        16 While I tell of Yuletide treasure,
        17 Fa la la la la la la la!
        18 ************************************
        19 Fast away the old year passes,
        20 Fa la la la la la la la!
        21 Hail the new, year lads and lasses,
        22 Fa la la la la la la la!
        23 Sing we joyous all together!
        24 Fa la la la la la la la!
        25 Heedless of the wind and weather,
        26 Fa la la la la la la la!
        27 ************************************

27 rows selected.

SQL>

Database design is an important aspect of application design, and choices that appear on their face to be logical may not be. There are still applications that are 'database agnostic', relying on 'standard' ANSI SQL so that no matter which RDBMS is chosen the application will function. Such a practice allows for more 'universal' code but it also ignores any RDBMS-specific modifications to that ANSI SQL that improve performance for that particular engine. Oracle provides functionality that may not be available in other RDBMS engines, or may not be implemented the same way, requiring RDBMS-specific functions and procedure calls that, with 'standard' ANSI SQL aren't provided in the standard. Along with that the practice of using character strings for storing numbers may be convenient but it isn't good practice if the data stored is actually numeric. Mixed types of data stored in a character field can, and often do, prevent any implicit conversion the database engine can perform. And, as illustrated, such decisions can produce results that aren't logical, such as the sort order, possibly confusing the end users.

There may not be much of a choice in how a vendor chooses to store data, but knowing such limitations can make it easier to produce 'correct' results even though it may take a bit more work.

See all articles by David Fitzjarrell

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved