How to take advantage of globalization techniques to perform case insensitivity in Oracle.
Up to now in this series on globalization, we have discussed how to set particular environment variables or database parameters. This was done to get our client machines configured in such a way that they truly represented their true locale and for the proper translation of character strings when communicating between the client and database server.
This article shifts gears, taking us past the setting of variables, and diving into an implementation of how we can use these settings to our benefit--in particular our ability to perform case insensitivity comparisons.
Over the years, the ability, and desire of the end use community, to equate 'a' with 'A' has dumbfounded many a programmer and DBA. The reason for wanting to perform a case insensitive search is quite simplistic. An end user, when searching for a person, place or thing, typically wants to just key in for instance 'suzy smith' with no regard for how another user may have entered the name at data entry time. Hopefully you can see that not only does case insensitivity help with searching for items but also during data entry as true duplicates may be found. This of course assumes there is no difference between 'suzy smith', 'Suzy Smith' and 'SUZY SMITH'.
Two articles ago in The Globalization of Language in Oracle - The NLS_COMP and NLS_SORT variables, we touched on the ability to have case insensitivity and accent insensitivity for a character. In this article we set NLS_COMP=ANSI and NLS_SORT=BINARY_CI (case insensitivity) or NLS_SORT=BINARY_AI (accent and case insensitivity). The difference being that under NLS_SORT=BINARY_CI (case insensitivity) a=A and d=D. BUT under NLS_SORT=BINARY_AI (accent and case insensitivity) a=A=d=D.
In pre-Oracle10g versions of Oracle, in order to get this type of equality of characters you needed to use a combination of the NLS_UPPER and NLS_LOWER functions or use the NLS_SORT setting of GENERIC_BASELETTER. While these approaches worked they do have the flaw of having to programmatically code the NLS_UPPER or NLS_LOWER functions into every select statement. Or when using the GENERIC_BASELETTER you had a solution that was not a true linguistic solution for character comparison.
Now, in Oracle 10g if we have a table called NAMES and have the following entries in that table:
SQL> select name from names; NAME ------------- suzy smith Suzy Smith SUZY SMITH
If we wanted to perform a search on 'Suzy Smith', under normal setting of case sensitive (NLS_SORT=BINARY) searching we would get the following:
SQL> select name from names where name = 'Suzy Smith'; NAME ------------ Suzy Smith
If we then wanted to turn on case insensitivity (NLS_SORT=BINARY_CI) we would get the following results:
SQL> alter session set NLS_COMP=ANSI; SQL> alter session set NLS_SORT=BINARY_CI; SQL> select name from names where name = 'Suzy Smith' NAME ------------- suzy smith Suzy Smith SUZY SMITH
As you have noticed these examples use strict equality (where name = 'Suzy Smith'). Often times, more often than not, we want to perform a search where we only know part of a name. So under normal settings of case sensitivity NLS_SORT=BINARY when we perform a search for 'Suzy%' we get the following:
SQL> select name from names where name like 'Suzy%'; NAME ------------ Suzy Smith
And when we switch to case insensitivity (NLS_SORT=BINARY_CI) we get this result:
SQL> alter session set NLS_COMP=ANSI; SQL> alter session set NLS_SORT=BINARY_CI; SQL> select name from names where name like 'Suzy%'; NAME ------------ Suzy Smith
This is the exact same results and thus we gain nothing by using an NLS_SORT setting for case insensitivity. As you can see case insensitivity ONLY works for straight equality searches.
In order to perform a case insensitive search that does not use strict equality we must now introduce the function REGEXP_LIKE. The REGEXP_LIKE function does regular expression matching instead of the simple pattern matching of the LIKE comparison operator. If you have ever used any of the various arrays of Unix type scripting languages that have pattern matching you will feel right at home. If not I would encourage you to do a bit of research on pattern matching and you will soon see the power behind the types of searching, should say pattern matching, you can do. For this article, we will only present the most simplistic use of this function for finding 'Suzy%'.
The following is the format for the
REGEXP_LIKE(source_string, pattern [, match_parameter] )
the column or characters we are wanting to match against.
the pattern or regular expression we would like the source string compared against.
is, for our case insensitivity, is 'i' for case insensitivity or 'c' for case sensitivity. If this match parameter is not supplied then the value of the NLS_SORT parameter is used.
So for our example of searching for 'Suzy%' and having an NLS_SORT for case sensitive searching (NLS_SORT=BINARY) we could issue the following types of SELECT statements and get the following results:
SQL> select name from names where REGEXP_LIKE(name,'Suzy'); NAME ----------- Suzy Smith SQL> select name from names where REGEXP_LIKE(name,'Suzy','i'); NAME ------------ suzy smith Suzy Smith SUZY SMITH SQL> select name from names where REGEXP_LIKE(name,'[sS][uU][zZ][yY]'); NAME ----------- suzy smith Suzy Smith SUZY SMITH
If we switched to an NLS_SORT=BINARY_CI for case insensitivity we now get the following types of results. As you can see we do not need to specify the match_parameter of 'i' for case insensitivity because it defaults to the NLS_SORT variable.
SQL> select name from names where REGEXP_LIKE(name,'Suzy'); NAME ------------ suzy smith Suzy Smith SUZY SMITH
There are literally dozens of ways you can pattern match for finding character strings with this function that extend the power of normal searches you can do by strict equality or by using the simplistic LIKE comparison operator.
While case insensitivity can extend your application and allow your users to literally find dozens of hidden artifacts, the fact still remains that there is no direct way to fully convert an application to case insensitivity unless you do not use the LIKE operator and every SQL statement is an exact equality match. In order to provide for complete case insensitivity to your end user community you need to start using the REGEXP_LIKE function. The only real issue is that you need to change your applications. Maybe in the next release of Oracle they will be able to extend the functionality of the LIKE comparison to provide true case insensitivity matching.