The Globalization of Language in Oracle – The NLS_COMP and NLS_SORT variables

How does Oracle compare and sort data. There is more to it than
meets the eye.

Ever since I was a newbie to databases I never questioned how sorting
occurred in a database. I just knew that there was a predefined order to the
character set (English) and I never thought much more of it. We all get too
complacent about equating predicates in a WHERE clause or adding an ORDER BY
clause to sort our output to put priority to the data we view. If you care to
delve into how Oracle determines how to sort, there is a good write up in the
Oracle documentation set that discusses primary (first) level, secondary, and
tertiary level sorting mechanisms. I would encourage you to search on these
sorting levels as the reading gives great insight into the complexities that
need to be overcome. We are, in this article, more concerned with the setting
of variables to get the desired outcome for the locale we need to present data
to.

Well these past few months, yes months, I have been playing
around with this globalization stuff and have continually been stumped by what
is in Oracle documentation and what I can actually get to work on my local
databases. After getting over the hang-up of operating system variables over
riding my Oracle client’s session variables it is now time to get a bit deeper
into what these variables do. In my opinion the two most important variables
that can get set for globalization are the NLS_COMP and NLS_SORT variables.
These two variables themselves define how a client will experience the data
that gets presented to them. They control how characters get compared to each
other and how they sort. Table 1 gives a definition of these two
variables.

Table 1 NLS_COMP & NLS_SORT Defined

PARAMETER

Descripton

NLS_COMP

The sole purpose of this variable is to determine how the
predicates in SQL statements will evaluate to each other when a comparison is
required. The typical comparison operator is the WHERE clause but also
includes such comparisons as ORDER BY, START WITH, HAVING,…etc…

There are only two settings.

BINARY : All comparisons will be done in accordance
to the binary value of the characters.

ANSI : All comparisons will be done linguistically
in accordance to the setting of the variable NLS_SORT.

How to set

  1. Set with an operating system environment variable (SET
    NLS_COMP=<binary|ansi>)
  2. ALTER SESSION SET NLS_COMP=<binary|ansi>;

NLS_SORT

NLS_SORT
specifies the collating sequence for ORDER
BY
queries.

If the value is BINARY,
then the collating sequence for ORDER
BY
queries is based on the numeric value of characters.

If the value is a named linguistic sort, sorting is based
on the order of the defined linguistic sort. Most (but not all) languages
supported by the NLS_LANGUAGE
parameter also support a linguistic sort with the same name.

There are many values.

BINARY : All sorting is done in accordance to the binary
value of the characters.

V$NLS_VALID_VALUES : All sorting is done in
accordance to a named linguistic definition. This means that Oracle will sort
in accordance to a particular locale (GERMAN, POLISH, FRENCH, etc.).These
valid named definitions may be obtained by querying the V$NLS_VALID_VALUES
where parameter = `SORT’. Just be aware that all the values in this view also
have a hybrid definition by adding a suffix of _CI (case insensitivity or _AI
(accent-insensitive and case-insensitive). But more on these two hybrids in
an upcoming article.

How to set

  1. Set with an operating system environment variable (SET
    NLS_SORT=<valid_value>)
  2. ALTER SESSION SET NLS_SORT=<valid_value>;

 

Newsflash Setting NLS_LANG environment
variable

It has come to my
attention that the setting of your NLS_LANG environment variable is of the
utmost importance. This little variable, if not set properly, will create havoc
within your database. It should be set to a valid setting that takes the form
of <nls_language>_<nls_territory>.<nls_characterset> . The
concern comes when you have a registry entry on your client machine that is not
compatible with the database you are connecting to. In my case it was a
difference in character set which caused incompatibilities between clients at
different locales. You can find these values from the view V$NLS_VALID_VALUES.

For example my NLS_LANG
variable was set to AMERICAN_AMERICA.WE8MSWIN1252

In the first
article on globalization
in Oracle I presented you with a table on some
common characters in the English and German language. Table 2 re-visits
this data. You should revisit the first article if you want to create this
table and query it. Table 2 gives you a few examples so that you can see
how setting the NLS_SORT variable will alter the output of a SELECT … ORDER BY
statement.

Table 2 Example data sorted on different NLS_COMP
& NLS_SORT values

select xgraphic from char_table order by xgraphic;

NLS_COMP=ANSI

NLS_SORT=BINARY

NLS_COMP=ANSI

NLS_SORT=BINARY_CI

NLS_COMP=ANSI

NLS_SORT=BINARY_AI

XDECIMAL XBINARY XGRAPHIC

——– ———- ———

65 1000001 A

66 1000010 B

67 1000011 C

68 1000100 D

79 1001111 O

85 1010101 U

97 1100001 a

98 1100010 b

99 1100011 c

100 1100100 d

111 1101111 o

117 1110101 u

196 11000100 Ä

214 11010110 Ö

220 11011100 Ü

228 11100100 ä

246 11110110 ö

252 11111100 ü

XDECIMAL XBINARY XGRAPHIC

——– ———- ———

65 1000001 A

97 1100001 a

66 1000010 B

98 1100010 b

67 1000011 C

99 1100011 c

68 1000100 D

100 1100100 d

79 1001111 O

111 1101111 o

85 1010101 U

117 1110101 u

196 11000100 Ä

228 11100100 ä

214 11010110 Ö

246 11110110 ö

220 11011100 Ü

252 11111100 ü

XDECIMAL XBINARY XGRAPHIC

——– ———- ——–

65 1000001 A

97 1100001 a

228 11100100 ä

196 11000100 Ä

66 1000010 B

98 1100010 b

67 1000011 C

99 1100011 c

68 1000100 D

100 1100100 d

79 1001111 O

111 1101111 o

214 11010110 Ö

246 11110110 ö

85 1010101 U

117 1110101 u

252 11111100 ü

220 11011100 Ü

To take a look at what is really happing behind the scene
here we need to extract the numerical representation of a character to see how
it sorts. This can be done by imposing the NLSSORT function around the
character string in question. Table 3 presents the output of just such a
query. The second column now uses the _CI (case insensitivity) extension. The
item to glean from this output is that the upper case letter A has the same
NLSSORT value as the lower case letter a and the ä character now has the same nlssort
value as Ä. Now when we get to _AI (accent insensitivity & case
insensitivity) in column three we have a condition where the nlssort value is
the same for a set of characters such that a=A=ä=Ä. WOW! Now I encourage you to
run under these three scenarios and issue the following SQL.

SELECT xgraphic from char_table where xgraphic = `a’;

Table 3 Example data sorted on different NLS_COMP
& NLS_SORT values with NLSSORT

select xdecimal, nlssort(xgraphic) nlssort,xgraphic from char_table
order by xgraphic;

NLS_COMP=ANSI

NLS_SORT=BINARY

NLS_COMP=ANSI

NLS_SORT=BINARY_CI

NLS_COMP=ANSI

NLS_SORT=BINARY_AI

XDECIMAL NLSSORT XGRAPHIC

——– ———- ———

65 4100 A

66 4200 B

67 4300 C

68 4400 D

79 4F00 O

85 5500 U

97 6100 a

98 6200 b

99 6300 c

100 6400 d

111 6F00 o

117 7500 u

196 C38400 Ä

214 C39600 Ö

220 C39C00 Ü

228 C3A400 ä

246 C3B600 ö

252 C3BC00 ü

XDECIMAL NLSSORT XGRAPHIC

——– ———- ——–

65 6100 A

97 6100 a

66 6200 B

98 6200 b

67 6300 C

99 6300 c

68 6400 D

100 6400 d

79 6F00 O

111 6F00 o

85 7500 U

117 7500 u

196 C3A400 Ä

228 C3A400 ä

214 C3B600 Ö

246 C3B600 ö

220 C3BC00 Ü

252 C3BC00 ü

XDECIMAL NLSSORT XGRAPHIC

——– ———- ——–

65 6100 A

97 6100 a

228 6100 ä

196 6100 Ä

66 6200 B

98 6200 b

67 6300 C

99 6300 c

68 6400 D

100 6400 d

79 6F00 O

111 6F00 o

214 6F00 Ö

246 6F00 ö

85 7500 U

117 7500 u

252 7500 ü

220 7500 Ü

Dealing with a multi lingual database is confusing at best. But
when we can get a glimpse at the internal numbers associated with a sort, it
starts to give us some insights into how things work and why we are sorting in
a particular order. Unfortunately this article has brought up just as many
questions as I think I have answered for myself. Hopefully for you also. If
there is just one item I would like everyone to get out of this article, it is
that you MUST set the NLS_LANG variable properly. DO NOT rely upon past
settings and PLEASE verify the setting in your registry. Globalization is a
wild ride, please hold on.

»


See All Articles by Columnist
James Koopmann

James Koopmann
James Koopmann
James Koopmann has fourteen years of database design, development and performance tuning experience. In addition, he has extensive database administration experience in Oracle and other relational databases in production environments, specializing in performance tuning of database engines and SQL based applications. Koopmann is an accomplished author with several technical papers in various Oracle related publications such as Oracle Magazine, Oracle Professional and SQL>UPDATE_RMOUG. He is a featured author and database expert for DatabaseJournal, a member of the editorial review committee for Select Journal (The Magazine for the International Oracle Users Group), an Oracle Certified Professional DBA and noted speaker at local Oracle User Groups around the country.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles