This is a nuts-and-bolts discussion of some potential issues with nulls. While BOL has extensive documentation, it doesn't give you a concise list of "gotchas." These are some of the things you should watch for when working with existing objects.
Watch your DML, particularly with inequality
Either provide for nullability in every statement where it is a potential issue or have a look at the database to see if you should be addressing it. For instance, if you have a table with 900,000 rows, 899,990 of which already have the desired value, you might use the following statement:
update t set col3 = 'Y' where col3 <> 'Y' or col3 is null
If col3 isn't nullable, then the last line is superfluous. If it is nullable,
you'd best include
is null or risk not changing a row or two with a
null in that column. Based on your data, you may be justified omitting a search
condition and changing every row in the table.
There are many ways to determine if a field is nullable: the Enterprise
Manager, sp_columns, information_schema.columns, or go fishing in syscolumns.
Programmatically, you can use ADO's
Variables can be a major problem because they're often used in multiple expressions. They may evaluate to null; i.e., during division by zero, or they may inadvertently never be set.
My employer's senior management commissioned an Income Statement that allocates very specific corporate expenses into several components of Cost of Sales. I keep the sum of these expenses in a variable and use it in successive queries. It initially looked like this:
This worked well until our Finance people started running it on just part of the company. Since there were no corporate expenses, @nonGA was null and so was every expression that used it. Fortunately, the effect, while disconcerting, wasn't subtle and I quickly fixed it.
ISNULL replaces a null with a specified value. In this case,
zero. Be careful that
ISNULL does not hide problems with your
queries. They may be returning null because of logic or typing errors. Speaking
of typing errors, did you notice the distinction between the
function and the
IS NULL predicate?
Consider the functions available to you
ISNULL, you have
CASE at your disposal.
CASE is a
jack-of-all-trades. The others are more specialized.
Suppose Stephen Wynkoop is feeling generous and wants to see the effect of doubling his columnists' hits on a specific date. Further, everyone, even columnists with no hits that day, gets at least 10.
He'll probably left join his columnist table with his hit table. Because many rows will return null hits, he can't just multiply them by two. Same for the columnists with one to four hits. They would get fewer than 10 when doubled. Here's what he might do:
select c.full_name, case when h.hits > 5 then h.hits * 2 else 10 end from columnists c left join hitstats h on h.columnist_id = c.id and h.hit_date = 'April 1, 2001' -- arbitrary date for illustration order by 2 desc
Columnists with no hits, or fewer than 5, are swept up by
and given 10.
COALESCE returns the first non-null argument given it. It can be
more concise than
NULLIF(arg1, arg2) returns null if its two arguments are equal,
and arg1's value if not. It is most useful when combined with other functions.
Say a college sends solicitations to its alumni. Part of the solicitation is a reminder of how much they contributed last time. Their table doesn't allow nulls so years without contributions have the value zero.
select alum, coalesce(nullif(contrib_2000,0),nullif(contrib_1999,0),nullif contrib_1998,0),0) from contributions
If the alumnus's 2000 contribution is zero,
NULLIF will convert
it to null and
COALESCE will move on to the next year. Otherwise,
will return the year's contribution and
COALESCE will stop there.
The last zero is a catchall for alumni who have not contributed in those years.
If the last zero is omitted, those alumni will return null.
SUM, AVG, MAX, MIN, STDEV, STDEVP, VAR, and VARP)
are straightforward as they ignore null values. So does
only issue you should ever have is running the function against a column that
has all null values; i.e., one that was just added to the table and not yet
populated. The server will return null.
COUNT(*) counts all rows, including rows with nulls. The college
can use this distinction to analyze its 2000 contributions. Remember that their
table doesn't allow nulls.
select count(*) from contributions -- count of all alumni in the table select count(nullif(contrib_2000,0)) from contributions -- count of alumni who contributed in 2000 select avg(contrib_2000) from contributions -- average contribution from all alumni select avg(nullif(contrib_2000,0)) from contributions -- average from alumni who contributed
DISTINCT, null values are considered equal.
ORDER BY, nulls are considered the lowest possible value; they are at the beginning of an ascending list.
- ANSI compatibility has considerable effects on the results of evaluating nulls. For instance, whether "= null" will work. See below.
I have to refer you to your version's (or versions') BOL for the gory details. Some things to keep in mind are:
- Your server's version. 6.5 and below behave differently than later versions.
- The database's compatibility level (sp_dbcmptlevel).
- The database options (sp_dboption).
- How you connect to the server. ODBC and OLE DB give different default ANSI settings than DB-Library.
- The settings can be changed session-by-session; they're not necessarily uniform or under the control of the dbo or sa.
- Depending on the server version, the behavior of stored procedures can depend upon either what the settings were when the procedure was compiled or what they are when the procedure is run.