Unpack an Integer with SQL Server Recursion

Thursday Apr 11th 2013
Share:

Often overlooked, recursion offers developers an option that leads to clean, efficient solutions. Frank Solomon explains the ways SQL Server 2012 recursion works, and uses them to unpack an integer into its multiple-of-two additive components.

By Frank Solomon

Starting with SQL Server 2005, developers have had recursion available as a T-SQL language feature. This article will describe recursion and its SQL Server implementations, complete with examples. It will also include SQL Server functions and a stored procedure that unpacks, or parses, an integer into its multiple-of-two components.

An algorithm that references, or calls, itself a finite number of times is called recursive. Although a recursive solution can use more memory and server resources compared to a more conventional approach, recursion can offer faster, simpler development and for some problems, recursion is the ideal solution. The factorial algorithm is a classic example of recursion. Mathematically, the factorial of a positive integer n is the product of the positive integers less than or equal to n. For example, this calculates the factorial of seven:

7! = 7 x 6 x 5 x 4 x 3 x 2 x 1 = 5040

We can certainly write a looping algorithm to calculate the factorial, but we can also write a recursive algorithm. In pseudo-code, it would look like this:

function fact(n)
{
   if (n > 1)
      return (n * fact(n - 1)); -- recursive call
   else
      return 1;                  -- base case
}

For a positive integer, n greater than one, the function recursively calls itself with (n - 1). This is key - the recursive step drives the function closer to the base case, to reach that case in a finite number of steps. At the final step, n = 1, the function returns the computed value.

Recursive stored procedure SP_factorial SP_1 calculates the factorial for an integer between 0 and 20. Like all the functions and stored procedures in this article, it starts with USE [master], but it can go in any database. To prevent overflow crashes, it restricts @param to values between 0 and 20. Otherwise, it looks and works a lot like the pseudo-code above. Starting with input parameter @param, the recursive calls drive closer to the base case, to finally finish the stored procedure calculation. When run in the SQL Server 2012 debugger, this behavior becomes clear when called with this code:

DECLARE  @fact bigint
EXEC     SP_factorial 5, @fact OUTPUT
SELECT   @fact

Recursive function FN_factorial FN_1 calculates the factorial for an integer between 0 and 20. This also looks and works like the pseudo-code and the stored procedure above. Starting with input parameter @param, the recursive calls drive closer to the base case, to finally finish the function calculation. This statement:

SELECT dbo.FN_factorial(5)

shows the recursive behavior in the SQL Server 2012 debugger.

The third recursive element available in SQL Server is the recursive Common Table Expression (CTE). A CTE works like a view, except it scopes to the query that declared it. Both functions and stored procedures can have them. A recursive factorial CTE in a function might look something like FN_CTE_factorial FN_2. We can easily modify this function to return a single value for the computed factorial for input parameter @param, but the present structure of the function better illustrates how it works, and this will help explain the parsing stored procedure and functions we'll see shortly. Unlike the recursive stored procedure and function above, the base case must come before the recursive call. Also, unlike the examples above, this CTE "moves away" from the base case, and the WHERE clause in the recursive step tells the CTE when to stop - in this case, when CTE column param_val exceeds input parameter @param. This statement:

SELECT param_val, fact from FN_CTE_factorial (10)

returns this:

param_val  fact
---------- -------
0          1     -->   base case
1          1     -->   recursive calls
2          2        
3          6
4          24
5          120
6          720
7          5040
8          40320
9          362880
10         3628800

result set. The added comments in the stored procedure show how the CTE works. Starting with the base case, each recursive call recalculates param_val by adding one to the most recent param_val value above it, and recalculates fact by multiplying the most recent fact value above it by (param_val + 1), also using the most recent param_val value for this calculation. The recursive calls proceed as long as param_val is less than input parameter value @param. Recursive CTE's can handle between 0 and 32767 recursion levels, with 100 as the default. To change the limit, place the MAXRECURSION hint below the SELECT statement that calls the CTE. Remember that recursion has a potential price - slow performance and high system resource demand. Therefore, it must be used with care. However, it is a useful tool and we'll now use it to solve a real-world problem.

This screenshot PNG_1 shows a Visual Studio listbox with the SelectionMode property set to MultiSimple. This property setting allows multiple row selections. Each row shows its data value in parentheses for clarification, although a production application probably would not do this. Black maps to 1, clear maps to 4, silver maps to 256, etc. Notice that each data value is a multiple of two. We want to use integer multiples of two here because in a positive number, the largest count of component integers will involve multiples of two. Using the listbox example above, we see:

999 = (9 * 10

2

) + (9 * 10

1

) + (9 * 10

0

)   -- Base 10 integers
   
    = 900 + 99 + 9

999 = (1 * 2

9

) + (1 * 2

8

) + (1 * 2

7

) +    -- Integer multiples
      (1 * 2

6

) + (1 * 2

5

) + (1 * 2

2

) +    -- of 2
      (1 * 2

1

) + (1 * 2

0

)

    = 512 + 256 + 128 + 64 + 32 + 4 + 2 + 1

So to get the most information with the most component integers possible, we should use integer multiples of 2. An application using this listbox will gather all the data values together and ultimately pass them as parameters to an embedded query, a database function, or a stored procedure. We can easily build a string with the individual selected values at this point, but we could also add all the selected data values and pass that sum as a single integer value to the database. In this example, a single integer substituting for eight integers would help decrease the workload on the server for a high volume application, and make the application logic simpler. However, the function or stored procedure receiving it must somehow "unpack", or parse, that single integer into its component integers. SQL Server 2012 stored procedure SP_parseparam_output_varchar and SQL Server 2012 functions FN_parseparam_TV_four_cols and FN_convert_TV_to_varchar will handle this. In the function names, "TV" means "Table Variable". Stored procedure SP_parseparam_output_varchar sets a VARCHAR output variable with all the parsed components of a positive integer, and function FN_parseparam_TV_four_cols returns a table variable. Function FN_convert_TV_to_varchar also returns these values as a VARCHAR. We'll first look at stored procedure SP_parseparam_output_varchar.

Stored procedure SP_parseparam_output_varchar SP_2 takes BIGINT parameter @param and returns the component integers in a comma-delimited varchar. Although this stored procedure operates only on integer values, it has a BIGINT datatype parameter to offer flexibility if a call to it passes a BIGINT value. These lines:

DECLARE  @parse_string NVARCHAR(250) = ''
EXEC     SP_parseparam_output_varchar 999, @parse_string OUTPUT
SELECT   @parse_string AS 'parse_string'

Return:

parse_string
--------------------------------
(512, 256, 128, 64, 32, 4, 2, 1)
(1 row(s) affected)

The stored procedure defines @parse_string as an output parameter because each recursive step would overwrite @parse_string if defined as an internal variable. First, it validates @param, allowing positive values up to (231 - 1) and returning NULL for anything else. A stored procedure can handle at most 32 recursion levels; this upper limit guarantees coverage of every possible positive INT value. Remember that 20 = 1, and the stored procedure must account for this component. This is why the upper limit is not (232 - 1). Next, when @param = 0, the stored procedure has reached the base case. If @parse_string is NULL here, it means @param is zero and the stored procedure returns (0). Otherwise, the IF-block cleans the finished @parse_string. The next block handles the recursive step. First, it uses FLOOR(LOG(@param, 2)) to find the base-two LOG value of the largest multiple of two integer component of @param, and uses this value in the POWER function to get the component itself. For example,

DECLARE @param INT
SET @param = 872


SELECT FLOOR(LOG(@param, 2))             --> 9
SELECT POWER(2.0, FLOOR(LOG(@param, 2))) --> 512.0 
SELECT POWER(2.0, FLOOR(LOG(872, 2)))    --> 512.0 
SELECT POWER(2.0, 9)                     --> 512.0

works because 29 = 512. The power function uses 2.0 instead of 2 for the base to prevent overflow issues when the exponent exceeds 30. The stored procedure then appends the new value for @component to @parse_string, and sets @param to @param - @component as the step driving it toward the base case. Then, it calls itself recursively.

SQL Server 2012 function FN_parseparam_TV_four_cols FN_3 uses a CTE to get past the integer parameter datatype restriction of SP_parseparam_output_varchar. This function takes a BIGINT input parameter and returns a four-column table. The largest positive BIGINT number is (263 - 1), but this function can only cover values up to (247 - 1) because the SQL Server LOG function won't work reliably for numbers of this size or larger. Still, this function will parse numbers with as many as 47 components. An example call to the function:

SELECT      param_val, largest_exp_of_two_in_param_val,
            component, new_param_val
FROM        FN_parseparam_TV_four_cols (122)

Returns the result set below.

param_val   largest_exp_of_two_in_param_val component   new_param_val
---------   ------------------------------- ---------   -------------
122         6                                64           58
58          5                                32           26
26          4                                16           10
10          3                                 8            2
2           1                                 2            0

In each row, the CTE calculates the exponent of the largest multiple of two component in param_val, the component itself, and then subtracts that component from param_val to get new_param_val. This value then becomes param_val for the next row.

The function first validates @param, allowing positive values up to (247 - 1) and returning a row of NULLs for anything else. Next, it calculates the first "largest exponent of two" value for @param. Then comes the CTE itself. The first SELECT builds the base case, and the second SELECT is the recursive call, based on the values of the most recent row above it. The recursive calls continue as long as the new_param_val is greater than zero. Finally, the stored procedure SELECTs from the CTE into table @parsed_vals. The SQL Server 2012 function FN_parseparam_TV_component_col FN_4 returns a table with only the component column values, because a production environment would probably need only these values.

Function FN_convert_TV_to_varchar FN_5 returns the component integers of @param in a comma-delimited VARCHAR, for all positive integer values of @param up to (247 - 1). This contrasts with stored procedure SP_parseparam_output_varchar, which works only up to (231 - 1). This function first calls FN_parseparam_TV_component_col, placing the values into table variable @local_TV. Then, using COALESCE in a SELECT from @local_TV, it concatenates VARCHAR variable @component_list with the values in @local_TV and returns @component_list.

Although recursion does have potential performance and resource drawbacks, it can provide simple, flexible solutions and this makes it a viable option to consider.

(SP1 SP_factorial)

USE [master]
 
IF OBJECT_ID('dbo.SP_factorial') IS NOT NULL
BEGIN
    DROP PROCEDURE dbo.SP_factorial
END
GO
 
CREATE PROCEDURE [dbo].[SP_factorial]
    @param INT,
    @fact  BIGINT OUTPUT
AS
BEGIN
    DECLARE @prev_param    INT
 
    IF  (@param > 20) OR
       (@param < 0)                  --  Return NULL if @param
                                     --  exceeds 21 or @param
                                      --  is negative
    BEGIN
       SET @fact = NULL
       RETURN
    END
 
    IF (@param = 0)
    BEGIN                            --  Base case
       SELECT @fact = 1
    END
    ELSE
    BEGIN                            --  Recursive call
 
       SET @prev_param = @param - 1 --  Use @prev_param as the recursive variable
                                     --  because subtracting one from @param will
                                     --  change its initial value, and the first
                                     --  recursive step with that value will never
                                     --  happen
 
       EXEC SP_factorial @prev_param, @fact OUTPUT
       SET @fact = @param * @fact
    END
END
GO

 

FN_1 (FN_factorial)

USE [master]
 
IF OBJECT_ID('FN_factorial') IS NOT NULL
BEGIN
    DROP FUNCTION FN_factorial
END
GO
 
CREATE FUNCTION FN_factorial
( @param BIGINT )
RETURNS BIGINT
 
AS
BEGIN
    IF (@param > 20) OR (@param < 0) --  Return NULL if @param
                                     --  exceeds 21 or @param
                                    --  is negative.
 
    BEGIN
       RETURN NULL
    END
 
    IF (@param > 1)                     --  Recursive call.
    BEGIN
       SET @param = @param * dbo.FN_factorial(@param - 1)
    END
    ELSE
    BEGIN                           --  Base case.
       SET @param = 1
    END
 
    RETURN @param
 
END

 

FN_2 (FN_CTE_factorial)

USE [master]
 
IF OBJECT_ID('dbo.FN_CTE_factorial') IS NOT NULL
BEGIN
    DROP FUNCTION dbo.FN_CTE_factorial
END
GO
 
CREATE FUNCTION [dbo].[FN_CTE_factorial] ( @param BIGINT )
RETURNS @fact_vals TABLE
(
    param_val  BIGINT,
    fact       BIGINT
)
AS
BEGIN
 
    IF (@param > 20) OR (@param < 0) --  Return the value if @param
                                     --  exceeds 21 or @param
                                     --  is negative
 
    BEGIN
       INSERT INTO @fact_vals   (param_val, fact)
       VALUES                (NULL, NULL)
       RETURN
    END
 
    ;WITH CTE_factorial (param_val, fact) AS
    (
       SELECT CAST(0 AS BIGINT) AS 'param_val', CAST(1 AS BIGINT) AS 'fact'
 
       UNION ALL
 
       SELECT (param_val + 1) AS 'param_val',
               (param_val + 1) * fact AS 'fact'
       FROM   CTE_factorial C
       WHERE  param_val < @param
    )
 
    INSERT INTO @fact_vals
    SELECT param_val, fact
    FROM CTE_factorial
    OPTION (MAXRECURSION 100)
 
    RETURN
 
END
 
GO

 

PNG_1 (PNG file 1)

multiselect listbox


SP_2 (SP_parseparam_output_varchar)

 
	USE [master] 
	GO 
	  
	IF OBJECT_ID('dbo.SP_parseparam_output_varchar') IS NOT NULL 
	BEGIN 
	DROP PROCEDURE dbo.SP_parseparam_output_varchar 
	END 
	GO 

CREATE PROCEDURE [dbo].[SP_parseparam_output_varchar]
    @param BIGINT,                        --  BIGINT because @PARAM might be a number "close" to
                                         --  and greater than ((2 ^ 31) - 1). A number like this
                                         --  would exceed the INT datatype limit and crash the SP.
 
    @parse_string NVARCHAR(220) OUTPUT   --  LEN(string for (2 ^ 31 - 1)) = 219.
AS
BEGIN
    DECLARE @component BIGINT
 
    IF  (@param > (POWER(2.0, 31) - 1)) OR   --  Simply return a NULL if the @param exceeds the INT
        (@param < 0)                     --  limit or @param < 0. Use 2.0 instead of 2 for the
                                              --  base to prevent overflow issues when the exponent
                                              --  is 31. 
    BEGIN
        SET @parse_string = NULL
        RETURN
    END
 
    IF (@param = 0)                          --  Base case.
    BEGIN
        IF (LEN(@parse_string) = 0)          --  Handle case where @param = 0.
        BEGIN
            SET @parse_string = '(0)'
        END
        ELSE
        BEGIN
            SET @parse_string = '(' + LEFT(@parse_string, LEN(@parse_string) - 1) + ')'
        END
    END
    ELSE
    BEGIN                                    --  Recursive call.
        SET     @component = POWER(2.0, FLOOR(LOG(@param, 2)))
        SET     @parse_string += CAST(@component AS NVARCHAR(25)) + ', '
        SET     @param -= @component
        EXEC    SP_parseparam_output_varchar @param, @parse_string OUTPUT
    END
 
END

 

FN_3 (FN_parseparam_TV_four_cols)

USE [master]
 
IF OBJECT_ID('dbo.FN_parseparam_TV_four_cols') IS NOT NULL
BEGIN
    DROP FUNCTION dbo.FN_parseparam_TV_four_cols
END
GO
 
CREATE FUNCTION [dbo].[FN_parseparam_TV_four_cols] ( @param BIGINT )
RETURNS @parsed_vals TABLE
(
    param_val                            BIGINT,
    largest_exp_of_two_in_param_val          INT,
    component                            BIGINT,
    new_param_val                            BIGINT
)
AS
BEGIN
 
    DECLARE @running_sum                     BIGINT
    DECLARE @parsed_val                      BIGINT
    DECLARE @largest_exp_of_two_in_param_val     INT
 
    IF  (
            (@param > (POWER(2.0, 47) - 1))  --  Because the LOG function will not work reliably in
                 OR                       --  this SP for all numbers exceeding (POWER(2.0, 47) - 1),
            (@param < 0)                 --  simply return one row with NULL values if the @param
        )                                --  exceeds this value, or if @param < 0. Use 2.0 instead
                                         --  of 2 for the base to prevent overflow issues when the
                                         --  exponent exceeds 31. This means that this SP can handle
                                         --  parameters composed of at most 47 base-two integers,
                                         --  not 63 - the biggest BIGINT integer is ((2 ^ 63) - 1).
    BEGIN
        INSERT INTO @parsed_vals (param_val, largest_exp_of_two_in_param_val, component, new_param_val)
        VALUES                   (NULL, NULL, NULL, NULL)
        RETURN
    END
    ELSE IF (@param = 0)         --  Return one row with zero(es) if @param = 0.
    BEGIN
        INSERT INTO @parsed_vals (param_val, largest_exp_of_two_in_param_val, component, new_param_val)
        VALUES                   (0, 0, 0, 0)
        RETURN
    END
 
    --  Get the first base-two component of @param.
 
    SET @largest_exp_of_two_in_param_val = FLOOR(LOG(@param, 2.0))
 
    ;WITH CTE_parseparam (param_val, largest_exp_of_two_in_param_val, component, new_param_val) AS
    (
        --  In the POWER functions, use 2.0 instead of 2 for the
        --  base to prevent overflow issues when the exponent
        --  exceeds 30.
 
        --  Base case.
 
        SELECT  CAST(@param AS BIGINT)                   AS 'param_val',
                    @largest_exp_of_two_in_param_val AS 'largest_exp_of_two_in_param_val',
                    CAST(POWER(2.0, @largest_exp_of_two_in_param_val) AS BIGINT)
                        AS 'component',
                     CAST((@param - POWER(2.0, @largest_exp_of_two_in_param_val)) AS BIGINT)
                        AS 'new_param_val'
        UNION ALL
 
        --  Recursive call.
 
        SELECT  CAST((C.param_val - C.component) AS BIGINT)       AS 'param_val',
                    CAST(FLOOR(LOG(C.new_param_val, 2)) AS INT)   AS 'largest_exp_of_two_in_param_val',
                    CAST(POWER(2.0, (FLOOR(LOG((C.new_param_val), 2)))) AS BIGINT)    AS 'component',
                     CAST((C.new_param_val) - POWER(2.0, FLOOR(LOG(C.new_param_val, 2))) AS BIGINT)
                         AS 'new_param_val'
        FROM    CTE_parseparam C
        WHERE   new_param_val > 0    --  The recursion stops when
                                 --  new_param_val hits zero.
    )
 
    INSERT INTO @parsed_vals
    SELECT param_val, largest_exp_of_two_in_param_val, component, new_param_val
    FROM CTE_parseparam
 
    RETURN
END
GO

FN_4 (FN_parseparam_TV_component_col)

USE [master]
 
IF OBJECT_ID('dbo.FN_parseparam_TV_component_col') IS NOT NULL
BEGIN
    DROP FUNCTION dbo.FN_parseparam_TV_component_col
END
GO
 
CREATE FUNCTION [dbo].[FN_parseparam_TV_component_col] ( @param BIGINT )
RETURNS @parsed_vals TABLE
(
    component   BIGINT
)
AS
BEGIN
 
    DECLARE @running_sum                     BIGINT
    DECLARE @parsed_val                      BIGINT
    DECLARE @largest_pwr_of_two_in_param_val INT
 
    IF  (
            (@param > (POWER(2.0, 47) - 1))  --  Because the LOG function will not work reliably in
                 OR                           --  this SP for all numbers exceeding (POWER(2.0, 47) - 1),
            (@param < 0)                     --  simply return one row with NULL values if the @param
        )                                    --  exceeds this value, or if @param < 0. Use 2.0 instead
                                             --  of 2 for the base to prevent overflow issues when the
                                             --  exponent exceeds 31. This means that this SP can handle
                                             --  parameters composed of at most 47 base-two integers,
                                             --  not 63 - the biggest BIGINT integer is ((2 ^ 63) - 1).
    BEGIN
        INSERT INTO @parsed_vals (component) VALUES (NULL)
        RETURN
    END
    ELSE IF (@param = 0)                     --  Return zero if @param = 0.
    BEGIN
        INSERT INTO @parsed_vals (component) VALUES (0)
        RETURN
    END
 
    --  Get the first base-two component of @param.
 
    SET @largest_pwr_of_two_in_param_val = FLOOR(LOG(@param, 2.0))
 
    ;WITH CTE_parseparam (param_val, largest_pwr_of_two_in_param_val, component, new_param_val) AS
    (
        --  In the POWER functions, use 2.0 instead of 2 for the
        --  base to prevent overflow issues when the exponent
        --  exceeds 30.
 
        SELECT  CAST(@param AS BIGINT)                       AS 'param_val',
                 @largest_pwr_of_two_in_param_val         AS 'largest_pwr_of_two_in_param_val',
                 CAST(POWER(2.0, @largest_pwr_of_two_in_param_val) AS BIGINT)
                     AS 'component',
                 CAST((@param - POWER(2.0, @largest_pwr_of_two_in_param_val)) AS BIGINT)
                    AS 'new_param_val'
        UNION ALL
        SELECT  CAST((C.param_val - C.component) AS BIGINT)   AS 'param_val',
                 CAST(FLOOR(LOG(C.new_param_val, 2)) AS INT)   AS 'largest_pwr_of_two_in_param_val',
                 CAST(POWER(2.0, (FLOOR(LOG((C.new_param_val), 2)))) AS BIGINT)
                     AS 'component',
                 CAST((C.new_param_val) - POWER(2.0, FLOOR(LOG(C.new_param_val, 2))) AS BIGINT)
                     AS 'new_param_val'
        FROM    CTE_parseparam C
        WHERE   new_param_val > 0    --  The recursion stops when
                                 --  new_param_val hits zero.
    )
 
    INSERT INTO @parsed_vals
    SELECT component
    FROM CTE_parseparam
 
    RETURN
END
GO

FN_5 (FN_convert_TV_to_varchar)

USE [master]
 
IF OBJECT_ID('dbo.FN_convert_TV_to_varchar') IS NOT NULL
BEGIN
   DROP FUNCTION dbo.FN_convert_TV_to_varchar
END
GO
 
-- Use this function for integers between ((2 ^ 31) - 1) and ((2.0, 47) - 1),
-- because stored procedure
--
--    SP_parseparam_output_string_2012
--
-- will handle integers up to ((2 ^ 31) - 1), and function
--
--    FN_parseparam_table_variable_component_col_2012
--
-- will handle integers up to ((2.0 ^ 47) - 1). Note that this function
-- will handle ALL integers from 0 to ((2.0 ^ 47) - 1).
 
CREATE FUNCTION [dbo].[FN_convert_TV_to_varchar] ( @param BIGINT )
RETURNS VARCHAR(500)
 
AS
BEGIN
   DECLARE   @local_TV TABLE ( component BIGINT )
   DECLARE   @component_list varchar(500)
   SET       @component_list = ''
 
   INSERT INTO @local_TV (component)
   SELECT component FROM FN_parseparam_TV_component_col (@param)
 
   -- Concatenate @component_list with the
   -- values in @local_TV returned by the
   -- SELECT statement, row by row.
 
   SELECT    @component_list = COALESCE(@component_list + ', ', '') + 
             CAST(component as varchar(25))
   FROM      @local_TV
   ORDER BY  component
 
   SET @component_list = '(' + SUBSTRING(@component_list, 3, 500) + ')'
 
   RETURN @component_list
 
END

Frank Solomon handles all aspects of SQL Server 2005 / 2008 / 2012 development, including stored procedures, functions, jobs, and general database design. He handles VB.net, C#, and ASP.net development as well. Additionally, he handles technical writing and he has an iApp on sale in the iTunes store (BSD by Panda Cub Productions). He is looking for a SQL Server / front-end development opportunity in the Los Angeles area. See more at www.linkedin.com/in/franksolomonand reach him at fbs.author@gmail.com

Share:
Home
Mobile Site | Full Site
Copyright 2017 © QuinStreet Inc. All Rights Reserved