DB/Query

CONNECT BY to DB2

시처럼 음악처럼 2014. 6. 17. 18:52
  • developerWorks
  • Technical topics
  • Information Management
  • Technical library
  • Port CONNECT BY to DB2

    A recipe to map recursive queries from Oracle to DB2 using recursive common table expressions

    Using examples, this article describes step-by-step how to map CONNECT BY syntax and its associated pseudo columns to common table expressions that work in DB2® Universal Database™ (DB2 UDB) for Linux®, UNIX®, and Windows®.

    Serge Rielau (srielau@ca.ibm.com), Senior Software Developer, IBM

    Serge RielauSerge Rielau is part of the DB2 Solutions Development team, where he closely works with customers and business partners to port or migrate their applications from competitive RDBMS to DB2 for Linux. UNIX, and Windows. Prior to this role, he worked for 7 years as a team lead and technical manager in the DB2 SQL Compiler Development team. As an expert in the SQL language, Serge is an active participant in the comp.databases.ibm-db2 newsgroup.



    13 October 2005

     

    Motivation

    IBM DB2 e-kit for Database Professionals

    Learn how easy it is to get trained and certified for DB2 for Linux, UNIX, and Windows with the IBM DB2 e-kit for Database Professionals. Register now, and expand your skills portfolio, or extend your DBMS vendor support to include DB2.

    When porting or migrating an application from Oracle to DB2 UDB for Linux, UNIX, and Windows, recursive queries present a significant obstacle. A developer who understands and uses Oracle-style recursion often does not naturally understand DB2 recursion and the same is true the other way around.

    On first blush, the reason appears to be that DB2 implements recursion as it is defined in the SQL standard using common table expressions (CTE) and UNION ALL. Oracle, on the other hand, uses a CONNECT BY clause and a set of so-called pseudo columns and system procedures to define recursion. The difference, however, goes beyond syntax. The underlying problem is that DB2 processes recursion by essentially producing one level of recursion at a time (breadth first), while Oracle uses depth first. The later approach naturally produces an output that matches an org chart.

    In this article, I use a running example that matches CONNECT BY and the related pseudo columns to DB2 recursion, one step at a time.


    Installation

    To perform some of the queries, you must download and expand the CONNECT_BY.zip file available in the Download section at the end of this article.

    If you are working on a Windows 32-bit, a Linux on Intel 32-bit, or an AIX® 64-bit system, all you need to do is copy the connect_by (connect_by.dll on Windows) executable found in the appropriate subdirectory of the CONNECT_BY.zip file into your sqllib/function directory.

    If you are working on any other operating system or hardware platform with an installed C compiler, you can copy the files from the Connect_By/FunctionSource directory into sqllib/samples/c. After changing to that directory, simply run bldrtn connect_by. This command will compile the provided C-source code. bldrtn will also copy the produced executable into sqllib/function and you are ready to go.

    For convenience, running the following files in CONNECT_BY.zip when connected to a test database will reproduce all the DB2 SQL in this article:

    1. db2 -tvf connect_by_functions.sql
    2. db2 -tvf connect_by_sample.sql

    Setting up the example

    To start, we will define and fill an employee table. The table has an employee ID, which is the primary key, a manager ID, which links the employee to his manager, and name and salary columns:

     1 CREATE TABLE emp(empid  INTEGER NOT NULL PRIMARY KEY,
     2                    name   VARCHAR(10),
     3                    salary DECIMAL(9, 2),
     4                    mgrid  INTEGER);
    
     5 INSERT INTO emp
     6       VALUES ( 1, 'Jones',    30000, 10),
     7               ( 2, 'Hall',     35000, 10),
     8               ( 3, 'Kim',      40000, 10),
     9               ( 4, 'Lindsay',  38000, 10),
    10               ( 5, 'McKeough', 42000, 11),
    11               ( 6, 'Barnes',   41000, 11),
    12               ( 7, 'O''Neil',  36000, 12),
    13               ( 8, 'Smith',    34000, 12),
    14               ( 9, 'Shoeman',  33000, 12),
    15               (10, 'Monroe',   50000, 15),
    16               (11, 'Zander',   52000, 16),
    17               (12, 'Henry',    51000, 16),
    18               (13, 'Aaron',    54000, 15),
    19               (14, 'Scott',    53000, 16),
    20               (15, 'Mills',    70000, 17),
    21               (16, 'Goyal',    80000, 17),
    22               (17, 'Urbassek', 95000, NULL);

    Looking at the inserted rows, it is apparent that employee 'Urbassek' is the top manager for whom 'Goyal' and 'Mills' work. 'Scott' in turn works for 'Goyal' and has no employees of his own. 'Monroe,' on the other hand, manages 'Hall,' 'Kim,' and 'Lindsay' and is an employee of 'Mills.'

    A reasonable question to ask is: "Who works directly or indirectly for 'Goyal'?"


    A simple recursive query

    To answer the question of who works for 'Goyal,' an Oracle developer might write the following query:

    1 SELECT name 
    2   FROM emp
    3   START WITH name = 'Goyal'
    4   CONNECT BY PRIOR empid = mgrid

    START WITH denotes the seed of the recursion while CONNECT BY describes the recursive step. That is how to get from step n to step (n + 1). Since it is important to distinguigh between the nth and the (n + 1)th step during name resolution, PRIOR is used to show that empid belongs to the nth step while mgrid belongs to step (n + 1)th. So with empid being 16 for step 1, mgrid must be 16 as well and hence step 2 produces 'Scott,' 'Henry,' and 'Zander.' Their empids will now serve as PRIOR to step 3, and so on and so forth.

    The Oracle syntax is very concise. The SQL standard and DB2 syntax does not use any recursion-specific keywords and uses regular SQL to describe the exact same relationships. As you will see, it is more verbose, but equally straightforward:

    1 WITH n(empid, name) AS 
    2           (SELECT empid, name 
    3              FROM emp
    4              WHERE name = 'Goyal'
    5            UNION ALL
    6            SELECT nplus1.empid, nplus1.name 
    7              FROM emp as nplus1, n
    8              WHERE n.empid = nplus1.mgrid)
    9 SELECT name FROM n;

    The WITH clause allows for the definition of a named query within a statement, which can be referred to at a later point in the same statement. Its technical term is a common table expression (CTE). CTEs are by now supported in most major SQL-based DBMS, including Oracle 9i, SQL Server 2005, and, of course, in DB2.

    What makes this CTE special is that it is referred to within its very own defintion. This is what distinguishes a regular CTE from a recursive CTE. Here I named the CTE n to correlate to the recursive steps. A recursive CTE consists of two parts combined with a UNION ALL:

    1. The seed or step 1 of the recursion. This is what is described in Oracle using START WITH. In a recursive CTE, it is simply any query providing a set of rows. In this case we query the emp table and filter for 'Goyal.' We select the name of course and also the empid, because we need it for the recursive step.
    2. The recursive step going from n to (n + 1). Here we refer to step n (the CTE n) and join in step (n + 1) using the same predicate used in CONNECT BY. Instead of PRIOR, regular correlation names are used to distinguish n from (n + 1).

    It is noteworthy to look at the output of this query:

    NAME
    ----------
    SQL0347W  The recursive common table expression "SRIELAU.N" 
    may contain an infinite loop.  SQLSTATE=01605
    
    Goyal
    Zander
    Henry
    Scott
    McKeough
    Barnes
    O'Neil
    Smith
    Shoeman
    
      9 record(s) selected with 1 warning messages printed.

    DB2 does not perform any automatic cycle checking. That is why it will return this warning when it determines that cycles are possible. Those DB2 developers who care usually add a level column that increments and is limited by a fixed value. DB2 recognises this pattern and will suppress the warning. For the purpose of mapping CONNECT BY, we will ignore the warning from now on since we will add a cycle check later on.

    Further, it is interesting to note that the order in which DB2 has returned the results exposes how the recursion is processed one level at a time, instead of using a tree-walk.

    Before moving on to pseudo columns and more complex examples, I want to briefly explain where the WHERE predicate of an Oracle recursion needs to be placed. We will modify the example to return all employees, including their salaries, working for 'Goyal' who earn more than 40,000.

    1 SELECT name, salary
    2   FROM emp
    3   WHERE salary > 40000
    4   START WITH name = 'Goyal'
    5   CONNECT BY PRIOR empid = mgrid

    It is interesting to note how the WHERE clause preceded the recursive specification. For a DB2 developer, it might seem that the predicate belongs to the set of rows being considered to begin with. This, however, is not correct. Instead, the WHERE filters the final result and belongs at the end of the matching DB2 query:

    1 WITH n(empid, name, salary) AS 
    2           (SELECT empid, name, salary 
    3              FROM emp
    4              WHERE name = 'Goyal'
    5            UNION ALL
    6            SELECT nplus1.empid, nplus1.name, nplus1.salary 
    7              FROM emp as nplus1, n
    8              WHERE n.empid = nplus1.mgrid)
    9 SELECT name FROM n WHERE salary > 40000;
    
    NAME      SALARY
    ---------- -----------
    Goyal         80000.00
    Zander        52000.00
    Henry         51000.00
    Scott         53000.00
    McKeough      42000.00
    Barnes        41000.00
    
      6 record(s) selected

    Be aware that what is true for the WHERE clause is not true for the ON clause. I will get back to this issue in a later example. First we will dive into pseudo columns.


    LEVEL pseudo column

    The most trivial of the pseudo columns is LEVEL. The purpose of this column is to show the number of the recursive step n that produced the row. In our running example, it indicates the levels of management between 'Goyal' and the employee plus 1 (because LEVEL starts with 1). Here is the original Oracle example enhanced with LEVEL:

    1 SELECT LEVEL, name 
    2   FROM emp
    3   START WITH name = 'Goyal'
    4   CONNECT BY PRIOR empid = mgrid

    The SQL standard saw no need to add syntax for this feature because it can be expressed using regular SQL:

    1 WITH n(level, empid, name) AS 
    2           (SELECT 1, empid, name 
    3              FROM emp
    4              WHERE name = 'Goyal'
    5            UNION ALL
    6            SELECT n.level + 1, nplus1.empid, nplus1.name 
    7              FROM emp as nplus1, n
    8              WHERE n.empid = nplus1.mgrid)
    9 SELECT level, name FROM n;
    
    LEVEL      NAME
    ----------- ----------
              1 Goyal
              2 Zander
              2 Henry
              2 Scott
              3 McKeough
              3 Barnes
              3 O'Neil
              3 Smith
              3 Shoeman
    
      9 record(s) selected

    All I have done here is to introduce a level column, which starts with 1 and increments by 1. Of course, any semantics is possible, but this one happens to provide the same semantics as LEVEL. It is up to the developer to choose which information to pass along and how as the next example will show.


    The CONNECT_BY_ROOT expression

    CONNECT_BY_ROOT operates on a column and returns the value of the root ancestor of the row. To show what it does and how it gets translated, I change the example to use 'Goyal's' direct reports as root:

    1 SELECT CONNECT_BY_ROOT  name AS root, name
    2   FROM emp
    3   START WITH name IN ('Zander', 'Henry', 'Scott')
    4   CONNECT BY PRIOR empid = mgrid

    To map to DB2, all that is required is to pass along the root unmodified.

     1 WITH n(empid, root, name) AS 
     2           (SELECT empid, name, name 
     3              FROM emp
     4              WHERE name IN ('Zander', 'Henry', 'Scott')
     5            UNION ALL
     6            SELECT nplus1.empid, n.root,
     7                    nplus1.name 
     8              FROM emp as nplus1, n
     9              WHERE n.empid = nplus1.mgrid)
    10 SELECT root, name FROM n;
    
    ROOT      NAME
    ---------- ----------
    Zander     Zander
    Henry      Henry
    Scott      Scott
    Zander     McKeough
    Zander     Barnes
    Henry      O'Neil
    Henry      Smith
    Henry      Shoeman
    
      8 record(s) selected

    Next, I will show how to present the whole management chain and not just the root.


    The SYS_CONNECT_BY_PATH() procedure

    A common question when running queries is: "How does this element relate to the beginning of the recursion?" or, in other words, "What are the ancestors of this row?" In Oracle, SYS_CONNECT_BY_PATH() can be used to concatenate values from each recursive step and thus build an ancestor path. Find below the by now well-known example showing for each employee of 'Goyal' the "reports to" chain:

    1 SELECT SYS_CONNECT_BY_PATH(name, ':') AS chain
    2   FROM emp
    3   START WITH name = 'Goyal'
    4   CONNECT BY PRIOR empid = mgrid

    In this example, each value is separated by a ':'. To answer this question in DB2, the same technique is used as for LEVEL. The only difference is that concatenation is used instead of addition.

     1 WITH n(empid, chain) AS 
     2           (SELECT empid, CAST(name AS VARCHAR(30)) 
     3              FROM emp
     4              WHERE name = 'Goyal'
     5            UNION ALL
     6            SELECT nplus1.empid, 
     7                    n.chain || ':' || nplus1.name 
     8              FROM emp as nplus1, n
     9              WHERE n.empid = nplus1.mgrid)
    10 SELECT chain FROM n;
    
    CHAIN
    ------------------------------
    Goyal
    Goyal:Zander
    Goyal:Henry
    Goyal:Scott
    Goyal:Zander:McKeough
    Goyal:Zander:Barnes
    Goyal:Henry:O'Neil
    Goyal:Henry:Smith
    Goyal:Henry:Shoeman
    
      9 record(s) selected

    There is complete freedom on how to build the chain. For instance, the concatenation could be reversed to put 'Goyal' at the end of the chain. Also note the CAST that has been performed. It is needed for string data types so DB2 knows how long the chain can grow. The rules for recursive CTE specify that the length of the data types shall be defined by the seed, or the first half of the UNION ALL query.

    So far mapping from CONNECT BY to UNION ALL has been a breeze. But now that you are comfortable with recursive CTE, it is time to crank up the volume and attack the harder problems. One such problem is ordering. If all that is required is to order the result in the same way as the elements in the path that is selected back anyway, then an ORDER BY on the path in the final SELECT will suffice:

     1 WITH n(empid, chain) AS 
     2           (SELECT empid, CAST(name AS VARCHAR(30)) 
     3              FROM emp
     4              WHERE name = 'Goyal'
     5            UNION ALL
     6            SELECT nplus1.empid, 
     7                    n.chain || ':' || nplus1.name 
     8              FROM emp as nplus1, n
     9              WHERE n.empid = nplus1.mgrid)
    10 SELECT chain FROM n
    11 ORDER BY chain;
    
    CHAIN
    ------------------------------
    Goyal
    Goyal:Henry
    Goyal:Henry:O'Neil
    Goyal:Henry:Shoeman
    Goyal:Henry:Smith
    Goyal:Scott
    Goyal:Zander
    Goyal:Zander:Barnes
    Goyal:Zander:McKeough
    
      9 record(s) selected

    This is one form of the common org-chart. But what if the required order is on some other column and it is not a string?


    ORDER SIBLINGS BY expression

    In Oracle ORDER SIBLINGS BY defines the order in which siblings under the same parent shall be returned. To order all employees recursively by their salary using CONNECT BY, the query is formulated like this:

    1 SELECT name, level, salary
    2   FROM emp
    3   START WITH name = 'Goyal'
    4   CONNECT BY PRIOR empid = mgrid
    5   ORDER SIBLINGS BY salary

    The SQL Standard has specific syntax for this requirement. However, DB2 UDB for LUW has not (yet) implemented this part of the SQL standard. To map this feature in DB2, it is time to get inventive. On a case-by-case basis, it is possible to build a path again and then sort the string, but this can get a bit ugly since numeric values need to be converted to strings, which increases the length of the path. Also subsequent mappings will show that it is beneficial to provide a generic solution.

    At this point, I therefore introduce a user defined function (UDF). A UDF is an extension to DB2's capabilities. In this case the UDF (and those following below) is implemented in C and the executable is available to DB2, if you have followed the Installation chapter above.

     1 CREATE FUNCTION CONNECT_BY_POS
     2  (POS       VARCHAR(256) FOR BIT DATA,
     3   N         INTEGER,
     4   POSPLUS1  BIGINT)
     5  RETURNS VARCHAR(256) FOR BIT DATA 
     6  SPECIFIC CONNECT_BY_POS 
     7  EXTERNAL NAME 'connect_by!connect_by_pos'
     8  NOT FENCED RETURNS NULL ON NULL INPUT
     9  DETERMINISTIC NO SQL NO EXTERNAL ACTION
    10  LANGUAGE C PARAMETER STYLE SQL ALLOW PARALLEL;

    This function has two purposes:

    1. The function will take in the BIGINT value of the global position of the row posplus1 and translate this, given the pattern pos of the parent into the pattern for the next row. The level of the parent n is used to speed this process up. In principle the function does nothing other than building a binary-encoded path. The provided implementation uses integers internally. So, the arbitrarily chosen length of 256 for the binary string allows for 64 nesting levels, which can be trivially expanded.
    2. As an additional feature, the function also tests whether the global position of the new row already appears in the parent path. If that is the case, a runtime error is raised. Oracle does not allow cycles unless explicitly specified. So this feature makes the DB2 query compatible.

    The usage is best described by showing the example matching the CONNECT BY version above:

     1 WITH source(empid, name, salary, mgrid, rownum) AS
     2           (SELECT empid, name, salary, mgrid,
     3                    ROW_NUMBER() OVER(ORDER BY salary)
     4              FROM emp),
     5       n(empid, name, salary, level, pos) AS 
     6           (SELECT empid, name, salary, 1,
     7                    CONNECT_BY_POS('', 0, rownum)
     8              FROM source
     9              WHERE name = 'Goyal'
    10            UNION ALL
    11            SELECT nplus1.empid, nplus1.name,
    12                    nplus1.salary, level + 1,
    13                    CONNECT_BY_POS(n.pos, level, rownum) 
    14              FROM source AS nplus1, n
    15              WHERE n.empid = nplus1.mgrid)
    16 SELECT name, level, salary 
    17   FROM n
    18 ORDER BY pos;
    
    NAME       LEVEL       SALARY
    ---------- ----------- -----------
    Goyal                1    80000.00
    Henry                2    51000.00
    Shoeman              3    33000.00
    Smith                3    34000.00
    O'Neil               3    36000.00
    Zander               2    52000.00
    Barnes               3    41000.00
    McKeough             3    42000.00
    Scott                2    53000.00
    
      9 record(s) selected

    What just happened? Adding another CTE source allows us to issue a global order for all the rows according to the one requested by ORDER SIBLINGS BY. Once this is done, the recursive CTE n refers to source to pick up the rownum. The rest of the query has already been explained. In essence, CONNECT_BY_POS() produces a PATH as a binary string not unlike what SYS_CONNECT_BY_PATH() does as a regular string. In the end, the query gets ordered by the path that provides the position within the hierarchy pos.

    At this point, I want to come back to the possibility of an ON-clause in the CONNECT BY query. Any joins, or even more complex queries, can easily be expressed in the same way by simply pushing the composition of the source for the recursion into a CTE.


    The NOCYCLE keyword

    Now that I have introduced the capability to record the path of a recursion and to detect recursion at runtime, it is possible to map the NOCYCLE keyword of Oracle's CONNECT BY query syntax. In short, NOCYCLE prevents the recursion from entering a cycle. Any candidate row for step (n + 1) that already exists in the ancestor path is simply ignored. In Oracle, the usage looks like this:

    1 SELECT name, level, salary
    2   FROM emp
    3   START WITH name = 'Goyal'
    4   CONNECT BY NOCYCLE PRIOR empid = mgrid

    Given the existence of the ancestor path, mapping the semantics is pretty simple. A second UDF will take care of it. The UDF will return 0 if a cycle is detected. If none is detected, the value 1 is returned. With the registration of the UDF I will also introduce a cycle into the management chain. 'Urbassek' shall now report to 'O'Neil.'

     1 CREATE FUNCTION CONNECT_BY_NOCYCLE
     2  (N       VARCHAR(256) FOR BIT DATA, 
     3   LEVEL   INTEGER, 
     4   NPLUS1  BIGINT)
     5 RETURNS SMALLINT 
     6 SPECIFIC CONNECT_BY_NOCYCLE 
     7   EXTERNAL NAME 'connect_by!connect_by_nocycle'
     8   NOT FENCED RETURNS NULL ON NULL INPUT
     9   DETERMINISTIC NO SQL NO EXTERNAL ACTION
    10   LANGUAGE C PARAMETER STYLE SQL ALLOW PARALLEL;
    
    11 UPDATE emp SET mgrid = 7 WHERE name = 'Urbassek';
    
    12 WITH source(empid, name, salary, mgrid, rownum) AS
    13           (SELECT empid, name, salary, mgrid,
    14                    ROW_NUMBER() OVER(ORDER BY salary)
    15              FROM emp),
    16       n(empid, name, salary, level, pos) AS 
    17           (SELECT empid, name, salary, 1,
    18                    CONNECT_BY_POS('', 0, rownum)
    19              FROM source
    20              WHERE name = 'Goyal'
    21            UNION ALL
    22            SELECT nplus1.empid, nplus1.name,
    23                    nplus1.salary, level + 1,
    24                    CONNECT_BY_POS(n.pos, level, rownum) 
    25              FROM source AS nplus1, n
    26              WHERE n.empid = nplus1.mgrid
    27               AND CONNECT_BY_NOCYCLE(n.pos, level, rownum) = 0
    27 SELECT name, level, salary 
    28   FROM n
    29 ORDER BY pos;
    
    NAME      LEVEL      SALARY
    ---------- ----------- -----------
    Goyal                1    80000.00
    Henry                2    51000.00
    Shoeman              3    33000.00
    Smith                3    34000.00
    O'Neil               3    36000.00
    Urbassek             4    95000.00
    Mills                5    70000.00
    Monroe               6    50000.00
    Jones                7    30000.00
    Hall                 7    35000.00
    Lindsay              7    38000.00
    Kim                  7    40000.00
    Aaron                6    54000.00
    Zander               2    52000.00
    Barnes               3    41000.00
    McKeough             3    42000.00
    Scott                2    53000.00
    
      17 record(s) selected

    As you can see, 'Urbassek' has been added to the output, including all of 'Mills's' employees. 'Goyal' has not been added again, however. This is pretty good, but how do you find out where the cycle is?


    CONNECT_BY_ISCYCLE

    For the purpose of "coloring" a cycle in the recursion, Oracle provides yet another pseudo column, CONNECT_BY_ISCYCLE. This column returns 1 if the row is part of a cycle and 0 if it is not.

    1 SELECT name, level, 
    2         CONNECT_BY_ISCYCLE AS cycle 
    3   FROM emp
    4   START WITH name = 'Goyal'
    5   CONNECT BY NOCYCLE PRIOR empid = mgrid

    To map this feature to DB2, I will once more introduce a UDF. CONNECT_BY_ISCYCLE() detects a cycle, just like CONNECT_BY_NOCYCLE() did. But when CONNECT_BY_ISCYCLE() detects a cycle, it also compares the given path to that of another path. If the two paths overlap within the cycle, a 1 is returned; otherwise a 0 is returned. Here is the declaration of the UDF:

     1 CREATE FUNCTION CONNECT_BY_ISCYCLE
     2  (POS           VARCHAR(256) FOR BIT DATA,
     3   LEVELPOS      INTEGER,
     4   NPLUS1        BIGINT,
     5   POSANCESTOR   VARCHAR(256) FOR BIT DATA,
     6   LEVELANCESTOR INTEGER)
     7 RETURNS SMALLINT 
     8 SPECIFIC CONNECT_BY_ISCYCLE 
     9   EXTERNAL NAME 'connect_by!connect_by_iscycle'
    10   NOT FENCED RETURNS NULL ON NULL INPUT
    11   DETERMINISTIC NO SQL NO EXTERNAL ACTION
    12   LANGUAGE C PARAMETER STYLE SQL ALLOW PARALLEL;

    To achieve the goal, the new query has to:

    1. Return all rows from the recursion
    2. Join in child rows of each row for comparions
    3. Drive one recursive step for each child row to detect whether that step leads to a cycle

    In the end, what is needed is a three-way join: one self join on the output of the recursion, and a reference to the numbered source, similar to what was done in the original recursion. There is one advantage, though: It is sufficient to find any child row that introduces recursion. There is no need to find them all, and indeed, it would be harmful, because we want to return each row only once. Note that with a small change, one could count in how many cycles a given row participates. I'll leave that riddle for the interested reader to figure out on his own.

     1 WITH source(empid, name, salary, mgrid, rownum) AS
     2           (SELECT empid, name, salary, mgrid,
     3                    ROW_NUMBER() OVER(ORDER BY salary)
     4              FROM emp),
     5       n(empid, name, mgrid, level, pos) AS 
     6           (SELECT empid, name, mgrid, 1,
     7                    CONNECT_BY_POS('', 0, rownum)
     8              FROM source
     9              WHERE name = 'Goyal'
    10            UNION ALL
    11            SELECT nplus1.empid, 
    12                    nplus1.name,
    13                    nplus1.mgrid,
    14                    level + 1,
    15                    CONNECT_BY_POS(n.pos, level, rownum) 
    16              FROM source AS nplus1, n
    17              WHERE n.empid = nplus1.mgrid
    18                AND CONNECT_BY_NOCYCLE(n.pos, level, rownum) = 0),
    19        cycles(name, level, pos, iscycle) AS
    20            (SELECT name, level, pos,
    21                     COALESCE((SELECT 1 AS iscycle
    22                                  FROM n
    23                                  LEFT OUTER JOIN source
    24                                    ON n.mgrid = source.empid
    25                                  WHERE CONNECT_BY_ISCYCLE(n.pos, 
    26                                                              n.level, 
    27                                                              source.rownum,
    28                                                              ancestor.pos, 
    29                                                              ancestor.level) = 1
    30                                FETCH FIRST ROW ONLY), 0) AS iscycle
    31               FROM n AS ancestor) 
    32 SELECT name, level, iscycle
    33   FROM cycles
    34 ORDER BY pos;
    
    NAME      LEVEL      ISCYCLE
    ---------- ----------- -----------
    Goyal                1           1
    Henry                2           1
    Shoeman              3           0
    Smith                3           0
    O'Neil               3           1
    Urbassek             4           1
    Mills                5           1
    Monroe               6           0
    Jones                7           0
    Hall                 7           0
    Lindsay              7           0
    Kim                  7           0
    Aaron                6           0
    Zander               2           0
    Barnes               3           0
    McKeough             3           0
    Scott                2           0
    
      17 record(s) selected

    OK: This one was a piece of work, but at least now you know why you read that far. As in any good story, it is now time to add the Hollywood ending, where everything is nice and easy.


    CONNECT_BY_ISLEAF

    Compared to CONNECT_BY_ISCYCLE, CONNECT_BY_ISLEAF is indeed lighter fare. All this pseudo column does is to return 1 if a given row is a leaf in recursion. In other words: The row did not produce any further rows. In this example, a leaf is an employee who is not a manager.

    1 SELECT name, level, 
    2         CONNECT_BY_ISLEAF AS isleaf 
    3   FROM emp
    4   START WITH name = 'Goyal'
    5   CONNECT BY PRIOR empid = mgrid

    Indeed, all that is required here is to take out most of the previous additions. For the purpose of readability, we start fresh and 'Urbassek' becomes her own boss again. Also note that I took out the NOCYCLE option above to concentrate on the matter at hand. After all, there is no need to dwell on the past, especially when it was successfully mastered. By now the query practically writes itself.

     1 UPDATE emp SET mgrid = NULL WHERE name = 'Urbassek';
    
     2 WITH n(empid, name) AS 
     3           (SELECT empid, name 
     4              FROM emp
     5              WHERE name = 'Goyal'
     6            UNION ALL
     7            SELECT nplus1.empid, nplus1.name 
     8              FROM emp as nplus1, n
     9              WHERE n.empid = nplus1.mgrid)
    10 SELECT name,
    11         COALESCE((SELECT 0 FROM emp 
    12                     WHERE n.empid = emp.mgrid
    13                     FETCH FIRST ROW ONLY), 1) AS isleaf 
    11   FROM n;
    
    NAME     ISLEAF
    ---------- -----------
    Goyal                0
    Zander               0
    Henry                0
    Scott                1
    McKeough             1
    Barnes               1
    O'Neil               1
    Smith                1
    Shoeman              1
    
      9 record(s) selected

    This was indeed quite painless. Unless Oracle has introduced yet another pseudo column or keywords that I'm not aware of, all of Oracle's recursive features have now been mapped to DB2 UDB for Linux, UNIX, and Windows.


    Conclusion

    In this article I provided generic mappings from the Oracle style CONNECT BY recursive query syntax to DB2's standard compliant recursive common table expressions using UNION ALL. All the individual mappings work well together and I have shown that DB2's recursion feature is as least as expressive.

    While the Oracle syntax is less verbose because it provides keywords for various common semantics, this also means that it is less expressive since little new semantics can be added without changes to the DBMS.

    As an ISV or end-customer trying to port an application from Oracle to DB2, you probably do not care too much about that at this point. All you want and need to know now is how to translate from Oracle to DB2. Showing this has been the main purpose of this article. Once you get comfortable with the SQL standard syntax, I'm confident you will find value in DB2's expressiveness, even if it means typing a few more lines.


    Acknowledgements

    Thanks to the countless developers, both the IBMers and customers, who share their SQL challenges with me. I learn more from you than you from me.


    Download

    Description Name Size
    Sample code CONNECT_BY.zip  ( HTTP | FTP ) 22 KB