Table Functions and Pipelining
The following Tip is from the outstanding book "Oracle PL/SQL Tuning: Expert Secrets for High Performance Programming" by Dr. Tim Hall, Oracle ACE of the year, 2006:
Functions that return collections of rows are known as table functions. They can be used like database tables in the FROM clause of a query or in the SELECT list of a query as a column name.
A regular table function creates an entire collection before returning it to the requesting query. But the performance of table functions can be improved by the implementation of pipelining and parallelization, giving the following benefits:
Pipelining allows rows to be passed out of table functions as they are produced, rather than waiting for whole collections to be produced before the results are returned. The outcome is a reduction in the time taken for the first rows to be produced and a reduction in the total amount of memory consumed by the table function.
Parallel enabling table functions allows their workload to be split between multiple slave processes, which may result in faster execution.
Like regular functions, table functions can accept input parameters, including collection types and REF CURSORs. Accepting these parameters allows table functions to be chained together to form complex transformation pipelines, or streams. These transformation pipelines can be used as a replacement for traditional Extract, Transform, Load (ETL) processes, removing the need for intermediate staging areas.
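As a rough illustration of chaining, the sketch below feeds one pipelined function into another through a REF CURSOR parameter. The names t_demo_row, t_demo_tab, demo_source and demo_transform are hypothetical and are not part of the book's scripts. The transformation (multiplying the ID and upper-casing the label) is an arbitrary placeholder for real ETL logic.

```sql
-- Hypothetical row and table types, for illustration only.
CREATE OR REPLACE TYPE t_demo_row AS OBJECT (
  id    NUMBER,
  label VARCHAR2(50)
);
/
CREATE OR REPLACE TYPE t_demo_tab AS TABLE OF t_demo_row;
/

-- Stage 1: generate rows and pipe them out one at a time.
CREATE OR REPLACE FUNCTION demo_source (p_rows IN NUMBER)
  RETURN t_demo_tab PIPELINED
AS
BEGIN
  FOR i IN 1 .. p_rows LOOP
    PIPE ROW (t_demo_row(i, 'row ' || i));
  END LOOP;
  RETURN;
END demo_source;
/

-- Stage 2: read rows from a REF CURSOR, transform each one and pipe it on.
CREATE OR REPLACE FUNCTION demo_transform (p_cursor IN SYS_REFCURSOR)
  RETURN t_demo_tab PIPELINED
AS
  l_id    NUMBER;
  l_label VARCHAR2(50);
BEGIN
  LOOP
    FETCH p_cursor INTO l_id, l_label;
    EXIT WHEN p_cursor%NOTFOUND;
    -- Placeholder transformation.
    PIPE ROW (t_demo_row(l_id * 10, UPPER(l_label)));
  END LOOP;
  CLOSE p_cursor;
  RETURN;
END demo_transform;
/

-- Chain the stages: the CURSOR expression passes the output of
-- stage 1 to stage 2 without an intermediate staging table.
SELECT *
FROM   TABLE(demo_transform(CURSOR(SELECT id, label
                                   FROM   TABLE(demo_source(3)))));
```

Because both stages are pipelined, rows can flow through the chain as they are produced rather than being materialized between steps.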
The next section shows how table functions are created and demonstrates the performance improvements associated with pipelined table functions.
Pipelining Table Functions
The earlier section stated that pipelining table functions reduces both the time taken to return the first rows of the collection and the overall memory usage. The obvious next step is to define some table functions and test whether these statements hold.
The create_square_root_schema_objects.sql script defines two database types that represent the row and table types used by the table functions. Regular table functions require these types to be created as database objects, while pipelined table functions can also use PL/SQL types defined in a package specification, provided that Oracle 9.2 or later is being used. To make the comparison as close as possible, both kinds of table function will use the same database types.
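For comparison, a pipelined table function that relies only on types declared in a package specification might look like the sketch below. The package name demo_ptf_api and its contents are hypothetical and are not part of the book's scripts. It assumes Oracle 9.2 or later, as noted above.

```sql
CREATE OR REPLACE PACKAGE demo_ptf_api AS
  -- PL/SQL record and collection types declared in the package
  -- specification, rather than as schema-level object types.
  TYPE t_row IS RECORD (
    start_number NUMBER,
    square_root  NUMBER
  );
  TYPE t_tab IS TABLE OF t_row;

  FUNCTION get_square_roots (p_start IN NUMBER,
                             p_end   IN NUMBER)
    RETURN t_tab PIPELINED;
END demo_ptf_api;
/
SHOW ERRORS

CREATE OR REPLACE PACKAGE BODY demo_ptf_api AS
  FUNCTION get_square_roots (p_start IN NUMBER,
                             p_end   IN NUMBER)
    RETURN t_tab PIPELINED
  AS
    l_row t_row;
  BEGIN
    FOR i IN p_start .. p_end LOOP
      l_row.start_number := i;
      l_row.square_root  := ROUND(SQRT(i), 2);
      PIPE ROW (l_row);
    END LOOP;
    RETURN;
  END get_square_roots;
END demo_ptf_api;
/
SHOW ERRORS

-- The function is queried in the same way as one based on database types.
SELECT *
FROM   TABLE(demo_ptf_api.get_square_roots(1, 5));
```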
create_square_root_schema_objects.sql
CREATE OR REPLACE TYPE t_square_root_row AS OBJECT (
  start_number NUMBER,
  square_root  NUMBER,
  description  VARCHAR2(50)
);
/
CREATE OR REPLACE TYPE t_square_root_tab AS TABLE OF t_square_root_row;
/
With the database types in place, the next step is to define some table functions. The create_square_root_functions.sql script defines a package with two table functions that return the square roots of a specified range of numbers. One of the table functions is pipelined, the other is not.
create_square_root_functions.sql
CREATE OR REPLACE PACKAGE tf_api AS
  FUNCTION get_square_roots_tf (p_start_range IN NUMBER,
                                p_end_range   IN NUMBER,
                                p_pause       IN VARCHAR2 DEFAULT 'TRUE')
    RETURN t_square_root_tab;

  FUNCTION get_square_roots_ptf (p_start_range IN NUMBER,
                                 p_end_range   IN NUMBER,
                                 p_pause       IN VARCHAR2 DEFAULT 'TRUE')
    RETURN t_square_root_tab PIPELINED;
END tf_api;
/
SHOW ERRORS
CREATE OR REPLACE PACKAGE BODY tf_api AS
  FUNCTION get_square_roots_tf (p_start_range IN NUMBER,
                                p_end_range   IN NUMBER,
                                p_pause       IN VARCHAR2 DEFAULT 'TRUE')
    RETURN t_square_root_tab
  AS
    l_row t_square_root_row := t_square_root_row(NULL, NULL, NULL);
    l_tab t_square_root_tab := t_square_root_tab();
  BEGIN
    FOR i IN p_start_range .. p_end_range LOOP
      -- Perform a conditional delay.
      IF p_pause = 'TRUE' AND MOD(i, 10) = 0 THEN
        DBMS_LOCK.sleep(1);
      END IF;
      -- Build up a new row.
      l_row.start_number := i;
      l_row.square_root  := ROUND(SQRT(i), 2);
      l_row.description  := 'The square root of ' || i || ' is ' || l_row.square_root;
      -- Extend the collection and add the row.
      l_tab.extend;
      l_tab(l_tab.last) := l_row;
    END LOOP;
    -- Return the collection.
    RETURN l_tab;
  END get_square_roots_tf;

  FUNCTION get_square_roots_ptf (p_start_range IN NUMBER,
                                 p_end_range   IN NUMBER,
                                 p_pause       IN VARCHAR2 DEFAULT 'TRUE')
    RETURN t_square_root_tab PIPELINED
  AS
    l_row t_square_root_row := t_square_root_row(NULL, NULL, NULL);
  BEGIN
    FOR i IN p_start_range .. p_end_range LOOP
      -- Perform a conditional delay.
      IF p_pause = 'TRUE' AND MOD(i, 10) = 0 THEN
        DBMS_LOCK.sleep(1);
      END IF;
      -- Build up a new row.
      l_row.start_number := i;
      l_row.square_root  := ROUND(SQRT(i), 2);
      l_row.description  := 'The square root of ' || i || ' is ' || l_row.square_root;
      -- Pipe the row out.
      PIPE ROW (l_row);
    END LOOP;
    -- Perform return.
    RETURN;
  END get_square_roots_ptf;
END tf_api;
/
SHOW ERRORS
The get_square_roots_tf function is a regular table function because it creates the entire collection before returning it. In contrast, the get_square_roots_ptf function pushes out each row as it is created using the PIPE ROW statement and ends with an empty RETURN statement. Notice that both functions contain an optional one-second pause every 10 rows to make the query artificially slow.
Once the table functions have been created, the first test can then be run using the query_square_root_functions.sql script shown below. This script uses the TABLE function to make the output from the table functions resemble a real table.
query_square_root_functions.sql
-- Query the regular table function.
SELECT *
FROM   TABLE(tf_api.get_square_roots_tf(1, 100)) a;

-- Query the pipelined table function.
SELECT *
FROM   TABLE(tf_api.get_square_roots_ptf(1, 100)) a;
Both queries in the script return output similar to that displayed below. The point of interest is how the output is returned, not the output itself.
START_NUMBER SQUARE_ROOT DESCRIPTION
------------ ----------- ------------------------------
           1           1 The square root of 1 is 1
           2        1.41 The square root of 2 is 1.41
           3        1.73 The square root of 3 is 1.73
.
.
          98         9.9 The square root of 98 is 9.9
          99        9.95 The square root of 99 is 9.95
         100          10 The square root of 100 is 10
100 rows selected.
This test highlights the difference in how the results are returned from the functions. The regular table function builds the whole collection before returning it, so a pause is seen, followed by all the results being returned in a single block. In contrast, the pipelined table function returns rows as they are created, so SQL*Plus displays the results in chunks as each batch of rows is fetched.
In addition to the difference in the speed of returning the first rows, the difference in memory consumption should also be demonstrated. Imagine a situation in which the table function is used to return 100,000 rows. The regular table function would build up the whole collection in memory before returning the data, while the pipelined table function only ever needs to hold a single row of its own at any time. The expected result is that the memory usage profiles of the two methods would be vastly different. The test_table_function_memory_usage.sql script provides a method for testing this difference.
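The time to the first row can also be measured directly rather than judged by eye. The following is a minimal sketch, assuming the tf_api package above has been created. It opens a cursor over each function, fetches a single row and reports the elapsed time using DBMS_UTILITY.get_time, which counts in hundredths of a second. The block is not part of the book's scripts, and no timings are quoted here because they depend on the environment.

```sql
SET SERVEROUTPUT ON
DECLARE
  l_cursor SYS_REFCURSOR;
  l_number NUMBER;
  l_start  PLS_INTEGER;
BEGIN
  -- Regular table function: the first fetch has to wait for the
  -- whole collection to be built, including every pause.
  l_start := DBMS_UTILITY.get_time;
  OPEN l_cursor FOR
    SELECT start_number
    FROM   TABLE(tf_api.get_square_roots_tf(1, 100));
  FETCH l_cursor INTO l_number;
  DBMS_OUTPUT.put_line('Regular   : first row after ' ||
                       (DBMS_UTILITY.get_time - l_start) / 100 || ' seconds');
  CLOSE l_cursor;

  -- Pipelined table function: the first fetch can return as soon
  -- as the first row has been piped.
  l_start := DBMS_UTILITY.get_time;
  OPEN l_cursor FOR
    SELECT start_number
    FROM   TABLE(tf_api.get_square_roots_ptf(1, 100));
  FETCH l_cursor INTO l_number;
  DBMS_OUTPUT.put_line('Pipelined : first row after ' ||
                       (DBMS_UTILITY.get_time - l_start) / 100 || ' seconds');
  CLOSE l_cursor;
END;
/
```

With the default pause, the range 1 to 100 triggers ten one-second sleeps, so the regular function would be expected to take roughly ten seconds before its first row is available, while the pipelined function should return its first row much sooner.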
test_table_function_memory_usage.sql
-- Create a function to retrieve current PGA usage.
CREATE OR REPLACE FUNCTION get_used_memory RETURN NUMBER AS
  l_used_memory NUMBER;
BEGIN
  SELECT ms.value
  INTO   l_used_memory
  FROM   v$mystat ms,
         v$statname sn
  WHERE  ms.statistic# = sn.statistic#
  AND    sn.name = 'session pga memory';
  RETURN l_used_memory;
END get_used_memory;
/
SHOW ERRORS
conn test/test
-- Test regular table function.
SET SERVEROUTPUT ON
DECLARE
  l_start NUMBER;
BEGIN
  l_start := get_used_memory;
  FOR cur_rec IN (SELECT *
                  FROM   TABLE(tf_api.get_square_roots_tf(1, 100000, 'FALSE')))
  LOOP
    NULL;
  END LOOP;
  DBMS_OUTPUT.put_line('Regular table function : ' ||
                       (get_used_memory - l_start));
END;
/
conn test/test
-- Test pipelined table function.
SET SERVEROUTPUT ON
DECLARE
  l_start NUMBER;
BEGIN
  l_start := get_used_memory;
  FOR cur_rec IN (SELECT *
                  FROM   TABLE(tf_api.get_square_roots_ptf(1, 100000, 'FALSE')))
  LOOP
    NULL;
  END LOOP;
  DBMS_OUTPUT.put_line('Pipelined table function : ' ||
                       (get_used_memory - l_start));
END;
/
This script defines a function that returns the amount of PGA memory currently assigned to the session, which is used before and after calls to the table functions defined previously, allowing the memory consumption associated with the table function calls to be quantified.
Each test is separated by a new connection to make sure a clean session is being used. Notice that the artificial pause is not needed for this test. The output from this script is listed below and clearly demonstrates the difference in memory consumption by the two methods.
SQL> @test_table_function_memory_usage.sql
Connected.
Function created.
No errors.
Connected.
Regular table function : 34734080
PL/SQL procedure successfully completed.
Connected.
Pipelined table function : 65536
PL/SQL procedure successfully completed.
SQL>
In this example, the regular table function consumes 530 times the PGA memory of the pipelined table function (34,734,080 bytes versus 65,536 bytes).
These two tests clearly demonstrate the performance improvements associated with pipelined table functions over conventional table functions.
The next section shows the effect of parallelizing table functions on their performance.