Skip to main content

Table Functions, Part 4: Streaming table functions

August 2018 update: please do feel encourage to read this and my other posts on table functions, but you will learn much more about table functions by taking my Get Started with PL/SQL Table Functions class at the Oracle Dev Gym. Videos, tutorials and quizzes - then print a certificate when you are done!

In my last post on table functions, I showed how I was able to reduce many Application Express interactive reports into one by pushing all the complexity into a table function, thereby reducing the query in the report to nothing more than a "parameterized view":

SELECT *
  FROM TABLE (
     qdb_rankings.ir_other_ranking_tf (category_in  => :p443_category,
                                period_type_in      => :p443_period_type,
                                competition_id_in   => :p443_competition_id,
                                period_in           => :p443_period)) pr

I hope you agree that this is a nice trade-off: keep the code in Application Express really simple, move the complexity to the backend.

Another common usage of table functions is to stream data directly from one process or transformation, to the next process without intermediate staging. Hence, a table function used in this way is called a streaming table function.

As you might be able to tell from the reference to transformation above, this technique is most often used in data warehouses. 



Before getting into the details of implementation, here's what such a streaming process might look like in SQL:

INSERT INTO tickertable
   SELECT *
     FROM TABLE (stockpivot (CURSOR (SELECT *
                                       FROM stocktable)))

Here's an explanation of the transformation (I will show below all the details of the database objects referenced):
  1. Take all the data from the stocktable....
  2. Convert it into a cursor variable with the CURSOR expression....
  3. Pass that cursor variable to the stockpivot table function....
  4. The function returns a nested table of object type instances....
  5. The TABLE operator converts that collection into a relational table format....
  6. SELECT all the rows from that pseudo-table....
  7. Insert them into the ticker table.
And as the image indicates, you can perform multiple transformations, as in:

INSERT INTO ticker table
SELECT *
  FROM TABLE (ticker pivot (
                CURSOR (SELECT *
                          FROM TABLE(stockpivot (
                                  CURSOR (SELECT * 
                                            FROM stocktable
       ))))))

Hopefully that is enough to get you thinking about the possibilities. As for the code....here's the fairly artificial example I will implement:

The stocktable contains the open and close prices for each stock market ("ticker") symbol and date:

CREATE TABLE stocktable
(
   ticker        VARCHAR2 (20)
 , trade_date    DATE
 , open_price    NUMBER
 , close_price   NUMBER
)
/

I need to transform each row in stocktable to two rows in tickertable (one for the open price and another for the close price):

CREATE TABLE tickertable
(
   ticker      VARCHAR2 (20)
 , pricedate   DATE
 , pricetype   VARCHAR2 (1)
 , price       NUMBER
)
/

And I want to do it entirely in a single SQL statement.

Now here's where "artificial" comes into play: you do not need to use a table function to implement this transformation. The following SQL statement does the trick:

INSERT ALL
   INTO tickertable (ticker, trade_date, price_type, price) 

        values (ticker, trade_date, 'O', open_price)
   INTO tickertable (ticker, trade_date, price_type, price) 

        values (ticker, trade_date, 'C', close_price)
SELECT * FROM stocktable;


or you could use unpivot (thanks, Chris Saxon @chrisrsaxon, for this technique!):

insert into ticker table 
   (ticker, trade_date, price_type, price)
 select *
 from   stocktable
 unpivot (price for price_type in (
             open_price as 'O', close_price as 'C'))

But please assume for the sake of my blog post that the transformation is way more complex. In fact, let's be clear: the actual implementation of the transformation (table) function is obviously going to be application specific in the extreme. So I keep my transformation logic very simple and will zoom in on the key steps that you would take for whatever real-world transformation you need to implement.

As you surely know by now, a table function returns a collection. And as I explained in a earlier post in this series, when you need to return a collection whose elements contain more than a single scalar value, you need to create an object type for the datatype of the collection. So here goes:

/* Gee, looks just like the table! */

CREATE TYPE tickertype AS OBJECT
   (ticker VARCHAR2 (20)
  , pricedate DATE
  , pricetype VARCHAR2 (1)
  , price NUMBER
   );
/

CREATE TYPE tickertype_nt AS TABLE OF tickertype;
/

So a collection of tickertypeset is what the function will be returning. But what will I pass into the function? When you are building a streaming table function, that generally means that you are feeding into the function the result set of a SELECT statement. Now, you cannot pass a SELECT statement itself as an argument to a function.

So, instead, I will use the CURSOR expression to instantiate a cursor variable that points to the SELECT's result set. To do that, I need to define a REF CURSOR type whose RETURN type matches the SELECT list of the query:

CREATE OR REPLACE PACKAGE refcur_pkg
IS
   TYPE refcur_t IS REF CURSOR
      RETURN stocktable%ROWTYPE;
END refcur_pkg;
/

And now I can write the header of my function:

CREATE OR REPLACE FUNCTION stockpivot (
      dataset refcur_pkg.refcur_t)
   RETURN tickertype_nt

I will also need to declare some local constants and variables:

IS
   /* Avoid hard-coding the BULK COLLECT LIMIT 
      in the fetch statement */
   c_limit CONSTANT PLS_INTEGER := 100;

   /* Container for rows fetched from the cursor variable. */
   TYPE dataset_tt IS TABLE OF stocktable%ROWTYPE
                         INDEX BY PLS_INTEGER;
   l_dataset   dataset_tt;

   /* The nested table that will be returned. */
   retval      tickertype_nt := tickertype_nt ();

The body of function fetches N rows at a time, and then converts each of those single stocktable rows into two object types instances, each of which are added to the retval nested table. The code in purple is my very specific transformation logic. Yours will vary greatly and almost certainly be lots more complicated.

BEGIN
   LOOP
      /* Fetch next 100 rows. */
      FETCH dataset BULK COLLECT INTO l_dataset
         LIMIT c_limit;

      EXIT WHEN l_dataset.COUNT = 0;

      /* Iterate through each row.... */
      FOR l_row IN 1 .. l_dataset.COUNT
      LOOP
         /* First the opening price row */
         retval.EXTEND;         
         retval (retval.LAST) := 
            tickertype (
               l_dataset (l_row).ticker, 
               'O', 
               l_dataset (l_row).open_price,
               l_dataset (l_row).trade_date);         

         /* Next the closing price row */
         retval.EXTEND;
         retval (retval.LAST) := 
            tickertype (
               l_dataset (l_row).ticker, 
               'C', 
               l_dataset (l_row).close_price,
               l_dataset (l_row).trade_date); 
      END LOOP;
   END LOOP;

   CLOSE dataset;

   RETURN retval;
END;

There, that wasn't so hard, was it? :-)

But what if you are transforming lots and lots and LOTS of data, and you've invested in the Parallel Query option to speed things up?

Unfortunately, the default behavior of a PL/SQL function is that when you call it, your session halts, waiting for the function to return its data. In other words, it forces serialization. So if you want to use a table function in a parallelized query, you need to take the next step (in understanding and complexity) re: table functions and explore....

Pipelined table functions: one of the most interesting constructs in PL/SQL

And that, dear reader, is the subject of my next post in this series on table functions.

Links to Table Function Series

Comments

Popular posts from this blog

Running out of PGA memory with MULTISET ops? Watch out for DISTINCT!

A PL/SQL team inside Oracle made excellent use of nested tables and MULTISET operators in SQL, blending data in tables with procedurally-generated datasets (nested tables).  All was going well when they hit the dreaded: ORA-04030: out of process memory when trying to allocate 2032 bytes  They asked for my help.  The error occurred on this SELECT: SELECT  *    FROM header_tab trx    WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)       AND ((trx.column_ntab MULTISET             EXCEPT DISTINCT generated_ntab2) IS EMPTY) The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality. We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of wo...

How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today: In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit? First I give the quick answer, then I provide support for that answer Quick Answer Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption. Test to see if that's fast enough (likely will be for many cases). If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.  Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification. Don't put anything in the collection you don't need. [from Giulio Dottorini] Remember: each session that runs this code will use that amount of memory. Background When you use BULK COLLECT, you retrieve more than row with each fetch, ...

PL/SQL 101: Three ways to get error message/stack in PL/SQL

The PL/SQL Challenge quiz for 10 September - 16 September 2016 explored the different ways you can obtain the error message / stack in PL/SQL. Note: an error stack is a sequence of multiple error messages that can occur when an exception is propagated and re-raised through several layers of nested blocks. The three ways are: SQLERRM - The original, traditional and (oddly enough) not currently recommended function to get the current error message. Not recommended because the next two options avoid a problem which you are unlikely  to run into: the error stack will be truncated at 512 bytes, and you might lose some error information. DBMS_UTILITY.FORMAT_ERROR_STACK - Returns the error message / stack, and will not truncate your string like SQLERRM will. UTL_CALL_STACK API - Added in Oracle Database 12c, the UTL_CALL_STACK package offers a comprehensive API into the execution call stack, the error stack and the error backtrace.  Note: check out this LiveSQL script if...