Table Functions, Part 5a: An introduction to pipelined table functions

[Gee, that was embarrassing. I start a series on table functions, and then it falters and stops. My apologies; I got so busy eating and drinking and breathing and sleeping and....anyway, back to the series!]

In this post I introduce PL/SQL developers to the pipelined table function.

Here's my first piece of advice regarding this interesting and in many ways unique (within PL/SQL) feature:
You are unlikely to ever need a pipelined table function, but when you do, it is going to be amazing.
I bring this up right at the start because I have found over the years that many developers talk in the most glowing terms about pipelined table functions. Yet when I look at what they are doing with those functions, it becomes clear that they are not pumped up about the pipelining. They are not, in fact, benefiting at all from the pipelining feature.

Instead, they are excited simply about table functions.

And that's cool, that's fine. Table functions are, indeed, quite awesome. Read my series! :-)

[Strikeout applied 25 August - see Table Functions, Part 5b: Table Functions vs Pipelined Table Functions for an explanation.]

A pipelined table function is a specialized variant of the more general species. So far as I know, the pipelined table function (ok, I give up: PTF from here on) has three primary use cases:
  1. Make sure that a table function executed within a parallel query does not cause serialization. That is, parallelize (which is strikingly different from paralyze) execution of the table function.
  2. Reduce user perception of the elapsed time needed to retrieve data.
  3. Reduce the amount of PGA (process global area) consumed by a table function.
I discuss these briefly below, and then describe the architecture and flow of PTFs.

Parallel Query

I am not an expert in parallel query, but the basic idea if obvious: you have a long-running query (it takes days!). You need to make it run faster. So you give the Oracle Database SQL engine sufficient information so that it can break up that one big monster query into multiple queries that run in parallel. The results from each "partitioned" query are then merged into a single result set, and you have your answer - in much less time.



Want to learn more about parallel execution of queries? You won't find the information in my head. I will show you the basic syntax for enabling a pipelined table function to execute in parallel, but you really should read Using Parallel Execution and Parallel Enabled Pipelined Table Functions

User Perception

This one's a bit harder to explain, but should be clear enough once you get through the architecture/flow section below. The basic idea is a user visits your website, enters some criteria, and presses Submit. Surprise! They've just asked for 127,544 rows of data to be returned to the screen. And so they wait and wait and wait, as the page gets longer and longer.

Users hate to wait.

So you build that query around a PTF, and then the scripting language (or Java or whatever else you are using) can consume and display, say, the first 100 rows virtually instantly, and the user can start looking at and working with the data, as the rest of the data is being sent to the website.

The total time taken to retrieve all data is not faster, but the user's perception is: "Wow! I didn't have to wait at all!"

I think that sounds so cool, and I know (or believe) that people have used PTFs for this, because they told me. In my classes. In front of other developers. So they wouldn't lie, right?

But no one has ever pointed me to a demonstration. So if you have done something like this, please let us know via a comment on this post.

According to the documentation, you need to follow an "interface approach, the consumer and producers run on the same execution thread. Producer explicitly returns the control back to the consumer after producing a set of results. In addition, the producer caches the current state so that it can resume where it left off when the consumer invokes it again." You can do this in C or Java, and you will find examples here.

Reduced PGA Memory Consumption

When you populate data in collections (and table functions return collections), you consume Process Global Area memory. If you use too much of this memory (an OS setting), your session dies.

If your "normal" (non-pipelined) table function is consuming too much PGA, you can switch to a PTF and watch PGA consumption plummet.

Why? Well, to answer that question, you need to understand how PTFs work. And so....

Pipelined Table Function Architecture and Flow

Before I get into the details, it's important to remember the following about PL/SQL:
PL/SQL is not a multi-threaded language. 
When you invoke a subprogram (procedure or function), the invoker waits (is blocked) till the subprogram returns control.
To drive this point home, I create a simple table function that includes a call to DBMS_LOCK.SLEEP (note: you may need to have someone with DBA authority grant you the EXECUTE privilege on this package to reproduce what I am doing):

CREATE OR REPLACE TYPE list_of_names_t
   IS TABLE OF VARCHAR2 (100);
/

CREATE OR REPLACE FUNCTION names_tf (delay_in IN INTEGER)
   RETURN list_of_names_t
IS
   just_me   list_of_names_t
      := list_of_names_t ('Steven');
BEGIN
   DBMS_LOCK.SLEEP (delay_in);
   RETURN just_me;
END;
/

SET SERVEROUTPUT ON

SELECT COLUMN_VALUE FROM TABLE (names_tf (5) )
/
COLUMN_VALUE                                                                   
------------------------
Steven                                                                          

Elapsed: 00:00:05.043

SELECT COLUMN_VALUE FROM TABLE (names_tf (10) )
/

COLUMN_VALUE                                                                   
------------------------
Steven                                                                          

Elapsed: 00:00:10.019

As you can see, the time it took for the query to complete doubled when I doubled the time spent "sleeping" inside the function.

This, then, is the norm in the world of PL/SQL: Call me and you must wait for me to finish.

Clearly, that approach doesn't work very well if you want to execute a parallel query and one of the "tables" you are querying from is a table function. You can't parallelize an operation when one component of that operation insists on serializing, right?

Same goes for the user perception use case: if a user submits a request that executes a table function that must return 100,000 rows, everyone must wait till all those rows return.

And it returns those rows in the form of a PL/SQL collection, so that table function could and often will consume large amounts of PGA memory. 

Sigh. What's a programmer to do?

Pipeline!

Well, that's just a word. So what does pipelining do in a table function? 

It allows you to "pipe" a row of data from inside the function out to the calling query. The query can then immediately use that row of data, without waiting for the function to complete. 

Rather than bloat up this post (and further delay its publication) with a full-bore explanation of the syntax for pipelining, I will simply offer a rewrite from my streaming table function post as a PTF (pipeline-specific syntax in purple). I will then in the next post in this series provide a step-by-step explanation.

CREATE OR REPLACE FUNCTION stockpivot (
      dataset refcur_pkg.refcur_t)
   RETURN tickertype_nt PIPELINED
IS
   /* Avoid hard-coding the BULK COLLECT LIMIT 
      in the fetch statement */
   c_limit CONSTANT PLS_INTEGER := 100;

   /* Container for rows fetched from the cursor variable. */
   TYPE dataset_tt IS TABLE OF stocktable%ROWTYPE
                         INDEX BY PLS_INTEGER;
   l_dataset  dataset_tt;
   l_onerow   tickertype;
BEGIN
   LOOP
      FETCH dataset BULK COLLECT INTO l_dataset
         LIMIT c_limit;
      EXIT WHEN l_dataset.COUNT = 0;

      FOR l_row IN 1 .. l_dataset.COUNT
      LOOP
         l_onerow := 
            tickertype (
               l_dataset (l_row).ticker, 
               'O', 
               l_dataset (l_row).open_price,
               l_dataset (l_row).trade_date);         
         PIPE ROW (l_onerow);         l_onerow :=  
            tickertype (
               l_dataset (l_row).ticker, 
               'C', 
               l_dataset (l_row).close_price,
               l_dataset (l_row).trade_date);        
         PIPE ROW (l_onerow); 
      END LOOP;
   END LOOP;
   CLOSE dataset;

   RETURN;END;

To Sum Up

This post offers an overview and use cases for pipelined table function. It ends with a simple example of a PTF, with a promise to describe it more fully in the next post of this series.

In the meantime, check out these fine resources on PTFs:
And please do add a comment to this post any other use cases you have found for PTFs and especially an example of the user perception use case if you've done that. That would really make my day!

Links to Rest of Series






Comments

  1. Shameless plug: http://otechmag.com/magazine/2015/spring/patrick-barel.html
    'Dear Patrick: If pipelined table functions provide their results faster, why would you want to use a normal table function?'

    ReplyDelete
  2. You got me wondering what RETURN _; does, it seems it does nothing but give a compile error :-)

    ReplyDelete
  3. You wondered oh so well, Jobo. I caught that typo when I tested the code (and produced the timings) but then forgot to apply the fix to the code in the post. I have since fixed it (thanks to Patrick Barel for first pointing it out - and oh so politely via email - an unnecessary gracious act. I am fine with being called out in public!).

    ReplyDelete
  4. For one client we once had to implement a specific case for Oracle Siebel audit trail. Audit trail in Siebel is "encoded" as string divided by * by some special logic.

    The client's security department wanted to query decoded audit trail but they claimed that is has to be a select from a table but they can't execute a procedure before it. And of course the data had to be actual date without any delay from a specific date.

    So we implemented it as a pipeline function encapsulated into the view.

    ReplyDelete
  5. Hello Steven, All,

    In addition to the mentioned features that are characteristic for the pipelined function,
    there exists also a PL/SQL implementation aspect that makes them appealing:

    When we need to define a table function that returns a collection of a composite data type, making the function pipelined allows for nicely organizing the code, by keeping both the function and the necessary record and collection type in a single package.

    Otherwise, a similar non-pipelined table function would require to create an separate schema level object type and collection type.
    For pipelined functions, Oracle does this work for you behind the scenes :)

    This can be especially attractive when the involved types are only necessary for that specific function or package, that is, without having a wider usage in the application.

    Regarding the pipelined functions specific features, the reduction in the PGA consumption is indeed an essential one.

    I had the opportunity to experience this effectively when I converted a table function used in Oracle8i, before the pipelined functions were introduced, into a pipelined function in later versions, and the improvement in memory usage was huge :)

    Sometimes "the feel of an improvement" could be more rewarding than being
    "born directly" into a version with all these goodies already at hand :)

    Best Regards,
    Iudith

    ReplyDelete
    Replies
    1. I fully agree with the implementation aspect: It makes code more readable (shorter sql), better testable (e.g unit testing) and less side effects.
      But I wonder how you made it work with local types. It worked for me with schema level object type and collection type, e.g.

      CREATE OR REPLACE TYPE T_OBJ AS OBJECT
      (
      ...
      );
      CREATE OR REPLACE TYPE T_SET AS TABLE OF T_OBJ;


      but not with the following local declaration (neither in header nor in body of my package):

      type T_Obj is record
      (
      ...
      );

      type T_Set is table of T_Obj;

      I've got:

      460/22 PLS-00642: local collection types not allowed in SQL statements

      Perhaps you have an example?

      Best regards
      Peter

      Delete
    2. It'd be easier to reply with confidence if I saw your full code example, but I am pretty sure this is the problem: you can ONLY define an object type at the schema level, never inside a PL/SQL block.

      Delete
    3. Of course.
      I defined a pl/sql collection, not an object type in my package. But Iudith wrote:

      "When we need to define a table function that returns a collection of a composite data type, making the function pipelined allows for nicely organizing the code, by keeping both the function and the necessary record and collection type in a single package."

      I tried and failed...

      Delete
    4. My apologies. I did not read your code closely enough. Yes, you declared a record. That will not work because when you invoke the function inside SQL, that record type will not be recognizable. So you create the object type at the schema level, but you can declare your collection type of that object type in the package specification. Make sense? [Hope I got it right this time!]

      Delete
    5. Absolutely! I hope Iudith can explain what she meant...

      Delete
  6. A question please. I have seen recently that pipeline functions are working very well with ROWNUM pseudocolumn to limit the returned results.

    Is it known how it behaves regarding:
    1. Pagination (especially for the last pages - even if the probability to require them is small - I knew people who did this)
    2. ORDER BY in PTF is preserved at the output level? I have encountered an example in SQL Server where the order had a very strange behavior being modified from what was inside of a cursor to what revealed to be outside of the cursor (i.e. after the cursor prematurely exited because of this unforeseen behavior) and I am curios how Oracle behaves from this perspective regarding the PL/SQL features.

    ReplyDelete
  7. Cristi, do you mean an ORDER BY inside the PTF that determines the order in which rows are piped out?

    ReplyDelete
  8. Nice explanation...Thank you Steven

    ReplyDelete

Post a Comment

Popular posts from this blog

Table Functions, Part 1: Introduction and Exploration

Recommendations for unit testing PL/SQL programs

The future of Oracle PL/SQL: some thoughts on Sten Vesterli's thoughts