Friday, August 28, 2015

Optimizing PL/SQL Panel at Oracle Open World 2015 - Questions welcome!

At Oracle Open World 2014, I was a member of a panel on SQL and PL/SQL Performance. It was an awful lot of fun, but as I had expected, most of the questions focused on SQL performance, and most of them were answered by Tom Kyte (no complaints from me!).

I came away with a strong feeling that we should offer separate panels on SQL and PL/SQL, so that each could get the proper amount of attention (or as "proper" as you can get in 45 minutes. Sigh...).

Well, my wish has come true!

There will be a panel on SQL, covering functionality, performance, maintainability, and its future, and another on PL/SQL:
Optimizing PL/SQL for Performance and Maintainability (CON8416)
Moscone South - 306, Wednesday, October 28th, 13:45 - 14:30
Notice that we are suggesting that when it comes to "optimization," we should all be thinking about more than performance. Performance is, justifiably, a primary concern for any application developer (and DBA), but that doesn't mean it should be the sole concern.

A truly optimal application is also optimized for maintainability: namely, the ability to fix and enhance that application over time without consuming so much of the dev team's resources that it becomes hard to find time to build new apps (always our preference). 

We've pulled together a panel that is composed mostly of expert users, not (just) Oracle employees. Sure, Bryn Llewellyn (Distinguished Product Manager, PL/SQL and EBR) will be there. And I will moderate the panel, throwing in my two cents now and then. 

But we are most excited to welcome the following "real world" expert PL/SQL developers to answer questions and share their experiences:
  • Martin Buechi,  Lead Software Architect at Avaloq Evolution AG and 2009 PL/SQL Developer of the Year
  • Kim Berg Hansen,  Senior System Developer at T.Hansen Gruppen A/S and SQL Quizmaster at the PL/SQL Challenge
  • Marcelle Kratochvil,  CTO and Co-Founder of Piction and an Oracle Database multimedia expert
So if you are attending Oracle OpenWorld and you develop applications using PL/SQL, please do join us! We will be taking questions from the audience, but if you cannot attend the session, we also encourage you to post your question as a comment on this post.

We will then answer as many as possible during the session.

Of course, you can always ask a question about SQL and PL/SQL at the OTN Community Forum, or tweet me or Bryn.

Because we all know that Twitter is the best way to discuss complex technology like Oracle Database.

:-) 

Wednesday, August 26, 2015

Table Functions, Part 5c: Another use case for Pipelined Table Functions (and simple example)

From Oracle Help Center (a.k.a., documentation), we read:
Data is said to be pipelined if it is consumed by a consumer (transformation) as soon as the producer (transformation) produces it, without being staged in tables or a cache before being input to the next transformation.  
Pipelining enables a table function to return rows faster and can reduce the memory required to cache a table function's results.  
A pipelined table function can return the table function's result collection in subsets. The returned collection behaves like a stream that can be fetched from on demand. This makes it possible to use a table function like a virtual table.
In a nutshell, this means that the calling query can put to use the rows returned by the pipelined table function (PTF) before the function has returned all of its rows.

A simple way to demonstrate this (and highlight another nice use case for PTFs) is with the SQL IN operator.

IN is used in the WHERE clause to determine if a column or expression is in the specified list. The list can be a literal list:

SELECT * FROM my_table WHERE my_col IN (1,2,3,4)

or the list can be a subquery, which is really important, since you cannot have more than 1,000 literals in a list.

There is no such limit with this kind of IN list:

SELECT * FROM my_table 
WHERE my_col IN (SELECT num_col FROM my_list)
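Just to make that literal-list limit concrete (a sketch, using the hypothetical names above): a literal list of more than 1,000 expressions fails at parse time, while the subquery form does not.

```sql
-- More than 1,000 literals in an IN list raises:
-- ORA-01795: maximum number of expressions in a list is 1000
SELECT * FROM my_table
 WHERE my_col IN (1, 2, 3, /* ...and so on, up to... */ 1001);

-- No such cap applies to the subquery form:
SELECT * FROM my_table
 WHERE my_col IN (SELECT num_col FROM my_list);
```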

Hey, that's a FROM clause! That means we could call a table function there! And yes, you can:


SELECT * FROM my_table 
WHERE my_col IN (
   SELECT num_col FROM TABLE (my_tf))

To evaluate whether or not a value is in the IN list, the SQL engine needs to go through the list, looking for a match. When it finds a match, it stops searching.

Gee, so that should mean that if the table function is pipelined, and the function is returning rows as it generates them, the SQL engine should be able to get to an answer faster with pipelining than without. 

Shall we test that? Yes, we shall!

First, I create a table and two functions, the second of which is pipelined:

CREATE TABLE plch_data (n NUMBER)
/

CREATE OR REPLACE TYPE numbers_t IS TABLE OF NUMBER;
/

CREATE OR REPLACE FUNCTION my_list_tf
   RETURN numbers_t AUTHID DEFINER
IS
   ns   numbers_t := numbers_t ();
BEGIN
   ns.EXTEND (1000000);

   FOR indx IN 1 .. 1000000
   LOOP
      ns (indx) := indx;
   END LOOP;

   RETURN ns;
END;
/

CREATE OR REPLACE FUNCTION my_list_ptf
   RETURN numbers_t
   PIPELINED AUTHID DEFINER
IS
BEGIN
   FOR indx IN 1 .. 1000000
   LOOP
      PIPE ROW (indx);
   END LOOP;
   RETURN;
END;
/

Let's take a closer look at the pipelined version. There are just three items to note:

  1. I add the PIPELINED keyword.
  2. Instead of populating a nested table to return, I pipe the row directly out of the function.
  3. I return nothing but control at the end of my function. 
Oh, and here's an odd thing to note: if you leave off the RETURN; statement in a pipelined table function, you will get no complaint from the SQL or PL/SQL engines. Since no data is returned by RETURN, the function simply terminates and returns control.

But I suggest you include the RETURN anyway. Looks better, less confusing to someone maintaining the code later.

Now let's run an anonymous block that compares the two functions, matching first on the first value in the list and then on the last:


DECLARE
   l_count   INTEGER;
   l_start   PLS_INTEGER;

   PROCEDURE mark_start
   IS
   BEGIN
      l_start := DBMS_UTILITY.get_cpu_time;
   END mark_start;

   PROCEDURE show_elapsed (NAME_IN IN VARCHAR2)
   IS
   BEGIN
      DBMS_OUTPUT.put_line (
            '"'
         || NAME_IN
         || '" elapsed CPU time: '
         || TO_CHAR (DBMS_UTILITY.get_cpu_time - l_start)
         || ' centiseconds');
      mark_start;
   END show_elapsed;
BEGIN
   INSERT INTO plch_data VALUES (1);

   COMMIT;
   
   mark_start;

   SELECT COUNT (*)
     INTO l_count
     FROM plch_data
    WHERE n IN (SELECT * FROM TABLE (my_list_tf));

   show_elapsed ('TF match on first');

   SELECT COUNT (*)
     INTO l_count
     FROM plch_data
    WHERE n IN (SELECT * FROM TABLE (my_list_ptf));

   show_elapsed ('PTF match on first');

   UPDATE plch_data
      SET n = 1000000;

   SELECT COUNT (*)
     INTO l_count
     FROM plch_data
    WHERE n IN (SELECT * FROM TABLE (my_list_tf));

   show_elapsed ('TF match on last');

   SELECT COUNT (*)
     INTO l_count
     FROM plch_data
    WHERE n IN (SELECT * FROM TABLE (my_list_ptf));

   show_elapsed ('PTF match on last');
END;
/

And when I run this block with server output turned on, I see:

"TF match on first" elapsed CPU time: 11 centiseconds
"PTF match on first" elapsed CPU time: 1 centiseconds
"TF match on last" elapsed CPU time: 13 centiseconds
"PTF match on last" elapsed CPU time: 5 centiseconds

Yep. Pipelining resulted in a significant boost in performance. Go, PTF, go!

By the way, here are the clean-up steps for the above script:

DROP TYPE numbers_t
/

DROP TABLE plch_data
/

DROP FUNCTION my_list_tf
/

DROP FUNCTION my_list_ptf
/

Tuesday, August 25, 2015

Table Functions, Part 5b: Table Functions vs Pipelined Table Functions

Last week, I published my first post in the Table Function series on pipelined table functions. I included this advice at the top of the post:

Here's my first piece of advice regarding this interesting and in many ways unique (within PL/SQL) feature:
You are unlikely to ever need a pipelined table function, but when you do, it is going to be amazing.
I bring this up right at the start because I have found over the years that many developers talk in the most glowing terms about pipelined table functions. Yet when I look at what they are doing with those functions, it becomes clear that they are not pumped up about the pipelining. They are not, in fact, benefiting at all from the pipelining feature.

Instead, they are excited simply about table functions. And that's cool, that's fine. Table functions are, indeed, quite awesome. Read my series! :-)

Part of the reason I gave this advice is that I'd run some performance tests on TFs and PTFs and noticed that PTFs seemed to run a bit slower than TFs when you weren't taking advantage of pipelining. So I didn't want to be too encouraging of users to change all their TFs into PTFs.

But then Patrick Barel pointed readers to his article on PTFs, which concluded with this advice:
Bottom line is to always use pipelined table functions instead of normal table functions, except when the function is only meant for PL/SQL or client consumption.
Well, gee, that sorta conflicted with my advice. But I didn't get mad at Patrick. No, sir, I did not. And that's because I am aware, more than anyone else in the world (I hope), of how often I have been wrong, and how much I have learned from others.

So instead, Patrick, Erik Van Roon, Kim Berg Hansen and I all embarked on a small exploration into the comparative performance of TFs and PTFs. And that (along with a conversation with PL/SQL Product Manager Bryn Llewellyn) led me to re-evaluate my thinking (um, a fancy way of saying: I was wrong).

So I have now changed my original post on pipelined table functions, and I will share with you some of what Patrick, Erik and Kim discovered. If I have not done each of you justice, let us know!

Let's start with Kim's nice encapsulation from Patrick's article of a recommendation for choosing between TFs and PTFs:
  • If your table function is only used within SQL [[ SELECT ... FROM TABLE( myfunc() ) ... ]], then just go ahead and always use pipelined table functions.
  • If your table function is only assigned directly to a collection variable (either PL/SQL or client language), then you must use a regular table function.
  • If your table function needs to be used in both places (SQL and PL/SQL):
    • If it is relatively small amounts of data, just use the regular table function, it'll be OK for your SQL too.
    • If it is relatively large amounts of data, consider having two versions of the function:
      • One version pipelined for SQL TABLE() calls.
      • One version regular for collection assignments. This may be a simple wrapper that populates and returns a collection with the results of the pipelined version.
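That last wrapper idea can be sketched in just a few lines (the names are hypothetical, reusing numbers_t and my_list_ptf from the example in the post above):

```sql
-- Sketch: a regular table function that simply collects everything the
-- pipelined version pipes out, so PL/SQL callers can assign the result
-- directly to a collection variable.
CREATE OR REPLACE FUNCTION my_list_tf_wrapper
   RETURN numbers_t AUTHID DEFINER
IS
   l_result   numbers_t;
BEGIN
   SELECT COLUMN_VALUE
     BULK COLLECT INTO l_result
     FROM TABLE (my_list_ptf ());

   RETURN l_result;
END;
/
```

PL/SQL callers assign my_list_tf_wrapper() to a numbers_t variable, while SQL statements query TABLE (my_list_ptf()) directly and get the pipelining benefit.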
Hopefully that is clear enough. So the big question we tried to get our heads around was: are table functions faster or slower than pipelined table functions (when you are not taking advantage of the pipelining effect)? 

I will hand over the microphone to Erik to sum up the results we got, after running a variety of tests:

There seems to be a slight difference in performance between the two types of table functions. Apart from Kim’s test they all seem to be in favor of the pipelined version. 

However, the difference is minimal, ranging from 0.5 – 15%, while the absolute differences range from 0.05 – 3 seconds. And that is for processing a million rows.

As Patrick says, and as we all know, “the argument for using Pipelined Table Functions is mostly the impact on the PGA”. In fact, with my current settings, if I run my script for 10 million records the PTFs still run fine, but the TF, as can be expected, crashes with memory errors.

I think in general the cost of overhead of one or the other should not be a consideration.

Erik

I will leave it to Kim, Erik and Patrick to decide if they would like to publish the details of their test runs (including code? My readers love to get their hands on code written by experts!).

But I would like to thank all three of you for helping keep me honest and accurate and up-to-date. I had fallen into a trap I warn others of ("A long, long time ago, my tests showed that PTFs were slower than TFs, so watch out.").

Any time your statement starts (silently or otherwise) with "A long time ago..." or "Back in Oracle9i...", it is time to challenge those assumptions and run some new tests.

Especially before posting a new entry on my blog.

Developer Choice Awards Update: Let the judging begin!

After a one week extension for the nominations period, we have now closed nominations for the first-ever round of Oracle Database Developer Choice Awards.

We have a total of 75 nominations across five categories (SQL, PL/SQL, Application Express, Database Design and ORDS). That's quite a healthy number and we look forward to the judges now winnowing that list down to a set of finalists in each group.

Judges? What judges? Isn't Oracle simply going to decide who the finalists - and then winners - will be?

NO WAY.

These are developer choice awards. By this we mean that our users decide the finalists and then vote among the finalists to determine the winners of the awards.

We've set up a panel of judges for each category. Each panel consists of four ACEs and/or ACE Directors, plus one Oracle employee. For example, I am on the PL/SQL panel (surprise!).

And now you are probably wondering: so who are the other PL/SQL judges? And the judges on the other panels?

Ha. You expect us to give away all our secrets? :-)

No, seriously, we are going to release the names of the judges after we publish our lists of finalists, and popular voting begins.


Otherwise, who knows? Judges might start receiving boxes of chocolates flown in by Amazon drones within ONE HOUR of being ordered by a nominee or that nominee's devoted significant other. We can't allow that.


But I do like to recognize initiative. So if you want to influence votes via chocolate, make sure they have nuts in them, and send them straight to:

Steven Feuerstein
Chicago, Illinois
Planet Earth

We will announce the finalists on 15 September, and voting will continue until 15 October.

We will then announce the winners of the 2015 Oracle Database Developer Choice Awards at the second annual YesSQL celebration at Oracle Open World 2015.

Wednesday, August 19, 2015

Table Functions, Part 5a: An introduction to pipelined table functions

[Gee, that was embarrassing. I start a series on table functions, and then it falters and stops. My apologies; I got so busy eating and drinking and breathing and sleeping and....anyway, back to the series!]

In this post I introduce PL/SQL developers to the pipelined table function.

Here's my first piece of advice regarding this interesting and in many ways unique (within PL/SQL) feature:
You are unlikely to ever need a pipelined table function, but when you do, it is going to be amazing.
I bring this up right at the start because I have found over the years that many developers talk in the most glowing terms about pipelined table functions. Yet when I look at what they are doing with those functions, it becomes clear that they are not pumped up about the pipelining. They are not, in fact, benefiting at all from the pipelining feature.

Instead, they are excited simply about table functions.

And that's cool, that's fine. Table functions are, indeed, quite awesome. Read my series! :-)

[Strikeout applied 25 August - see Table Functions, Part 5b: Table Functions vs Pipelined Table Functions for an explanation.]

A pipelined table function is a specialized variant of the more general species. So far as I know, the pipelined table function (ok, I give up: PTF from here on) has three primary use cases:
  1. Make sure that a table function executed within a parallel query does not cause serialization. That is, parallelize (which is strikingly different from paralyze) execution of the table function.
  2. Reduce user perception of the elapsed time needed to retrieve data.
  3. Reduce the amount of PGA (process global area) consumed by a table function.
I discuss these briefly below, and then describe the architecture and flow of PTFs.

Parallel Query

I am not an expert in parallel query, but the basic idea is obvious: you have a long-running query (it takes days!). You need to make it run faster. So you give the Oracle Database SQL engine sufficient information so that it can break up that one big monster query into multiple queries that run in parallel. The results from each "partitioned" query are then merged into a single result set, and you have your answer - in much less time.



Want to learn more about parallel execution of queries? You won't find the information in my head. I will show you the basic syntax for enabling a pipelined table function to execute in parallel, but you really should read Using Parallel Execution and Parallel Enabled Pipelined Table Functions.
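For reference, the enabling syntax looks like this. This is a sketch only, reusing the hypothetical stockpivot types from the example at the end of this post; PARTITION ... BY ANY tells the SQL engine it may distribute the cursor's rows across parallel processes however it likes:

```sql
CREATE OR REPLACE FUNCTION stockpivot_pq (
      dataset refcur_pkg.refcur_t)
   RETURN tickertype_nt
   PIPELINED
   /* The clause that makes parallel execution possible: rows fetched
      from the dataset cursor variable may be partitioned across
      parallel query processes in any way the engine chooses. */
   PARALLEL_ENABLE (PARTITION dataset BY ANY)
IS
   l_row   stocktable%ROWTYPE;
BEGIN
   LOOP
      FETCH dataset INTO l_row;
      EXIT WHEN dataset%NOTFOUND;

      /* Pipe out one row per fetched row (simplified). */
      PIPE ROW (tickertype (l_row.ticker, 'O',
                            l_row.open_price, l_row.trade_date));
   END LOOP;

   RETURN;
END;
/
```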

User Perception

This one's a bit harder to explain, but should be clear enough once you get through the architecture/flow section below. The basic idea is a user visits your website, enters some criteria, and presses Submit. Surprise! They've just asked for 127,544 rows of data to be returned to the screen. And so they wait and wait and wait, as the page gets longer and longer.

Users hate to wait.

So you build that query around a PTF, and then the scripting language (or Java or whatever else you are using) can consume and display, say, the first 100 rows virtually instantly, and the user can start looking at and working with the data, as the rest of the data is being sent to the website.

The total time taken to retrieve all data is not faster, but the user's perception is: "Wow! I didn't have to wait at all!"

I think that sounds so cool, and I know (or believe) that people have used PTFs for this, because they told me. In my classes. In front of other developers. So they wouldn't lie, right?

But no one has ever pointed me to a demonstration. So if you have done something like this, please let us know via a comment on this post.

According to the documentation, you need to follow an "interface approach," in which "the consumer and producers run on the same execution thread. The producer explicitly returns control back to the consumer after producing a set of results. In addition, the producer caches the current state so that it can resume where it left off when the consumer invokes it again." You can do this in C or Java, and you will find examples here.

Reduced PGA Memory Consumption

When you populate data in collections (and table functions return collections), you consume Process Global Area memory. If you use too much of this memory (the limit is governed by database initialization parameters), your session dies with a memory error.

If your "normal" (non-pipelined) table function is consuming too much PGA, you can switch to a PTF and watch PGA consumption plummet.
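If you'd like to verify that for yourself, one approach (a sketch; it assumes you have SELECT access on the V$MYSTAT and V$STATNAME views) is to check your session's PGA statistics before and after invoking each flavor of function:

```sql
-- Sketch: report the current session's PGA consumption, in bytes.
SELECT sn.name, st.value
  FROM v$mystat st
       JOIN v$statname sn ON sn.statistic# = st.statistic#
 WHERE sn.name IN ('session pga memory', 'session pga memory max');
```

Run it, invoke the non-pipelined function, and run it again; then repeat in a fresh session with the PTF and compare the deltas.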

Why? Well, to answer that question, you need to understand how PTFs work. And so....

Pipelined Table Function Architecture and Flow

Before I get into the details, it's important to remember the following about PL/SQL:
PL/SQL is not a multi-threaded language. 
When you invoke a subprogram (procedure or function), the invoker waits (is blocked) till the subprogram returns control.
To drive this point home, I create a simple table function that includes a call to DBMS_LOCK.SLEEP (note: you may need to have someone with DBA authority grant you the EXECUTE privilege on this package to reproduce what I am doing):

CREATE OR REPLACE TYPE list_of_names_t
   IS TABLE OF VARCHAR2 (100);
/

CREATE OR REPLACE FUNCTION names_tf (delay_in IN INTEGER)
   RETURN list_of_names_t
IS
   just_me   list_of_names_t
      := list_of_names_t ('Steven');
BEGIN
   DBMS_LOCK.SLEEP (delay_in);
   RETURN just_me;
END;
/

SET SERVEROUTPUT ON

SELECT COLUMN_VALUE FROM TABLE (names_tf (5) )
/
COLUMN_VALUE                                                                   
------------------------
Steven                                                                          

Elapsed: 00:00:05.043

SELECT COLUMN_VALUE FROM TABLE (names_tf (10) )
/

COLUMN_VALUE                                                                   
------------------------
Steven                                                                          

Elapsed: 00:00:10.019

As you can see, the time it took for the query to complete doubled when I doubled the time spent "sleeping" inside the function.

This, then, is the norm in the world of PL/SQL: Call me and you must wait for me to finish.

Clearly, that approach doesn't work very well if you want to execute a parallel query and one of the "tables" you are querying from is a table function. You can't parallelize an operation when one component of that operation insists on serializing, right?

Same goes for the user perception use case: if a user submits a request that executes a table function that must return 100,000 rows, everyone must wait till all those rows return.

And it returns those rows in the form of a PL/SQL collection, so that table function could and often will consume large amounts of PGA memory. 

Sigh. What's a programmer to do?

Pipeline!

Well, that's just a word. So what does pipelining do in a table function? 

It allows you to "pipe" a row of data from inside the function out to the calling query. The query can then immediately use that row of data, without waiting for the function to complete. 

Rather than bloat up this post (and further delay its publication) with a full-bore explanation of the syntax for pipelining, I will simply offer a rewrite of the example from my streaming table function post as a PTF (note the pipeline-specific syntax: the PIPELINED keyword, PIPE ROW, and the bare RETURN). I will then, in the next post in this series, provide a step-by-step explanation.

CREATE OR REPLACE FUNCTION stockpivot (
      dataset refcur_pkg.refcur_t)
   RETURN tickertype_nt PIPELINED
IS
   /* Avoid hard-coding the BULK COLLECT LIMIT 
      in the fetch statement */
   c_limit CONSTANT PLS_INTEGER := 100;

   /* Container for rows fetched from the cursor variable. */
   TYPE dataset_tt IS TABLE OF stocktable%ROWTYPE
                         INDEX BY PLS_INTEGER;
   l_dataset  dataset_tt;
   l_onerow   tickertype;
BEGIN
   LOOP
      FETCH dataset BULK COLLECT INTO l_dataset
         LIMIT c_limit;
      EXIT WHEN l_dataset.COUNT = 0;

      FOR l_row IN 1 .. l_dataset.COUNT
      LOOP
         l_onerow := 
            tickertype (
               l_dataset (l_row).ticker, 
               'O', 
               l_dataset (l_row).open_price,
               l_dataset (l_row).trade_date);         
         PIPE ROW (l_onerow);

         l_onerow :=
            tickertype (
               l_dataset (l_row).ticker, 
               'C', 
               l_dataset (l_row).close_price,
               l_dataset (l_row).trade_date);        
         PIPE ROW (l_onerow); 
      END LOOP;
   END LOOP;
   CLOSE dataset;

   RETURN;
END;
/

To Sum Up

This post offers an overview of, and use cases for, pipelined table functions. It ends with a simple example of a PTF, with a promise to describe it more fully in the next post of this series.

In the meantime, check out these fine resources on PTFs:
And please do add a comment to this post with any other use cases you have found for PTFs, and especially an example of the user perception use case if you've done that. That would really make my day!

Links to Rest of Series