Skip to main content

Inline your subprograms - but only within same program unit

Got an email from an expert Oracle Database developer (let's call him Sheldon, in honor of my dad, who taught himself programming - RPG, no less!) yesterday:
Is inlining (via PRAGMA INLINE) even something worth talking about?  I’m getting the feeling that it’s not. I’m trying to come up with an example to show folks the beauty of inlining function calls.  And I’m getting the idea that either oracle is doing something like inlining code all the time or is just really super-efficient on the calls these days.
Well, sure, Oracle Database is really super-efficient. Always. Automagically. Don't have to worry about doing any tuning yourself. Oh, wow, what a wonderful world!

Pinch. Wake up! That was a dream you were having, Steven.

Oracle Database does an awful lot to ensure high performance for our applications, but it's not yet fully automatic or automagic. We can do things to drastically screw up performance. And we can also take actions, both in the way we write our code and in the features we utilize, to improve performance.

Inlining is one of those. First, an explanation of the basic concept, then I will address Sheldon's question.

Suppose that I have written very nicely modularized code, with lots of functions defining APIs to underlying rules and data, with nested subprograms to improve readability, etc. 

If you want to write an application that is easy to debug, maintain and enhance, you simply must modularize your logic.

But there is a price to pay: each time the PL/SQL engine executes the invocation of a subprogram, there is some overhead involved (look up the code, transfer data back and forth, etc.).

The idea behind inlining is that at the time of compilation, the compiler replaces the invocation of a subprogram with that subprogram's implementation. Then at runtime, the cost of executing the subprogram is reduced.

Here's a very crude example. I write this code:

CREATE PROCEDURE my_proc
IS
   n NUMBER;

   FUNCTION nested_func (p NUMBER)
      RETURN PLS_INTEGER
   IS
   BEGIN
      RETURN p * 10;
   END;
BEGIN
   FOR indx IN 1 .. 10000000
   LOOP
      n := nested_func (indx);
   END LOOP;
END;
/

But with inlining enabled, what is actually executed looks more like this:

CREATE PROCEDURE my_proc
IS
   n NUMBER;
BEGIN
   FOR indx IN 1 .. 10000000
   LOOP
      n := indx * 10;
   END LOOP;
END;
/

Simple enough, right?

OK. In Oracle Database 11g, we (not that I was a part of "we" - Oracle Corporation - back when 11.1 was released) added a new PL/SQL optimization level (3) and a new pragma (INLINE) to allow PL/SQL developers to take advantage of inlining. You can turn it on globally via the optimization level or in a very granular fashion with the pragma (which means: command to the compiler).

I won't go into all the details here. Check out the doc or Tim Hall's fine article or Adrian Billington's helpful exploration for lots more information.

So then - back to Sheldon's question. He was unable to verify the performance benefit from inlining. What's the problem? Well, let's read the introduction to Subprogram Inlining from the doc:
One optimization that the compiler can perform is subprogram inlining. Subprogram inlining replaces a subprogram invocation with a copy of the invoked subprogram (if the invoked and invoking subprograms are in the same program unit). 
Note that last section within parentheses:

The invoked and invoking subprograms must be in the same program unit.

What exactly does that mean? 

It means that the function you want to inline must be:
  1. Defined as a nested subprogram within the subprogram that invokes it, or...
  2. Defined within the same package. 
You've already seen an example of the former. But before I move on to an example of the latter (inlining subprograms from same package) or even showing you an example where inlining will not happen, I'd like to cover:

How do I know whether or not inlining is working?

My approach has been to perform timings on various implementations and see if a particular scenario makes things faster. That can be productive, except when the optimization technique is fairly fine-tuned in its applicability (which can be the case with inlining).

Bryn Llewellyn, PL/SQL Product Manager (follow @brynlite), suggests a far superior approach: use compile-time warnings (though in this case, less of a warning than an FYI).

Let's compile the my_proc procedure with inlining enabled (that's what level 3 optimization means) and compile-time warnings enabled:

alter session set Plsql_Warnings = 'Enable:All'
/

alter session set Plsql_Optimize_Level = 3
/

CREATE PROCEDURE my_proc
IS
   n NUMBER;

   FUNCTION nested_func (p NUMBER)
      RETURN PLS_INTEGER
   IS
   BEGIN
      RETURN p * 10;
   END;
BEGIN
   FOR indx IN 1 .. 10000000
   LOOP
      n := nested_func (indx);
   END LOOP;
END;
/

Then I issue a SHOW ERRORS command and see in SQL Developer the following:


Inlining was performed, and once that was done, there were no more calls to NESTED_FUNC, so the  compiler also removed the nested subprogram definition. Cool!

Now let's do the same thing for an invocation of a subprogram that is defined within the same package, but outside of the invoking subprogram:

alter session set Plsql_Warnings = 'Enable:All'
/

alter session set Plsql_Optimize_Level = 3
/

CREATE OR REPLACE PACKAGE pkg
   AUTHID DEFINER
IS
   PROCEDURE test_inlining;
END;
/

CREATE OR REPLACE PACKAGE BODY pkg
IS
   FUNCTION private_func (p NUMBER)
      RETURN PLS_INTEGER
   IS
   BEGIN
      RETURN p * 10;
   END;

   PROCEDURE test_inlining
   IS
      n              NUMBER;
   BEGIN
      FOR indx IN 1 .. 10000000
      LOOP
         n := private_func (indx);
      END LOOP;
   END;
END;
/

And SHOW ERRORS reveals:



Finally, let's try to inline a subprogram defined outside of the invoking subprogram's program unit.


alter session set Plsql_Warnings = 'Enable:All'
/

alter session set Plsql_Optimize_Level = 3
/

CREATE OR REPLACE FUNCTION my_func (p NUMBER)
      RETURN PLS_INTEGER
IS
BEGIN
   RETURN p * 10;
END;
/

CREATE OR REPLACE PROCEDURE test_inlining
IS
   n NUMBER;
BEGIN
   FOR indx IN 1 .. 10000000
   LOOP
      n := my_func (indx);
   END LOOP;
END;
/

And what warnings do I see now?





Not a single mention of inlining. Even with the optimization level set to 3. That should be a pretty good hint, right?

But let's make sure. I will explicitly request inlining of the call to MY_FUNC (which should not be necessary, given the optimization level, but bear with me). 

CREATE OR REPLACE PROCEDURE test_inlining
IS
   n NUMBER;
BEGIN
   FOR indx IN 1 .. 10000000
   LOOP
      PRAGMA INLINE (my_func, 'YES');
      n := my_func (indx);
   END LOOP;
END;
/

And then I see the following warnings:



Ah....so you can that the compiler tells me that I've made the request - but it was not "done", only "requested". 

As stated in the documentation, since my_func is not defined within the same program unit as the invoking subprogram, inlining does not apply.

So now I've shown you when inlining will take place and how you can verify that the compiler performed as expected. 

Compile-time warnings are really quite cool. Turn them on in your IDE and pay attention to what they say!

Going back to the original question from Sheldon, though: "Is inlining (via PRAGMA INLINE) even something worth talking about?  I’m getting the feeling that it’s not."

In my (admittedly limited) experience, it can be hard to come up with scripts that demonstrate massive performance improvements. It will definitely help most in "extreme" scenarios involving a very large number of subprogram invocations (loops, recursion, etc.).

[2016-09-23] The recommendation from the PL/SQL Development Team is unequivocal on this matter: you should set the PL/SQL optimization level to 3 at the instance level. Give the compiler the maximum number of tools to use to optimize your code. It may not make a big difference, but it will surely help overall.

I could, I suppose, take the position that you should just turn on inlining globally (optimization level) and let the compiler have at it, except that the documentation also states:
If a particular subprogram is inlined, performance almost always improves. However, because the compiler inlines subprograms early in the optimization process, it is possible for subprogram inlining to preclude later, more powerful optimizations. 
If subprogram inlining slows the performance of a particular PL/SQL program, then use the PL/SQL hierarchical profiler (explained in Oracle Database Development Guide) to identify subprograms for which you want to turn off inlining. 
 So I would suggest that you not bump up the optimization level to 3 globally, and instead put in specific, explicit requests to inline precisely in those programs in which the use case fits clearly and you can demonstrate a noticeable improvement.

Comments

  1. Hello Steve,

    I would like to ask for your assistance on the question below for my assignment on my Online Masteral. Currently, I do not have any background in SQL and I am starting to learn the basic of it. Hope you can assist me so that I may use as a reference same with the other questions. Thank you in advance!

    ------------------------
    Create a procedure that displays the name of the customer with the highest amount of orders among all customers. Name the procedure PRINT_CUST_PROC. The procedure will also invoke a function called GET_HIGHORDER_FUNC that retrieves the name and total amount ordered of customer with the highest amount of order. Invoke the procedure and view the results.
    ------------------------

    ReplyDelete
    Replies
    1. I am glad to hear you are getting started with SQL and PL/SQL. I am sorry, though, that I do not have time to do this assignment for you. You will find many examples of PL/SQL programs and code at livesql.oracle.com. I also encourage you check out devgym.oracle.com.

      Delete
  2. Hi Steven,

    Thank you for the nice write-up.

    I have a question, if you don't mind. Using the compilation warnings method, I was able to observe the following (using PL/SQL 12.2.0.1 on top of Oracle Linux Server 7.9).

    Using the first example from your post, with plsql_optimize_level set to 3 and plsql_code_type set to 'interpreted', I can see that the compiler inlines the call to the nested function. However, if I change the compilation type to 'native', the nested function is no longer inlined. And it's not just a mirage; if I specifically add PRAGMA INLINE(nested_func, 'yes') before the invocation, the warnings show that the function is again inlined.

    Any thoughts as to why, without a specific request to inline, with optimization level set to 3 and compilation type set to native, the compiler would choose to NOT inline the function call?

    This is not just academic curiosity; for example, I found that inlining calls to PUSH and POP in a simple stack implementation, called a very large number of times from a small procedure, reduces execution time by a factor of two. Not huge, but not negligible either, especially given the extremely low cost of inlining (at least in this case).

    Thank you! - mathguy

    ReplyDelete
    Replies
    1. Interesting.

      I do not have an answer for this, but I will see what I can find out.

      Delete

Post a Comment

Popular posts from this blog

Running out of PGA memory with MULTISET ops? Watch out for DISTINCT!

A PL/SQL team inside Oracle made excellent use of nested tables and MULTISET operators in SQL, blending data in tables with procedurally-generated datasets (nested tables).  All was going well when they hit the dreaded: ORA-04030: out of process memory when trying to allocate 2032 bytes  They asked for my help.  The error occurred on this SELECT: SELECT  *    FROM header_tab trx    WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)       AND ((trx.column_ntab MULTISET             EXCEPT DISTINCT generated_ntab2) IS EMPTY) The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality. We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of work, revamping algorithms, ensuring correctness, you know the score. Then my eyes snagge

How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today: In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit? First I give the quick answer, then I provide support for that answer Quick Answer Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption. Test to see if that's fast enough (likely will be for many cases). If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.  Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification. Don't put anything in the collection you don't need. [from Giulio Dottorini] Remember: each session that runs this code will use that amount of memory. Background When you use BULK COLLECT, you retrieve more than row with each fetch,

Quick Guide to User-Defined Types in Oracle PL/SQL

A Twitter follower recently asked for more information on user-defined types in the PL/SQL language, and I figured the best way to answer is to offer up this blog post. PL/SQL is a strongly-typed language . Before you can work with a variable or constant, it must be declared with a type (yes, PL/SQL also supports lots of implicit conversions from one type to another, but still, everything must be declared with a type). PL/SQL offers a wide array of pre-defined data types , both in the language natively (such as VARCHAR2, PLS_INTEGER, BOOLEAN, etc.) and in a variety of supplied packages (e.g., the NUMBER_TABLE collection type in the DBMS_SQL package). Data types in PL/SQL can be scalars, such as strings and numbers, or composite (consisting of one or more scalars), such as record types, collection types and object types. You can't really declare your own "user-defined" scalars, though you can define subtypes  from those scalars, which can be very helpful from the p