
How to Pick the Limit for BULK COLLECT

This question rolled into my inbox today:
In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit?
First I'll give the quick answer, then I'll provide support for that answer.

Quick Answer
  • Start with 100. That's the default (and only) value used by the cursor FOR loop optimization. It offers a sweet spot of improved performance over row-by-row processing without too much PGA memory consumption.
  • Test to see if that's fast enough (likely will be for many cases).
  • If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory. 
  • Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification (see the sketch after this list).
  • Don't put anything in the collection you don't need. [from Giulio Dottorini]
  • Remember: each session that runs this code will use that amount of memory.
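
Here is the sketch promised above: one way to avoid hard-coding the limit, with the value declared as a package constant that also serves as the default for a parameter. This is just an illustration, using the standard HR employees table; the employee_mgr package and procedure names are made up.

CREATE OR REPLACE PACKAGE employee_mgr
AS
   /* Default bulk fetch size; callers can override it per call. */
   c_default_limit   CONSTANT PLS_INTEGER := 100;

   PROCEDURE process_employees (
      limit_in IN PLS_INTEGER DEFAULT c_default_limit);
END employee_mgr;
/

CREATE OR REPLACE PACKAGE BODY employee_mgr
AS
   PROCEDURE process_employees (
      limit_in IN PLS_INTEGER DEFAULT c_default_limit)
   IS
      CURSOR emp_cur IS SELECT * FROM employees;

      TYPE employee_aat IS TABLE OF employees%ROWTYPE
         INDEX BY BINARY_INTEGER;

      l_employees   employee_aat;
   BEGIN
      OPEN emp_cur;

      LOOP
         FETCH emp_cur BULK COLLECT INTO l_employees LIMIT limit_in;
         EXIT WHEN l_employees.COUNT = 0;

         FOR indx IN 1 .. l_employees.COUNT
         LOOP
            DBMS_OUTPUT.put_line (l_employees (indx).last_name);
         END LOOP;
      END LOOP;

      CLOSE emp_cur;
   END process_employees;
END employee_mgr;
/

With this in place, a batch job can call employee_mgr.process_employees (10000), while interactive code sticks with the default.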
Background

When you use BULK COLLECT, you retrieve more than one row with each fetch, reducing context switching between the SQL and PL/SQL engines and thereby improving performance. Here, for example, is a small piece of code that gets all the rows from the employees table in a single round-trip to the SQL engine, displays the number of elements in the collection, and then iterates through the collection, displaying each last name.

DECLARE
   TYPE employees_t IS TABLE OF employees%ROWTYPE;
   l_employees   employees_t;
BEGIN
   SELECT *
     BULK COLLECT INTO l_employees
     FROM employees;

   DBMS_OUTPUT.put_line (l_employees.COUNT);

   FOR indx IN 1 .. l_employees.COUNT
   LOOP
      DBMS_OUTPUT.put_line (l_employees (indx).last_name);
   END LOOP;
END;
/

As you can see, we fetch those multiple rows into a collection (aka, an array). That collection consumes per-session Program Global Area (PGA) memory. So you face a typical tradeoff in performance optimization: spend fewer CPU cycles, but use more memory.

And with the above block of code, using an "unlimited" BULK COLLECT, you really are taking a risk of running out of memory. As the number of rows in the table grows, more memory will be consumed.

So the general recommendation for production code, when working with tables that may grow greatly in size, is to avoid SELECT BULK COLLECT INTO (an implicit cursor) and instead use FETCH BULK COLLECT with a LIMIT clause.

Here's a rewrite of the above block using the LIMIT clause, retrieving 100 rows with each fetch.

DECLARE
   c_limit   CONSTANT PLS_INTEGER DEFAULT 100;

   CURSOR emp_cur IS SELECT * FROM employees;

   TYPE employee_aat IS TABLE OF employees%ROWTYPE INDEX BY BINARY_INTEGER;
   l_employee         employee_aat;
BEGIN
   OPEN emp_cur;

   LOOP
      FETCH emp_cur BULK COLLECT INTO l_employee LIMIT c_limit;
      EXIT WHEN l_employee.COUNT = 0;

      DBMS_OUTPUT.put_line ('Retrieved ' || l_employee.COUNT);

      FOR indx IN 1 .. l_employee.COUNT
      LOOP
         DBMS_OUTPUT.put_line (
            l_employee (indx).employee_id || ' - ' ||
            l_employee (indx).last_name);
      END LOOP;
   END LOOP;

   CLOSE emp_cur;
END;
/

Now, no matter how many rows are in the employees table, my session only uses the memory required for 100 rows.

Back in Oracle Database 10g, the (then) brand-new PL/SQL optimizer started playing a neat trick with cursor FOR loops: it automatically converts the loop under the hood so that it retrieves 100 rows with each fetch. The thinking is that the memory needed for 100 rows is never going to be that much for most tables, and you get a really nice burst of performance at that level of "bulk". This LiveSQL script demonstrates that optimization.
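
To be clear about what gets optimized: the loop below is written as if it fetches row by row, but compiled at the default PLSQL_OPTIMIZE_LEVEL (2) or higher, the compiler silently bulk-fetches 100 rows at a time on your behalf. A minimal example, again using the HR employees table:

BEGIN
   /* At PLSQL_OPTIMIZE_LEVEL 2 (the default) or higher, this cursor
      FOR loop is rewritten internally to fetch 100 rows at a time. */
   FOR emp_rec IN (SELECT last_name FROM employees)
   LOOP
      DBMS_OUTPUT.put_line (emp_rec.last_name);
   END LOOP;
END;
/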

And from tests various people have run over the years, increasing that limit to 500 or 1000 doesn't really seem to offer much of an improvement.

If you are working your way through millions of rows, though, you might see a very nice boost with a limit of 10,000 or more. You just need to keep an eye on memory consumption - and that will be much less of a concern if you are running a batch job rather than writing code that will be run by many users simultaneously.
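
One simple way to keep that eye on memory (assuming your schema is allowed to query the V$ views): check your own session's PGA statistics before and after the bulk fetches.

-- Current and peak PGA memory for the current session, in bytes.
-- Requires SELECT privilege on V$MYSTAT and V$STATNAME.
SELECT, st.value
  FROM v$mystat st
       JOIN v$statname sn ON = st.statistic#
 WHERE IN ('session pga memory', 'session pga memory max');

Run your block with different LIMIT values and compare the numbers, and you can see directly what each setting costs you.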

You might also find this StackOverflow Q&A helpful.

Don't put anything in the collection you don't need. 

Giulio Dottorini made this suggestion in a comment on this post, and it is a very good point: the more data you store in each element ("row") of the collection, the more PGA memory will be required to store it. So don't put anything in the collection you don't need.

Take another look at the block above showing how to use the LIMIT clause. My query is "SELECT * FROM employees", but when I look at how I use the data in that collection, I only need employee_id and last_name. So this would make a lot more sense for the declaration of my cursor and then of the collection type itself:

DECLARE
   c_limit   CONSTANT PLS_INTEGER DEFAULT 100;

   CURSOR emp_cur IS SELECT employee_id, last_name FROM employees;

   TYPE employee_aat IS TABLE OF emp_cur%ROWTYPE INDEX BY BINARY_INTEGER;
   l_employee         employee_aat;


In fact, generally, you should declare your collection type based on the cursor you are fetching from.

Now my collection will use even less PGA - only what I need and nothing more. Thanks, Giulio!

Comments

  1. I request you to provide advice on how to "keep an eye on memory consumption". The V$ views for monitoring PGA memory consumption have improved in 11g and 12c since 10g.

    1. Thanks, Hemant. I don't have lots of experience with using these views, but you will find lots of information and the list of views to query here: https://docs.oracle.com/en/database/oracle/oracle-database/18/tgdba/tuning-program-global-area.html#GUID-903DB64B-3019-4272-8812-4F00A05371FB

  2. Hi Steven,
    in your experience, to reduce PGA memory, what do you think of getting only the primary key or rowid into the array, and then fetching each record during processing?
    Thanks in advance.

    1. Giulio, that is a great idea. No reason to load up the collection with information that is not needed in the collection itself!
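
      A minimal sketch of that approach (my own illustration, assuming the standard HR employees table): bulk collect just the ROWIDs, then fetch each full row only at the moment you actually need it.

      DECLARE
         c_limit   CONSTANT PLS_INTEGER := 100;

         CURSOR emp_cur IS SELECT ROWID rid FROM employees;

         TYPE rowid_aat IS TABLE OF emp_cur%ROWTYPE INDEX BY BINARY_INTEGER;

         l_rowids     rowid_aat;
         l_employee   employees%ROWTYPE;
      BEGIN
         OPEN emp_cur;

         LOOP
            FETCH emp_cur BULK COLLECT INTO l_rowids LIMIT c_limit;
            EXIT WHEN l_rowids.COUNT = 0;

            FOR indx IN 1 .. l_rowids.COUNT
            LOOP
               -- Single-row lookup by ROWID: fast, but it adds a context
               -- switch per row, so measure before committing to this.
               SELECT *
                 INTO l_employee
                 FROM employees
                WHERE ROWID = l_rowids (indx).rid;

               DBMS_OUTPUT.put_line (l_employee.last_name);
            END LOOP;
         END LOOP;

         CLOSE emp_cur;
      END;
      /

      The collection now holds only ROWIDs, so its PGA footprint is tiny; the tradeoff is one extra single-row query for each row you process.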

