
How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today:
In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit?
First I give the quick answer, then I provide support for that answer.

Quick Answer
  • Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption.
  • Test to see if that's fast enough (likely will be for many cases).
  • If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory. 
  • Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification (see the sketch after this list).
  • Don't put anything in the collection you don't need. [from Giulio Dottorini]
  • Remember: each session that runs this code will use that amount of memory.
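Here, as promised, is a minimal sketch of how you might avoid hard-coding the limit. The package, constant and parameter names are purely illustrative:

CREATE OR REPLACE PACKAGE employees_bulk_api
IS
   -- One place to change the default fetch size for the whole application.
   c_default_limit   CONSTANT PLS_INTEGER := 100;

   -- Callers can accept the default or pass their own limit.
   PROCEDURE process_employees (
      limit_in   IN PLS_INTEGER DEFAULT c_default_limit);
END employees_bulk_api;
/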
Background

When you use BULK COLLECT, you retrieve more than one row with each fetch, reducing context switching between the SQL and PL/SQL engines, and thereby improving performance. Here, for example, is a small piece of code that gets all the rows from the employees table in one round-trip to the SQL engine, displays the number of elements in the collection, and then iterates through the collection displaying the last name.

DECLARE
   TYPE employees_t IS TABLE OF employees%ROWTYPE;
   l_employees   employees_t;
BEGIN
     SELECT *
       BULK COLLECT INTO l_employees
       FROM employees;

   DBMS_OUTPUT.put_line (l_employees.COUNT);

   FOR indx IN 1 .. l_employees.COUNT
   LOOP
      DBMS_OUTPUT.put_line (l_employees (indx).last_name);
   END LOOP;
END;
/

As you can see, we fetch those multiple rows into a collection (also known as an array). That collection consumes per-session Process Global Area (PGA) memory. So you face a typical tradeoff when it comes to performance optimization: reduce CPU cycles, but use more memory.

And with the above block of code, using an "unlimited" BULK COLLECT, you really are taking a risk of running out of memory. As the number of rows in the table grows, more memory will be consumed.

So the general recommendation for production code, working with tables that may grow greatly in size, is to avoid SELECT BULK COLLECT INTO (an implicit query) and instead use FETCH BULK COLLECT with a LIMIT clause.

Here's a rewrite of the above block using the LIMIT clause, retrieving 100 rows with each fetch.

DECLARE
   c_limit   CONSTANT PLS_INTEGER DEFAULT 100;

   CURSOR emp_cur IS SELECT * FROM employees;

   TYPE employee_aat IS TABLE OF employees%ROWTYPE INDEX BY BINARY_INTEGER;
   l_employee         employee_aat;
BEGIN
   OPEN emp_cur;

   LOOP
      FETCH emp_cur BULK COLLECT INTO l_employee LIMIT c_limit;
      EXIT WHEN l_employee.COUNT = 0;

      DBMS_OUTPUT.put_line ('Retrieved ' || l_employee.COUNT);

      FOR indx IN 1 .. l_employee.COUNT
      LOOP
         DBMS_OUTPUT.put_line (
            l_employee (indx).employee_id || ' - ' ||
            l_employee (indx).last_name);
      END LOOP;
   END LOOP;

   CLOSE emp_cur;
END;
/

Now, no matter how many rows are in the employees table, my session only uses the memory required for 100 rows.

Back in Oracle Database 10g, the (then) brand-new PL/SQL optimizer played a neat trick with cursor FOR loops: it automatically converted them to C code that retrieves 100 rows with each fetch! The thinking is that the amount of memory most tables need for 100 rows is never going to be that much, and you get a really nice burst in performance with that level of "bulk". This LiveSQL script demonstrates that optimization.
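For reference, the kind of code that benefits from that optimization is just a plain cursor FOR loop - no BULK COLLECT in sight. This sketch assumes PLSQL_OPTIMIZE_LEVEL is left at its default of 2:

BEGIN
   -- The compiler silently bulk-fetches 100 rows at a time for this loop.
   FOR emp_rec IN (SELECT employee_id, last_name FROM employees)
   LOOP
      DBMS_OUTPUT.put_line (emp_rec.employee_id || ' - ' || emp_rec.last_name);
   END LOOP;
END;
/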

And from tests various people have run over the years, increasing that limit to 500 or 1,000 doesn't really seem to offer much of an improvement.

But for sure, if you are working your way through millions of rows, you might see a very nice boost with a limit of 10,000 or more. You just need to keep an eye on memory consumption - and that will be much less of a concern if you are running a batch job rather than writing code that will be run by many users simultaneously.
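One simple way to keep an eye on it - just a sketch, and you will likely need a DBA to grant access to the V$ views - is to check your own session's PGA statistics before and after the bulk fetch:

-- Current and peak PGA allocated to this session, in bytes.
SELECT sn.name, st.value
  FROM v$mystat st
       JOIN v$statname sn ON sn.statistic# = st.statistic#
 WHERE sn.name IN ('session pga memory', 'session pga memory max');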

You might also find this StackOverflow Q&A helpful.

Don't put anything in the collection you don't need. 

Giulio Dottorini made this suggestion in a comment on this post. It is a very good point. The more data you store in every element ("row") of the collection, the more PGA memory will be required to store that data. So don't put anything in the collection you don't need. 

Take another look at the block above showing how to use the LIMIT clause. My query is "SELECT * FROM employees", but when I look at how I use the data in that collection, I only need employee_id and last_name. So this would make a lot more sense for the declaration of my cursor and then of the collection type itself:

DECLARE
   c_limit   CONSTANT PLS_INTEGER DEFAULT 100;

   CURSOR emp_cur IS SELECT employee_id, last_name FROM employees;

   TYPE employee_aat IS TABLE OF emp_cur%ROWTYPE INDEX BY BINARY_INTEGER;
   l_employee         employee_aat;


In fact, generally, you should declare your collection type based on the cursor you are fetching from.
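Putting the pieces together, a complete version of the slimmed-down block might look like this - the same fetch loop as before, just two columns per element:

DECLARE
   c_limit   CONSTANT PLS_INTEGER DEFAULT 100;

   CURSOR emp_cur IS SELECT employee_id, last_name FROM employees;

   TYPE employee_aat IS TABLE OF emp_cur%ROWTYPE INDEX BY BINARY_INTEGER;
   l_employee         employee_aat;
BEGIN
   OPEN emp_cur;

   LOOP
      FETCH emp_cur BULK COLLECT INTO l_employee LIMIT c_limit;
      EXIT WHEN l_employee.COUNT = 0;

      FOR indx IN 1 .. l_employee.COUNT
      LOOP
         DBMS_OUTPUT.put_line (
            l_employee (indx).employee_id || ' - ' ||
            l_employee (indx).last_name);
      END LOOP;
   END LOOP;

   CLOSE emp_cur;
END;
/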

Now my collection will use even less PGA - only what I need and nothing more. Thanks, Giulio!

Comments

  1. I request you to provide advice on how to "keep an eye on memory consumption". The V$ views for monitoring PGA memory consumption have improved in 11g and 12c since 10g.

    1. Thanks, Hemant. I don't have lots of experience with using these views, but you will find lots of information and the list of views to query here: https://docs.oracle.com/en/database/oracle/oracle-database/18/tgdba/tuning-program-global-area.html#GUID-903DB64B-3019-4272-8812-4F00A05371FB

  2. Hi Steven,
    in your experience, to reduce PGA memory, what do you think of getting only the primary key or ROWID into the array and then fetching the full record during processing?
    Thanks in advance.

    1. Giulio, that is a great idea. No reason to load up the collection with information that is not needed in the collection itself!

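A minimal sketch of that idea (illustrative only): bulk collect just the ROWIDs, then fetch the full row only when it is needed. The tradeoff, of course, is one extra single-row query per element processed.

DECLARE
   c_limit   CONSTANT PLS_INTEGER DEFAULT 100;

   CURSOR emp_cur IS SELECT ROWID FROM employees;

   TYPE rowid_aat IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
   l_rowids     rowid_aat;

   l_employee   employees%ROWTYPE;
BEGIN
   OPEN emp_cur;

   LOOP
      FETCH emp_cur BULK COLLECT INTO l_rowids LIMIT c_limit;
      EXIT WHEN l_rowids.COUNT = 0;

      FOR indx IN 1 .. l_rowids.COUNT
      LOOP
         -- Fetch the full row only at the moment it is needed.
         SELECT * INTO l_employee FROM employees WHERE ROWID = l_rowids (indx);
         DBMS_OUTPUT.put_line (l_employee.last_name);
      END LOOP;
   END LOOP;

   CLOSE emp_cur;
END;
/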

