How to Pick the Limit for BULK COLLECT
In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit?First I give the quick answer, then I provide support for that answer
- Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption.
- Test to see if that's fast enough (likely will be for many cases).
- If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.
- Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification.
- Don't put anything in the collection you don't need. [from Giulio Dottorini]
- Remember: each session that runs this code will use that amount of memory.
When you use BULK COLLECT, you retrieve more than row with each fetch, reducing context switching between the SQL and PL/SQL engines, and thereby improving performance. Here, for example, is a small piece of code that gets all the rows from the employees table in one round-trip to the SQL engine, displays the number of elements in the collection, and then iterates through the collection displaying the last name.
DECLARE TYPE employees_t IS TABLE OF employees%ROWTYPE; l_employees employees_t; BEGIN SELECT * BULK COLLECT INTO l_employees FROM employees; DBMS_OUTPUT.put_line (l_employees.COUNT); FOR indx IN 1 .. l_employees.COUNT LOOP DBMS_OUTPUT.put_line (l_employees (indx).last_name); END LOOP; END; /
As you can see, we fetch those multiple rows into a collection (aka, array). That collection consumes per-session Process Global Area memory or PGA. So you face a typical tradeoff when it comes to performance optimization: reduce CPU cycles, but use more memory.
And with the above block of code, using an "unlimited" BULK COLLECT, you really are taking a risk of running out of memory. As the number of rows in the table grows, more memory will be consumed.
So the general recommendation for production code, working with tables that may grow greatly in size, is to avoid SELECT BULK COLLECT INTO (an implicit query) and instead use the FETCH BULK COLLECT with a LIMIT clause.
Here's a rewrite of the above block using the LIMIT clause, retrieving 100 rows with each fetch.
DECLARE c_limit CONSTANT PLS_INTEGER DEFAULT 100; CURSOR emp_cur IS SELECT * FROM employees; TYPE employee_aat IS TABLE OF employees%ROWTYPE INDEX BY BINARY_INTEGER; l_employee employee_aat; BEGIN OPEN emp_cur; LOOP FETCH emp_cur BULK COLLECT INTO l_employee LIMIT c_limit; EXIT WHEN l_employee.COUNT = 0; DBMS_OUTPUT.put_line ('Retrieved ' || l_employee.COUNT); FOR indx IN 1 .. l_employee.COUNT LOOP DBMS_OUTPUT.put_line ( l_employees (indx).employee_id || ' - ' || l_employees (indx).last_name); END LOOP; END LOOP; CLOSE emp_cur; END; /
Now, no matter how many rows are in the employees table, my session only uses the memory required for 100 rows.
Back in Oracle Database 10g, the (then) brand-new PL/SQL optimizer played a neat trick with cursor FOR loops: it automatically converted it to C code that retrieves 100 rows with each fetch! The thinking is that the amount of memory for most tables need for 100 rows is never going to be that much, and you get a really nice burst in performance with that level of "bulk". This LiveSQL script demonstrates that optimization.
And from tests various people are run over the years, increasing that limit to 500 or 1000 doesn't really seem to offer that much of an improvement.
But for sure if you are working your way through millions of rows, you might see a very nice boost with a limit of 10000 or more. You just need to keep an eye on memory consumption. And that will be much less of a concern if you are running a batch job, not writing code that will be run by many users simultaneously.
You might also find this StackOverflow Q&A helpful.
Don't put anything in the collection you don't need.
Giulio Dottorini made this suggestion in a comment on this post. It is a very good point. The more data you store in every element ("row") of the collection, the more PGA memory will be required to store that data. So don't put anything in the collection you don't need.
Take another look at the block above showing how to use the LIMIT clause. My query is "SELECT * FROM employees", but when I look at how I use the data in that collection, I only need employee_id and last_name. So this would make a lot more sense for the declaration of my cursor and then of the collection type itself:
DECLARE c_limit CONSTANT PLS_INTEGER DEFAULT 100; CURSOR emp_cur IS SELECT employee_id, last_name FROM employees; TYPE employee_aat IS TABLE OF emp_cur INDEX BY BINARY_INTEGER; l_employee employee_aat;
In fact, generally, you should declare your collection type based on the cursor you are fetching from.
Now my collection will use even less PGA - only what I need and nothing more. Thanks, Giulio!