
A Checklist for Bulk Processing Conversions in PL/SQL

Note: download the latest version of the checklist from OTN. You can comment on the checklist there, or on this blog post. Thanks!

Bulk processing (FORALL and BULK COLLECT) is pretty much the most powerful and important performance feature available for the PL/SQL language. It addresses head-on one of the most common drags on performance in Oracle Database applications: row-by-row processing.

The basic "anti-pattern" to look for in your code is a loop that contains non-query DML (INSERT, UPDATE, DELETE, MERGE). This "DML loop" could be in PL/SQL code, but it could also appear in Java, Python, C, etc. Here's an example, with the DML statements inside the loop:
PROCEDURE upd_for_dept (
   dept_in     IN employees.department_id%TYPE
 , newsal_in   IN employees.salary%TYPE)
IS
   CURSOR emp_cur
   IS
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;
BEGIN
   FOR rec IN emp_cur
   LOOP
      BEGIN
         INSERT INTO employee_history (employee_id, salary, hire_date)
              VALUES (rec.employee_id, rec.salary, rec.hire_date);

         rec.salary := newsal_in;

         calc_pkg.adjust_compensation (rec.employee_id, rec.salary);

         UPDATE employees
            SET salary = rec.salary
          WHERE employee_id = rec.employee_id;
      EXCEPTION
         WHEN OTHERS
         THEN
            log_error;
      END;
   END LOOP;
END upd_for_dept;
The process by which you convert from row-by-row (or as Tom Kyte likes to call it, "slow by slow") to bulk processing is straightforward but very much non-trivial. When you are done, your code volume and complexity will have increased, usually substantially. But when you can achieve an order of magnitude improvement in performance, it's a price worth paying.

While the conversion process is straightforward, it is also a process you need to follow with great care. After all, you are modifying transaction-related code (changes to tables). There is no room for error; you can't afford to mess up those transactions or allow bad data to make its way into your database.

To help you do it right, I created a checklist of items to think through and take care of in your code as you move from row-by-row to bulk. I offer the checklist below, but the latest version can always be found on the Oracle Technology Network.

Scenario: You have a loop in your code that executes one or more non-query DML statements inside of it. It is running too slowly and you need to make it go faster. Can FORALL with BULK COLLECT help? Follow this checklist for a successful conversion.


  [  ]

1. Analyze and document all aspects of your transaction.

Behavior post-conversion must be the same. Of particular importance is to be clear on what happens when an error occurs. Do you stop immediately? Do you log the error and continue? If there are multiple DML statements inside the loop, how does an error in one statement affect what happens in the other statements? Do you commit at the end of the process, or with each iteration of the loop?

  [  ]

2. Build a regression test from the above analysis.

Sorry, but it must be said: the best (only?) way to ensure that your bulk-ified logic is error-free is to turn your documentation of current behavior into a regression test that you can also run against the new code. Check out SQL Developer's unit testing feature, utPLSQL, or another testing framework.
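For instance, here is a minimal utPLSQL (v3) sketch of such a test for the opening example. The package name, the use of department 50, and the assumption that employee_history starts empty are all illustrative; a real regression suite would also cover the error-logging and commit behavior you documented in step 1.

CREATE OR REPLACE PACKAGE test_upd_for_dept
IS
   --%suite(upd_for_dept regression)

   --%test(Writes one history row per employee in the department)
   PROCEDURE writes_history_rows;
END test_upd_for_dept;
/

CREATE OR REPLACE PACKAGE BODY test_upd_for_dept
IS
   PROCEDURE writes_history_rows
   IS
      l_emp_count       INTEGER;
      l_history_count   INTEGER;
   BEGIN
      SELECT COUNT (*)
        INTO l_emp_count
        FROM employees
       WHERE department_id = 50;

      upd_for_dept (dept_in => 50, newsal_in => 5000);

      -- Assumes employee_history is empty before the test runs.
      SELECT COUNT (*)
        INTO l_history_count
        FROM employee_history;

      ut.expect (l_history_count).to_equal (l_emp_count);
   END writes_history_rows;
END test_upd_for_dept;
/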

  [  ]

3. Make sure you need to use PL/SQL's bulk processing features.

Can you do everything you need in "pure" SQL? Can you leverage DBMS_PARALLEL_EXECUTE or pipelined table functions? Generally, you need a PL/SQL-driven solution when you must perform complex procedural steps on each row of data: you first apply that logic to the rows held in collections, then push the results back to the tables in the database.
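To make that concrete: if (and only if) the logic inside calc_pkg.adjust_compensation from the opening example could be expressed in SQL, the entire loop would collapse into two set-based statements and no bulk processing would be needed at all. A purely hypothetical sketch:

-- Valid only if calc_pkg.adjust_compensation can be inlined as SQL.
PROCEDURE upd_for_dept (
   dept_in     IN employees.department_id%TYPE
 , newsal_in   IN employees.salary%TYPE)
IS
BEGIN
   INSERT INTO employee_history (employee_id, salary, hire_date)
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;

   UPDATE employees
      SET salary = newsal_in   -- plus the (hypothetically) inlined compensation logic
    WHERE department_id = dept_in;
END upd_for_dept;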

  [  ]

4. Implement bulk processing phases as nested subprograms.

There are three basic "phases" in bulk processing:
1. Load collections, usually with BULK COLLECT.
2. Modify collections (the complex procedural logic that keeps you from using "pure" SQL).
3. Push collections to table(s) with FORALL, one FORALL for each DML statement.

Bulk code can get very long and complex, so the best approach is to encapsulate each phase inside its own subprogram and keep the main body short and readable. And if any of your subprograms has an OUT or IN OUT collection-type parameter, don't forget to add the NOCOPY hint!
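Here is a sketch of that structure applied to the opening example. Parallel scalar collections keep the FORALL binds simple; the history insert runs before the salaries change, preserving the original behavior; and the nested subprograms work directly on the outer collections, so no OUT parameters (and hence no NOCOPY hints) are needed in this particular case. LIMIT and error handling are deliberately omitted here; items 5 and 6 cover them.

PROCEDURE upd_for_dept (
   dept_in     IN employees.department_id%TYPE
 , newsal_in   IN employees.salary%TYPE)
IS
   TYPE ids_t   IS TABLE OF employees.employee_id%TYPE INDEX BY PLS_INTEGER;
   TYPE sals_t  IS TABLE OF employees.salary%TYPE INDEX BY PLS_INTEGER;
   TYPE dates_t IS TABLE OF employees.hire_date%TYPE INDEX BY PLS_INTEGER;

   l_ids     ids_t;
   l_sals    sals_t;
   l_dates   dates_t;

   -- Phase 1: load the collections with BULK COLLECT.
   PROCEDURE load_employees
   IS
   BEGIN
      SELECT employee_id, salary, hire_date
        BULK COLLECT INTO l_ids, l_sals, l_dates
        FROM employees
       WHERE department_id = dept_in;
   END load_employees;

   -- Phase 3a: push the history rows. This runs before the salaries
   -- are modified, exactly as in the original row-by-row loop.
   PROCEDURE insert_history
   IS
   BEGIN
      FORALL indx IN 1 .. l_ids.COUNT
         INSERT INTO employee_history (employee_id, salary, hire_date)
              VALUES (l_ids (indx), l_sals (indx), l_dates (indx));
   END insert_history;

   -- Phase 2: apply the complex procedural logic to the collections.
   PROCEDURE adjust_salaries
   IS
   BEGIN
      FOR indx IN 1 .. l_ids.COUNT
      LOOP
         l_sals (indx) := newsal_in;
         calc_pkg.adjust_compensation (l_ids (indx), l_sals (indx));
      END LOOP;
   END adjust_salaries;

   -- Phase 3b: push the salary changes.
   PROCEDURE update_salaries
   IS
   BEGIN
      FORALL indx IN 1 .. l_ids.COUNT
         UPDATE employees
            SET salary = l_sals (indx)
          WHERE employee_id = l_ids (indx);
   END update_salaries;
BEGIN
   load_employees;
   insert_history;
   adjust_salaries;
   update_salaries;
END upd_for_dept;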

  [  ]

5. Use BULK COLLECT LIMIT and soft-code the limit.

Avoid "unlimited" SELECT BULK COLLECT INTO statements, since you could use too much PGA memory (now or in the future). Instead, declare a cursor, and within a simple loop FETCH BULK COLLECT INTO with a LIMIT clause. Soft-code the limit, preferably as a parameter to the subprogram. And keep in mind:
  • A default of 100 is a good start, but be ready to experiment with larger values.
  • To terminate the loop, use EXIT WHEN your_collection.COUNT = 0; immediately after the FETCH, or EXIT WHEN your_cursor%NOTFOUND; just before the END LOOP statement.
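Putting those points together, a minimal sketch (the procedure name and the processing body are placeholders):

PROCEDURE process_dept (
   dept_in    IN employees.department_id%TYPE
 , limit_in   IN PLS_INTEGER DEFAULT 100)
IS
   CURSOR emp_cur
   IS
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;

   TYPE emp_tab_t IS TABLE OF emp_cur%ROWTYPE INDEX BY PLS_INTEGER;

   l_employees   emp_tab_t;
BEGIN
   OPEN emp_cur;

   LOOP
      FETCH emp_cur BULK COLLECT INTO l_employees LIMIT limit_in;

      EXIT WHEN l_employees.COUNT = 0;   -- immediately after the fetch

      FOR indx IN 1 .. l_employees.COUNT
      LOOP
         NULL;   -- per-row logic and FORALL pushes go here
      END LOOP;
   END LOOP;

   CLOSE emp_cur;
END process_dept;
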
  [  ]

6. Determine how to communicate between FORALL steps.

If your loop contains more than one DML statement, your converted code will contain more than one FORALL statement. An error in one DML statement often affects what happens in a subsequent DML statement, so when you switch to bulk processing, you must communicate the impact of failures in one FORALL to another, later one. SAVE EXCEPTIONS and SQL%BULK_EXCEPTIONS will be key for this.
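A sketch of the pattern (all names hypothetical; assumes employee_history accepts an insert of just employee_id): the first FORALL saves its exceptions, the handler records which bind indices failed, and those indices are then removed from the bind array so a later FORALL can skip them.

DECLARE
   bulk_errors   EXCEPTION;
   PRAGMA EXCEPTION_INIT (bulk_errors, -24381);

   TYPE ids_t IS TABLE OF employees.employee_id%TYPE INDEX BY PLS_INTEGER;
   TYPE flags_t IS TABLE OF BOOLEAN INDEX BY PLS_INTEGER;

   l_ids      ids_t;
   l_failed   flags_t;
   l_indx     PLS_INTEGER;
BEGIN
   SELECT employee_id
     BULK COLLECT INTO l_ids
     FROM employees;

   BEGIN
      FORALL indx IN 1 .. l_ids.COUNT SAVE EXCEPTIONS
         INSERT INTO employee_history (employee_id)
              VALUES (l_ids (indx));
   EXCEPTION
      WHEN bulk_errors
      THEN
         FOR i IN 1 .. SQL%BULK_EXCEPTIONS.COUNT
         LOOP
            -- ERROR_INDEX is the bind array subscript that failed;
            -- ERROR_CODE holds the Oracle error number.
            l_failed (SQL%BULK_EXCEPTIONS (i).ERROR_INDEX) := TRUE;
         END LOOP;
   END;

   -- Remove failed subscripts so the next FORALL skips those rows.
   -- The bind array is now sparse: use INDICES OF (see item 7).
   l_indx := l_failed.FIRST;

   WHILE l_indx IS NOT NULL
   LOOP
      l_ids.DELETE (l_indx);
      l_indx := l_failed.NEXT (l_indx);
   END LOOP;
END;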

  [  ]

7. Are your bind arrays sparse, or do you need to use bind array elements selectively?

BULK COLLECT always fills collections sequentially, but sometimes you will end up with a sparse array, or need to use only some of the elements in your bind array for DML processing. In those cases, switch from FORALL indx IN low .. high to FORALL indx IN INDICES OF or FORALL indx IN VALUES OF.
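A short sketch of the difference (illustrative only):

DECLARE
   TYPE ids_t IS TABLE OF employees.employee_id%TYPE INDEX BY PLS_INTEGER;

   l_ids   ids_t;
BEGIN
   SELECT employee_id
     BULK COLLECT INTO l_ids
     FROM employees;

   -- Simulate a row removed after an earlier FORALL failure.
   l_ids.DELETE (2);

   -- "FORALL indx IN 1 .. l_ids.COUNT" would now fail on the gap;
   -- INDICES OF visits only the subscripts that actually exist.
   FORALL indx IN INDICES OF l_ids
      UPDATE employees
         SET salary = salary * 1.1
       WHERE employee_id = l_ids (indx);
END;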

Thanks to padders, rp2043 and BluShadow for improving this checklist.

Comments

  1. Hi Steven,

    I have a question.

    How do I assign the values of a nested table, varray, or associative array, comma-separated, to a single variable?
    For example, the array returns its values one per row:
    Steven
    Frank
    Mark
    but I need the values like "Steven","Frank","Mark", or in JSON array format:
    ["Steven","Frank","Mark"].
    Please suggest how to achieve this.
    My requirement is to pass millions of client ids in JSON array format, so that I can call a web service from a PL/SQL block, passing this value using a bulk method.

    1. I encourage you to submit your question to asktom.oracle.com - that makes more sense than answering this out-of-scope question here. But I would certainly take a look at LISTAGG.

    2. Thanks Steven.

      If the data runs to millions of values, though, LISTAGG fails to convert it (the concatenated result grows too long).


