Skip to main content

A Checklist for Bulk Processing Conversions in PL/SQL

Note: download the latest version of the checklist from OTN. You can comment on the checklist there, or on this blog post. Thanks!

Bulk processing (FORALL and BULK COLLECT) is pretty much the most powerful and important performance feature available for the PL/SQL language. It addresses head-on one of the most common drags on performance in Oracle Database applications: row-by-row processing.

The basic "anti-pattern" to look for in your code is a loop that contains non-query DML (insert, update, delete merge). This "DML loop" could be in PL/SQL code, but it could also appear in Java, Python, C, etc. Here's an example (DML statements in red):
PROCEDURE upd_for_dept (
   dept_in     IN employees.department_id%TYPE
 , newsal_in   IN employees.salary%TYPE)
IS
   CURSOR emp_cur
   IS
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;
BEGIN
   FOR rec IN emp_cur
   LOOP
      BEGIN
         INSERT INTO employee_history (employee_id, salary, hire_date)
              VALUES (rec.employee_id, rec.salary, rec.hire_date);

         rec.salary := newsal_in;

         calc_pkg.adjust_compensation (rec.employee_id, rec.salary);

         UPDATE employees
            SET salary = rec.salary
          WHERE employee_id = rec.employee_id;
      EXCEPTION
         WHEN OTHERS
         THEN
            log_error;
      END;
   END LOOP;
END upd_for_dept;
The process by which you convert from row-by-row (or as Tom Kyte likes to call it, "slow by slow") to bulk processing is straightforward but very much non-trivial. When you are done, your code volume and complexity will have increased, usually substantially. But when you can achieve an order of magnitude improvement in performance, it's a price worth paying.

While the conversion process is straightforward, it is also a process you need to follow with great care. After all, you are modifying transaction-related code (changes to tables). There is no room for error; you can't afford to mess up those transactions or allow bad data to make its way into your database.

To help you do it right, I created a checklist of items to think through and take care of in your code as you move from row-by-row to bulk. I offer the checklist below, but the latest version can always be found on the Oracle Technology Network.

Scenario: You have a loop in your code that executes one or more non-query DML statements inside of it. It is running too slowly and you need to make it go faster. Can FORALL with BULK COLLECT help? Follow this checklist for a successful conversion.


  [  ]

1. Analyze and document all aspects of your transaction.

Behavior post-conversion must be the same. Of particular importance is to be clear on what happens when an error occurs. Do you stop immediately? Do you log the error and continue? If multiple DML statements inside the loop, how does an error in one statement affect what happens in the other statements? Do you commit at the end of the process, with each iteration of the loop?

  [  ]

2. Build a regression test from the above analysis.

Sorry, but it must be said: the best (only?) way to ensure that your bulk-ified logic is error-free is to turn your documentation of current behavior into a regression test to also run against the new code. Check out SQL Developer's unit testing feature, utPLSQL or another framework.

  [  ]

3. Make sure you need to use PL/SQL's bulk processing features.

Can you do everything you need in "pure" SQL? Can you leverage DBMS_PARALLEL_EXECUTE or pipelined table functions? Generally, you need a PL/SQL-driven solution if you need to perform complex procedural steps on each row of data. So you do that first in PL/SQL on collections, then push back to the database.

  [  ]
4. Implement bulk processing phases as nested subprograms.

There are three basic "phases" with bulk processing:
1. Load collections, usually with BULK COLLECT.
2. Modify collections (complex procedural logic). That's why you can't use "pure" SQL. 3. Push collections to table(s) with FORALL. One for each DML statement.

Bulk code can get very long and complex, so the best approach is to encapsulate each step inside its own subprogram and keep the main body short and readable. And if any of your subprograms has an OUT or IN OUT collection type parameter, don't forget to add the NOCOPY hint!

  [  ]

5. Use BULK COLLECT LIMIT and soft-code the limit.

Avoid "unlimited" SELECT BULK COLLECT INTO statements, since you could use too much PGA memory (now or in the future). Instead, declare a cursor, and within a simple loop FETCH BULK COLLECT INTO with a LIMIT clause. Soft-code the limit, preferably as a parameter to the subprogram. And keep in mind:
  • A default of 100 is a good start but be ready to experiment with larger numbers.
  • To terminate the loop, use EXIT WHEN your_collection.COUNT = 0; immediately after FETCH or EXIT WHEN your_cursor%NOTFOUND; just before END LOOP statement.
  [  ]
6. Determine how to communicate between FORALL steps.

If your loop has > 1 DML statement, it will have > 1 FORALL statement. An error in one DML statement often affects what happens in a subsequent DML statement. So when you switch to bulk, you must communicate the impact of failures in one FORALL to another, later one. SAVE EXCEPTIONS and SQL%BULK_EXCEPTIONS will be key for this.

  [  ]
7. FORALL bind arrays sparse or selectively using bind array elements?

BULK COLLECT always fills sequentially, but sometimes you will have sparse arrays or need to selectively use elements in your bind array for DML processing. In this case, switch from FORALL indx IN low .. high to FORALL indx IN INDICES OF or FORALL indx IN VALUES OF.

Thanks to padders, rp2043 and BluShadow for improving this checklist.

Links to resources on bulk processing:


Comments

  1. Hi Steven,

    I Have a query

    How to assign Nested array or Varray or Associative array value with comma separated into a variable.
    e.g. nested array returns value in row
    Steven
    Frank
    Mark
    but I need array value like "Steven","Frank","Mark" or else these array value in JSON format
    ["Steven","Frank","Mark"].
    Please provide your suggestion how to achieve it.
    My requirement is pass millions of client id in JSON array format ,So that I can call web service via plsql block and passing this value using Bulk method.

    ReplyDelete
    Replies
    1. I encourage you to submit your question to asktom.oracle.com - that's make more sense than answering this question "out of scope." But I would certainly take a look at LISTAGG.

      Delete
    2. Thanks Steven.

      If data in millions then LISTAGG failed to convert it.

      Delete

Post a Comment

Popular posts from this blog

Running out of PGA memory with MULTISET ops? Watch out for DISTINCT!

A PL/SQL team inside Oracle made excellent use of nested tables and MULTISET operators in SQL, blending data in tables with procedurally-generated datasets (nested tables).  All was going well when they hit the dreaded: ORA-04030: out of process memory when trying to allocate 2032 bytes  They asked for my help.  The error occurred on this SELECT: SELECT  *    FROM header_tab trx    WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)       AND ((trx.column_ntab MULTISET             EXCEPT DISTINCT generated_ntab2) IS EMPTY) The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality. We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of wo...

How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today: In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit? First I give the quick answer, then I provide support for that answer Quick Answer Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption. Test to see if that's fast enough (likely will be for many cases). If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.  Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification. Don't put anything in the collection you don't need. [from Giulio Dottorini] Remember: each session that runs this code will use that amount of memory. Background When you use BULK COLLECT, you retrieve more than row with each fetch, ...

PL/SQL 101: Three ways to get error message/stack in PL/SQL

The PL/SQL Challenge quiz for 10 September - 16 September 2016 explored the different ways you can obtain the error message / stack in PL/SQL. Note: an error stack is a sequence of multiple error messages that can occur when an exception is propagated and re-raised through several layers of nested blocks. The three ways are: SQLERRM - The original, traditional and (oddly enough) not currently recommended function to get the current error message. Not recommended because the next two options avoid a problem which you are unlikely  to run into: the error stack will be truncated at 512 bytes, and you might lose some error information. DBMS_UTILITY.FORMAT_ERROR_STACK - Returns the error message / stack, and will not truncate your string like SQLERRM will. UTL_CALL_STACK API - Added in Oracle Database 12c, the UTL_CALL_STACK package offers a comprehensive API into the execution call stack, the error stack and the error backtrace.  Note: check out this LiveSQL script if...