Wednesday, March 16, 2016

A Checklist for Bulk Processing Conversions in PL/SQL

[ Note: download the latest version of the checklist from OTN. You can comment on the checklist there, or on this blog post. Thanks!]

Bulk processing (FORALL and BULK COLLECT) is pretty much the most powerful and important performance feature available for the PL/SQL language. It addresses head-on one of the most common drags on performance in Oracle Database applications: row-by-row processing.

The basic "anti-pattern" to look for in your code is a loop that contains non-query DML (insert, update, delete, merge). This "DML loop" could be in PL/SQL code, but it could also appear in Java, Python, C, etc. Here's an example:

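PROCEDURE upd_for_dept (
   dept_in     IN employees.department_id%TYPE
 , newsal_in   IN employees.salary%TYPE)
IS
   CURSOR emp_cur
   IS
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;
BEGIN
   FOR rec IN emp_cur
   LOOP
      BEGIN
         -- DML #1: save the current values before changing them.
         INSERT INTO employee_history (employee_id, salary, hire_date)
              VALUES (rec.employee_id, rec.salary, rec.hire_date);

         rec.salary := newsal_in;

         -- Procedural step that may change rec.salary for this row.
         calc_pkg.adjust_compensation (rec.employee_id, rec.salary);

         -- DML #2: apply the new salary.
         UPDATE employees
            SET salary = rec.salary
          WHERE employee_id = rec.employee_id;
      EXCEPTION
         WHEN OTHERS
         THEN
            -- The handler body is missing from the source;
            -- log_error is an assumed logging procedure.
            log_error;
      END;
   END LOOP;
END upd_for_dept;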

The process by which you convert from row-by-row (or as Tom Kyte likes to call it, "slow by slow") to bulk processing is straightforward but very much non-trivial. When you are done, your code volume and complexity will have increased, usually substantially. But when you can achieve an order of magnitude improvement in performance, it's a price worth paying.

While the conversion process is straightforward, it is also a process you need to follow with great care. After all, you are modifying transaction-related code (changes to tables). There is no room for error; you can't afford to mess up those transactions or allow bad data to make its way into your database.

To help you do it right, I created a checklist of items to think through and take care of in your code as you move from row-by-row to bulk. I offer the checklist below, but the latest version can always be found on the Oracle Technology Network.

Scenario: You have a loop in your code that executes one or more non-query DML statements inside of it. It is running too slowly and you need to make it go faster. Can FORALL with BULK COLLECT help? Follow this checklist for a successful conversion.

1. Analyze and document all aspects of your transaction.
Behavior post-conversion must be the same. Of particular importance is to be clear on what happens when an error occurs. Do you stop immediately? Do you log the error and continue? If there are multiple DML statements inside the loop, how does an error in one statement affect what happens in the other statements? Do you commit at the end of the process, or with each iteration of the loop?

2. Build a regression test from the above analysis.
Sorry, but it must be said: the best (only?) way to ensure that your bulk-ified logic is error-free is to turn your documentation of current behavior into a regression test to also run against the new code. Check out SQL Developer's unit testing feature, utPLSQL or another framework.

3. Make sure you need to use PL/SQL's bulk processing features.
Can you do everything you need in "pure" SQL? Can you leverage DBMS_PARALLEL_EXECUTE or pipelined table functions? Generally, you need a PL/SQL-driven solution if you need to perform complex procedural steps on each row of data. So you do that first in PL/SQL on collections, then push back to the database.
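
To make that question concrete, here is a minimal sketch of what upd_for_dept could look like in "pure" SQL. It rests on a big assumption that does not hold in the original example: that the new salary needs no per-row procedural adjustment, so the call to calc_pkg.adjust_compensation is simply left out.

```plsql
PROCEDURE upd_for_dept (
   dept_in     IN employees.department_id%TYPE
 , newsal_in   IN employees.salary%TYPE)
IS
BEGIN
   -- Copy the current rows to the history table in a single statement.
   INSERT INTO employee_history (employee_id, salary, hire_date)
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;

   -- Assumption: every row gets the same new salary, with no
   -- per-row call to calc_pkg.adjust_compensation.
   UPDATE employees
      SET salary = newsal_in
    WHERE department_id = dept_in;
END upd_for_dept;
```

Note that error behavior changes: each statement now succeeds or fails as a whole, rather than row by row. If your analysis in step 1 says "log the error and continue," this rewrite does not preserve that behavior as written.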

4. Implement bulk processing phases as nested subprograms.
There are three basic "phases" with bulk processing:
1. Load collections, usually with BULK COLLECT.
2. Modify collections (complex procedural logic). That's why you can't use "pure" SQL.
3. Push collections to table(s) with FORALL. One for each DML statement.

Bulk code can get very long and complex, so the best approach is to encapsulate each step inside its own subprogram and keep the main body short and readable. And if any of your subprograms has an OUT or IN OUT collection type parameter, don't forget to add the NOCOPY hint!
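
Here is one way the upd_for_dept example might be restructured along those lines. Treat it as a sketch, not a finished conversion: error handling is deliberately left out, the names of the nested subprograms and the limit_in parameter are my own inventions, and it assumes Oracle Database 11g or later so that FORALL can reference fields of a collection of records.

```plsql
PROCEDURE upd_for_dept (
   dept_in     IN employees.department_id%TYPE
 , newsal_in   IN employees.salary%TYPE
 , limit_in    IN PLS_INTEGER DEFAULT 100)   -- hypothetical parameter
IS
   CURSOR emp_cur
   IS
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;

   TYPE employees_t IS TABLE OF emp_cur%ROWTYPE;

   l_employees   employees_t;

   -- Phase 2: procedural logic applied to the collection.
   PROCEDURE adjust_compensation (employees_io IN OUT NOCOPY employees_t)
   IS
   BEGIN
      FOR indx IN 1 .. employees_io.COUNT
      LOOP
         employees_io (indx).salary := newsal_in;
         calc_pkg.adjust_compensation (employees_io (indx).employee_id
                                     , employees_io (indx).salary);
      END LOOP;
   END adjust_compensation;

   -- Phase 3, first DML statement.
   PROCEDURE insert_history (employees_in IN employees_t)
   IS
   BEGIN
      FORALL indx IN 1 .. employees_in.COUNT
         INSERT INTO employee_history (employee_id, salary, hire_date)
              VALUES (employees_in (indx).employee_id
                    , employees_in (indx).salary
                    , employees_in (indx).hire_date);
   END insert_history;

   -- Phase 3, second DML statement.
   PROCEDURE update_employees (employees_in IN employees_t)
   IS
   BEGIN
      FORALL indx IN 1 .. employees_in.COUNT
         UPDATE employees
            SET salary = employees_in (indx).salary
          WHERE employee_id = employees_in (indx).employee_id;
   END update_employees;
BEGIN
   OPEN emp_cur;

   LOOP
      -- Phase 1: load the next batch of rows.
      FETCH emp_cur BULK COLLECT INTO l_employees LIMIT limit_in;
      EXIT WHEN l_employees.COUNT = 0;

      insert_history (l_employees);        -- history gets the old salary
      adjust_compensation (l_employees);
      update_employees (l_employees);
   END LOOP;

   CLOSE emp_cur;
END upd_for_dept;
```

The executable section now reads like an outline of the process, and each phase can be reviewed on its own.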

5. Use BULK COLLECT LIMIT and soft-code the limit.
Avoid "unlimited" SELECT BULK COLLECT INTO statements, since you could use too much PGA memory (now or in the future). Instead, declare a cursor, and within a simple loop FETCH BULK COLLECT INTO with a LIMIT clause. Soft-code the limit, preferably as a parameter to the subprogram. And keep in mind:
  • A default of 100 is a good start but be ready to experiment with larger numbers.
  • To terminate the loop, use EXIT WHEN your_collection.COUNT = 0; immediately after FETCH, or EXIT WHEN your_cursor%NOTFOUND; just before the END LOOP statement.

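
The pattern described above can be sketched as follows. The procedure name show_employee_ids and its limit_in parameter are hypothetical, and the DBMS_OUTPUT call simply stands in for whatever you do with each batch.

```plsql
CREATE OR REPLACE PROCEDURE show_employee_ids (
   limit_in   IN PLS_INTEGER DEFAULT 100)   -- soft-coded limit
IS
   CURSOR emp_cur
   IS
      SELECT employee_id FROM employees;

   TYPE employee_ids_t IS TABLE OF employees.employee_id%TYPE;

   l_employee_ids   employee_ids_t;
BEGIN
   OPEN emp_cur;

   LOOP
      FETCH emp_cur BULK COLLECT INTO l_employee_ids LIMIT limit_in;

      -- Option 1: stop as soon as a fetch returns no rows.
      EXIT WHEN l_employee_ids.COUNT = 0;

      FOR indx IN 1 .. l_employee_ids.COUNT
      LOOP
         DBMS_OUTPUT.put_line (l_employee_ids (indx));
      END LOOP;

      -- Option 2 (use instead of option 1):
      -- EXIT WHEN emp_cur%NOTFOUND;
   END LOOP;

   CLOSE emp_cur;
END show_employee_ids;
```

Why does the placement matter? %NOTFOUND is set to TRUE whenever a fetch returns fewer rows than the limit, so checking it right after the FETCH would skip the last, partially filled batch.
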
6. Determine how to communicate between FORALL steps.
If your loop has > 1 DML statement, it will have > 1 FORALL statement. An error in one DML statement often affects what happens in a subsequent DML statement. So when you switch to bulk, you must communicate the impact of failures in one FORALL to another, later one. SAVE EXCEPTIONS and SQL%BULK_EXCEPTIONS will be key for this.
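
Here is a small sketch of one way to pass failure information along, assuming the rule is "if the history insert fails for an employee, skip the update for that employee." The department number and the 10% raise are arbitrary illustrations, and the unlimited BULK COLLECT is there only to keep the example short.

```plsql
DECLARE
   TYPE employee_ids_t IS TABLE OF employees.employee_id%TYPE;

   l_employee_ids   employee_ids_t;

   -- ORA-24381 is raised when a FORALL with SAVE EXCEPTIONS
   -- finishes with at least one failed statement.
   bulk_errors      EXCEPTION;
   PRAGMA EXCEPTION_INIT (bulk_errors, -24381);
BEGIN
   SELECT employee_id
     BULK COLLECT INTO l_employee_ids
     FROM employees
    WHERE department_id = 50;   -- arbitrary department for illustration

   BEGIN
      FORALL indx IN 1 .. l_employee_ids.COUNT SAVE EXCEPTIONS
         INSERT INTO employee_history (employee_id, salary, hire_date)
            SELECT employee_id, salary, hire_date
              FROM employees
             WHERE employee_id = l_employee_ids (indx);
   EXCEPTION
      WHEN bulk_errors
      THEN
         FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT
         LOOP
            DBMS_OUTPUT.put_line (
                  'Insert failed for element '
               || SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX
               || ': '
               || SQLERRM (-1 * SQL%BULK_EXCEPTIONS (indx).ERROR_CODE));

            -- Remove the failed element so the next FORALL skips it.
            l_employee_ids.DELETE (SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX);
         END LOOP;
   END;

   -- The collection may now be sparse, so iterate with INDICES OF.
   FORALL indx IN INDICES OF l_employee_ids
      UPDATE employees
         SET salary = salary * 1.1   -- arbitrary change for illustration
       WHERE employee_id = l_employee_ids (indx);
END;
```

Deleting failed elements from the bind array is just one option. You could also record the failed index values in a separate collection and drive the next FORALL from that.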

7. Are your FORALL bind arrays sparse, or do you need to use bind array elements selectively?
BULK COLLECT always fills sequentially, but sometimes you will have sparse arrays or need to selectively use elements in your bind array for DML processing. In this case, switch from FORALL indx IN low .. high to FORALL indx IN INDICES OF or FORALL indx IN VALUES OF.
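
A minimal sketch of both variations follows. The selection rule (every other element), the department number, and the salary changes are arbitrary stand-ins for your own logic, and the unlimited BULK COLLECT is used only for brevity.

```plsql
DECLARE
   TYPE employee_ids_t IS TABLE OF employees.employee_id%TYPE;

   TYPE index_values_t IS TABLE OF PLS_INTEGER;

   l_employee_ids   employee_ids_t;
   l_selected       index_values_t := index_values_t ();
BEGIN
   SELECT employee_id
     BULK COLLECT INTO l_employee_ids
     FROM employees
    WHERE department_id = 50;   -- arbitrary department for illustration

   -- Selective use: remember the index values of the elements we want.
   FOR indx IN 1 .. l_employee_ids.COUNT
   LOOP
      IF MOD (indx, 2) = 1   -- arbitrary rule: every other element
      THEN
         l_selected.EXTEND;
         l_selected (l_selected.LAST) := indx;
      END IF;
   END LOOP;

   -- VALUES OF: use only the bind array elements whose index values
   -- are stored in l_selected. Skip the statement if none were selected.
   IF l_selected.COUNT > 0
   THEN
      FORALL indx IN VALUES OF l_selected
         UPDATE employees
            SET salary = salary * 1.1
          WHERE employee_id = l_employee_ids (indx);
   END IF;

   -- Sparse array: remove those same elements from the bind array...
   FOR indx IN 1 .. l_selected.COUNT
   LOOP
      l_employee_ids.DELETE (l_selected (indx));
   END LOOP;

   -- ...and use INDICES OF to process whatever is left.
   FORALL indx IN INDICES OF l_employee_ids
      UPDATE employees
         SET salary = salary * 1.05
       WHERE employee_id = l_employee_ids (indx);
END;
```

With a plain low .. high range, the second FORALL would fail on the first deleted element, because every index value in the range must exist in the bind array.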

Thanks to padders, rp2043 and BluShadow for improving this checklist.
