
A Checklist for Bulk Processing Conversions in PL/SQL

Note: download the latest version of the checklist from OTN. You can comment on the checklist there, or on this blog post. Thanks!

Bulk processing (FORALL and BULK COLLECT) is pretty much the most powerful and important performance feature available for the PL/SQL language. It addresses head-on one of the most common drags on performance in Oracle Database applications: row-by-row processing.

The basic "anti-pattern" to look for in your code is a loop that contains non-query DML (insert, update, delete merge). This "DML loop" could be in PL/SQL code, but it could also appear in Java, Python, C, etc. Here's an example (DML statements in red):
PROCEDURE upd_for_dept (
   dept_in     IN employees.department_id%TYPE
 , newsal_in   IN employees.salary%TYPE)
IS
   CURSOR emp_cur
   IS
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;
BEGIN
   FOR rec IN emp_cur
   LOOP
      BEGIN
         INSERT INTO employee_history (employee_id, salary, hire_date)
              VALUES (rec.employee_id, rec.salary, rec.hire_date);

         rec.salary := newsal_in;

         calc_pkg.adjust_compensation (rec.employee_id, rec.salary);

         UPDATE employees
            SET salary = rec.salary
          WHERE employee_id = rec.employee_id;
      EXCEPTION
         WHEN OTHERS
         THEN
            log_error;
      END;
   END LOOP;
END upd_for_dept;
The process by which you convert from row-by-row (or as Tom Kyte likes to call it, "slow by slow") to bulk processing is straightforward but very much non-trivial. When you are done, your code volume and complexity will have increased, usually substantially. But when you can achieve an order of magnitude improvement in performance, it's a price worth paying.

While the conversion process is straightforward, it is also a process you need to follow with great care. After all, you are modifying transaction-related code (changes to tables). There is no room for error; you can't afford to mess up those transactions or allow bad data to make its way into your database.

To help you do it right, I created a checklist of items to think through and take care of in your code as you move from row-by-row to bulk. I offer the checklist below, but the latest version can always be found on the Oracle Technology Network.

Scenario: You have a loop in your code that executes one or more non-query DML statements inside of it. It is running too slowly and you need to make it go faster. Can FORALL with BULK COLLECT help? Follow this checklist for a successful conversion.


  [  ]

1. Analyze and document all aspects of your transaction.

Behavior post-conversion must be the same. Of particular importance is to be clear on what happens when an error occurs. Do you stop immediately? Do you log the error and continue? If there are multiple DML statements inside the loop, how does an error in one statement affect what happens in the other statements? Do you commit at the end of the process, or with each iteration of the loop?

  [  ]

2. Build a regression test from the above analysis.

Sorry, but it must be said: the best (only?) way to ensure that your bulk-ified logic is error-free is to turn your documentation of current behavior into a regression test that you run against both the old and the new code. Check out SQL Developer's unit testing feature, utPLSQL, or another framework.
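To make that concrete, here is a minimal sketch of what such a test might look like using utPLSQL v3 annotations. The department, salary, and expected count are placeholders; you would derive the real expected values from your step 1 analysis.

CREATE OR REPLACE PACKAGE test_upd_for_dept
IS
   --%suite(Regression tests for upd_for_dept)

   --%test(History rows written and salaries updated as before)
   PROCEDURE matches_documented_behavior;
END test_upd_for_dept;
/

CREATE OR REPLACE PACKAGE BODY test_upd_for_dept
IS
   PROCEDURE matches_documented_behavior
   IS
      l_history_count   PLS_INTEGER;
   BEGIN
      upd_for_dept (dept_in => 50, newsal_in => 5000);

      SELECT COUNT (*)
        INTO l_history_count
        FROM employee_history;

      -- The expected count here (45) is a placeholder; take the
      -- real value from your documentation of current behavior.
      ut.expect (l_history_count).to_equal (45);
   END matches_documented_behavior;
END test_upd_for_dept;
/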

  [  ]

3. Make sure you need to use PL/SQL's bulk processing features.

Can you do everything you need in "pure" SQL? Can you leverage DBMS_PARALLEL_EXECUTE or pipelined table functions? Generally, you need a PL/SQL-driven solution only when you must perform complex procedural steps on each row of data: you load the rows into collections, apply the procedural logic there, and then push the results back to the database.
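For instance, if the compensation adjustment in the example above could be folded into SQL (often it can't, which is exactly when you move on to step 4), the whole loop collapses into two set-based statements. A sketch under that assumption:

PROCEDURE upd_for_dept (
   dept_in     IN employees.department_id%TYPE
 , newsal_in   IN employees.salary%TYPE)
IS
BEGIN
   -- Archive the current rows first, as in the original.
   INSERT INTO employee_history (employee_id, salary, hire_date)
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;

   -- One set-based UPDATE, assuming the logic inside
   -- calc_pkg.adjust_compensation can be expressed (or dropped)
   -- in SQL. If it can't, you need PL/SQL and FORALL.
   UPDATE employees
      SET salary = newsal_in
    WHERE department_id = dept_in;
END upd_for_dept;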

  [  ]

4. Implement bulk processing phases as nested subprograms.

There are three basic "phases" in bulk processing:
1. Load collections, usually with BULK COLLECT.
2. Modify the collections with your complex procedural logic (this is why you can't use "pure" SQL).
3. Push the collections to the table(s) with FORALL: one for each DML statement.

Bulk code can get very long and complex, so the best approach is to encapsulate each step inside its own subprogram and keep the main body short and readable. And if any of your subprograms has an OUT or IN OUT collection type parameter, don't forget to add the NOCOPY hint!
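Here is one way the example procedure might be restructured along those lines. This is a sketch only; the error handling demanded by steps 1 and 6 is omitted, and the soft-coded limit anticipates step 5.

PROCEDURE upd_for_dept (
   dept_in     IN employees.department_id%TYPE
 , newsal_in   IN employees.salary%TYPE
 , limit_in    IN PLS_INTEGER DEFAULT 100)
IS
   TYPE employee_ids_t IS TABLE OF employees.employee_id%TYPE
      INDEX BY PLS_INTEGER;
   TYPE salaries_t IS TABLE OF employees.salary%TYPE
      INDEX BY PLS_INTEGER;
   TYPE hire_dates_t IS TABLE OF employees.hire_date%TYPE
      INDEX BY PLS_INTEGER;

   l_employee_ids   employee_ids_t;
   l_salaries       salaries_t;
   l_hire_dates     hire_dates_t;

   CURSOR emp_cur
   IS
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;

   -- Phase 2: the procedural logic, applied to collections in memory.
   PROCEDURE adjust_salaries (salaries_io IN OUT NOCOPY salaries_t)
   IS
   BEGIN
      FOR indx IN 1 .. salaries_io.COUNT
      LOOP
         salaries_io (indx) := newsal_in;
         calc_pkg.adjust_compensation (l_employee_ids (indx)
                                     , salaries_io (indx));
      END LOOP;
   END adjust_salaries;

   -- Phase 3: one FORALL per DML statement.
   PROCEDURE insert_history
   IS
   BEGIN
      FORALL indx IN 1 .. l_employee_ids.COUNT
         INSERT INTO employee_history (employee_id, salary, hire_date)
              VALUES (l_employee_ids (indx)
                    , l_salaries (indx)
                    , l_hire_dates (indx));
   END insert_history;

   PROCEDURE update_salaries
   IS
   BEGIN
      FORALL indx IN 1 .. l_employee_ids.COUNT
         UPDATE employees
            SET salary = l_salaries (indx)
          WHERE employee_id = l_employee_ids (indx);
   END update_salaries;
BEGIN
   OPEN emp_cur;

   LOOP
      -- Phase 1: load the collections.
      FETCH emp_cur
         BULK COLLECT INTO l_employee_ids, l_salaries, l_hire_dates
         LIMIT limit_in;

      EXIT WHEN l_employee_ids.COUNT = 0;

      insert_history;                 -- history rows get the *old* salary
      adjust_salaries (l_salaries);
      update_salaries;
   END LOOP;

   CLOSE emp_cur;
END upd_for_dept;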

  [  ]

5. Use BULK COLLECT LIMIT and soft-code the limit.

Avoid "unlimited" SELECT BULK COLLECT INTO statements, since you could use too much PGA memory (now or in the future). Instead, declare a cursor, and within a simple loop FETCH BULK COLLECT INTO with a LIMIT clause. Soft-code the limit, preferably as a parameter to the subprogram. And keep in mind:
  • A default of 100 is a good start but be ready to experiment with larger numbers.
  • To terminate the loop, use EXIT WHEN your_collection.COUNT = 0; immediately after the FETCH, or EXIT WHEN your_cursor%NOTFOUND; just before the END LOOP statement (see the sketch after this list).
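Both termination idioms in context, reusing the cursor and collections from the step 4 sketch; process_batch is a hypothetical stand-in for phases 2 and 3. Note that the %NOTFOUND test must come after the batch is processed, or the final partial batch would be skipped.

OPEN emp_cur;

LOOP
   FETCH emp_cur
      BULK COLLECT INTO l_employee_ids, l_salaries, l_hire_dates
      LIMIT limit_in;

   -- Idiom 1: exit immediately after the fetch.
   EXIT WHEN l_employee_ids.COUNT = 0;

   process_batch;   -- hypothetical: phases 2 and 3 from step 4

   -- Idiom 2 (use instead of idiom 1; never place this right after
   -- the FETCH, or the last partial batch would never be processed):
   -- EXIT WHEN emp_cur%NOTFOUND;
END LOOP;

CLOSE emp_cur;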
  [  ]

6. Determine how to communicate between FORALL steps.

If your loop has > 1 DML statement, it will have > 1 FORALL statement. An error in one DML statement often affects what happens in a subsequent DML statement. So when you switch to bulk, you must communicate the impact of failures in one FORALL to another, later one. SAVE EXCEPTIONS and SQL%BULK_EXCEPTIONS will be key for this.
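A sketch of that pattern, reusing the collections from the step 4 sketch: SAVE EXCEPTIONS lets the first FORALL run to completion past individual failures, and the failed indexes are then removed from the bind arrays so the second FORALL skips them.

DECLARE
   bulk_errors   EXCEPTION;
   PRAGMA EXCEPTION_INIT (bulk_errors, -24381);
BEGIN
   BEGIN
      FORALL indx IN 1 .. l_employee_ids.COUNT SAVE EXCEPTIONS
         INSERT INTO employee_history (employee_id, salary, hire_date)
              VALUES (l_employee_ids (indx)
                    , l_salaries (indx)
                    , l_hire_dates (indx));
   EXCEPTION
      WHEN bulk_errors
      THEN
         FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT
         LOOP
            log_error;   -- same (unshown) logger as in the original example

            -- Remove the failed row from the bind arrays so the
            -- second FORALL does not update it.
            l_employee_ids.DELETE (SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX);
            l_salaries.DELETE (SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX);
         END LOOP;
   END;

   -- Runs whether or not there were failures; after deletions the
   -- arrays may be sparse, hence INDICES OF (see step 7).
   FORALL indx IN INDICES OF l_employee_ids
      UPDATE employees
         SET salary = l_salaries (indx)
       WHERE employee_id = l_employee_ids (indx);
END;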

  [  ]
7. Are your FORALL bind arrays sparse, or do you need to use their elements selectively?

BULK COLLECT always fills sequentially, but sometimes you will have sparse arrays or need to selectively use elements in your bind array for DML processing. In this case, switch from FORALL indx IN low .. high to FORALL indx IN INDICES OF or FORALL indx IN VALUES OF.
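INDICES OF already appeared in the step 6 sketch; VALUES OF goes one step further, driving the FORALL from a separate collection whose element values name the indexes to process. A hypothetical sketch, assuming only elements 1 and 3 of the bind arrays should be used:

DECLARE
   TYPE index_list_t IS TABLE OF PLS_INTEGER
      INDEX BY PLS_INTEGER;

   l_chosen   index_list_t;
BEGIN
   -- The *values* of l_chosen identify the bind array elements to use.
   l_chosen (1) := 1;
   l_chosen (2) := 3;

   FORALL indx IN VALUES OF l_chosen
      UPDATE employees
         SET salary = l_salaries (indx)
       WHERE employee_id = l_employee_ids (indx);
END;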

Thanks to padders, rp2043 and BluShadow for improving this checklist.

Comments

  1. Hi Steven,

     I have a question: how do I concatenate the values of a nested table, varray, or associative array into a single comma-separated variable? For example, a nested table returns one value per row:
     Steven
     Frank
     Mark
     but I need the values as "Steven","Frank","Mark", or as a JSON array: ["Steven","Frank","Mark"].
     My requirement is to pass millions of client IDs in JSON array format, so that I can call a web service from a PL/SQL block and pass this value using a bulk method. Please suggest how to achieve this.

    Replies
    1. I encourage you to submit your question to asktom.oracle.com - that makes more sense than answering this out-of-scope question here. But I would certainly take a look at LISTAGG.

    2. Thanks, Steven.

       If the data runs to millions of rows, though, LISTAGG fails to convert them.


