Skip to main content

Best Type of Collection for FORALL?

I recently received this question in my In Box:

Is FORALL faster with Associative Arrays or Nested Tables? Oracle 12.2 documentation says: "The most efficient way to pass collections to and from the database server is to use associative arrays with the FORALL statement or BULK COLLECT clause." And a blog post claims Associative Arrays with "indices of" option is fastest in 10.2. Just wondering if you have noticed any differences and if so, how much faster Associative Arrays are in 12.2 than Nested Tables?

Quick Answer

  1. There is no significant difference I can see in the performance based on different collection types (with the limited tests I have run).
  2. Don't trust performance tests run on very old versions of Oracle Database (e.g,, 10.2).
  3. Use the documentation as a starting, not ending, point of your exploration.
  4. Try it yourself! Writing and running the code will teach you more than reading the doc or my blog post.
If you would like to read further, I will show you the code I used in my tests.

You can run these yourself on LiveSQL. I encourage you to build your own tests and post them on LiveSQL as well. I will be happy to add links in this blog post if you send them to me.

I create a table into which I will insert rows. I then create a package to help me calculate elapsed time of my test code.
CREATE TABLE parts
(
   partnum    NUMBER,
   partname   VARCHAR2 (15)
)
/

CREATE OR REPLACE PACKAGE tmr
IS
   PROCEDURE start_timer;

   PROCEDURE show_elapsed_time (message_in IN VARCHAR2);
END;
/

CREATE OR REPLACE PACKAGE BODY tmr
IS
   l_start   INTEGER;

   PROCEDURE start_timer
   IS
   BEGIN
      l_start := DBMS_UTILITY.get_cpu_time;
   END start_timer;

   PROCEDURE show_elapsed_time (message_in IN VARCHAR2)
   IS
   BEGIN
      DBMS_OUTPUT.put_line (
            CASE
               WHEN message_in IS NULL THEN 'Completed in:'
               ELSE '"' || message_in || '" completed in: '
            END
         || (DBMS_UTILITY.get_cpu_time - l_start)
         || ' cs');

      /* Reset timer */
      start_timer;
   END show_elapsed_time;
END;
/

With Associative Arrays

Otherwise known as "index by tables" because of their use of the INDEX BY clause, these collections have been around in PL/SQL since 7.3, greatly enhanced over the years to offer string indexing and other fun stuff.

In the block below, I populate two collections, once with the part IDs and the other with the part names. I then use FORALL to insert them into the table. I use the tmr subprograms to set the starting time, and then display the number of hundredths of seconds (centi-seconds) that elapsed to do the FORALL inserts.
DECLARE
   PROCEDURE compare_inserting (num IN INTEGER)
   IS
      TYPE numtab IS TABLE OF parts.partnum%TYPE
         INDEX BY PLS_INTEGER;

      TYPE nametab IS TABLE OF parts.partname%TYPE
         INDEX BY PLS_INTEGER;

      pnums       numtab;
      pnames      nametab;
   BEGIN
      FOR indx IN 1 .. num
      LOOP
         pnums (indx) := indx;
         pnames (indx) := 'Part ' || TO_CHAR (indx);
      END LOOP;

      tmr.start_timer;

      FORALL indx IN 1 .. num
         INSERT INTO parts
              VALUES (pnums (indx), pnames (indx));

      tmr.show_elapsed_time ('Inserted ' || num || ' rows');

      ROLLBACK;
   END;
BEGIN
   DBMS_OUTPUT.put_line ('Associative Arrays');
   compare_inserting (1000000);
END;

With Nested Tables

The nested table type was added way back in Oracle8, as part of the object-relational model. The definition of a nested table type looks just like the associative array, without the INDEX BY clause. That's because, officially, nested tables have no index, no order to their elements - they are multisets. But in practice, you can use the index value, which we certainly do in the FORALL statement.

With nested tables, you must initialize the collection using its constructor function, and explicitly extend the collection to make room for new elements.
DECLARE
   PROCEDURE compare_inserting (num IN INTEGER)
   IS
      TYPE numtab IS TABLE OF parts.partnum%TYPE;

      TYPE nametab IS TABLE OF parts.partname%TYPE;

      pnums       numtab := numtab();
      pnames      nametab := nametab();
   BEGIN
      pnums.extend (num);
      pnames.extend (num);
      
      FOR indx IN 1 .. num
      LOOP
         pnums (indx) := indx;
         pnames (indx) := 'Part ' || TO_CHAR (indx);
      END LOOP;

      tmr.start_timer;

      FORALL indx IN 1 .. num
         INSERT INTO parts
              VALUES (pnums (indx), pnames (indx));

      tmr.show_elapsed_time ('Inserted ' || num || ' rows');

      ROLLBACK;
   END;
BEGIN
   DBMS_OUTPUT.put_line ('Nested Table');
   compare_inserting (1000000);
END;

With Varrays

Finally, varrays - a shortening of "varying arrays." Which is kind of funny, because varrays are the least varying of all the collection types. With varrays, you have to declare the maximum number of elements allowed in the collection up front.

With varrays, like nested tables, you must initialize the collection using its constructor function, and explicitly extend the collection to make room for new elements.
DECLARE
   PROCEDURE compare_inserting (num IN INTEGER)
   IS
      TYPE numtab IS VARRAY (1000000) OF parts.partnum%TYPE;

      TYPE nametab IS VARRAY (1000000) OF parts.partname%TYPE;

      pnums       numtab := numtab();
      pnames      nametab := nametab();
   BEGIN
      pnums.extend (num);
      pnames.extend (num);
      
      FOR indx IN 1 .. num
      LOOP
         pnums (indx) := indx;
         pnames (indx) := 'Part ' || TO_CHAR (indx);
      END LOOP;

      tmr.start_timer;

      FORALL indx IN 1 .. num
         INSERT INTO parts
              VALUES (pnums (indx), pnames (indx));

      tmr.show_elapsed_time ('Inserted ' || num || ' rows');

      ROLLBACK;
   END;
BEGIN
   DBMS_OUTPUT.put_line ('Varrays');
   compare_inserting (1000000);
END;

Results

I ran each of the blocks three times, with the results you see below. I conclude that there is no significant difference in performance based on the type of collection you choose.
Associative Arrays
"Inserted 1000000 rows" completed in: 84 cs
"Inserted 1000000 rows" completed in: 78 cs
"Inserted 1000000 rows" completed in: 77 cs

Nested Table
"Inserted 1000000 rows" completed in: 78 cs
"Inserted 1000000 rows" completed in: 76 cs
"Inserted 1000000 rows" completed in: 77 cs

Varrays
"Inserted 1000000 rows" completed in: 81 cs
"Inserted 1000000 rows" completed in: 77 cs
"Inserted 1000000 rows" completed in: 77 cs
Now, obviously, this is not a comprehensive test. Will things go more slowly with nested tables for updates instead of inserts? Are associative arrays faster with small sets of data and nested tables better with super-duper large datasets? But it seems rather unlikely that these factors would affect overall performance, so I am not going to build scripts for those.

I do, however, encourage each of you to try this yourself. Feel free to download and modify my LiveSQL script. Test some variations, let me know what you discover.

In the meantime, I suggest that you pick the collection type that allows you to most easily build and manage the collections you will feed into the FORALL statement. Associative arrays are particularly good with sparse collections (not every index value filled between first and last). Nested tables offer lots of features (not connected to FORALL) for set-oriented management of collection contents. Varrays: well, honestly, I doubt you'll ever use a array. It's intended more for usage as nested columns in relational tables, offering some performance advantages there.

Comments

Popular posts from this blog

Running out of PGA memory with MULTISET ops? Watch out for DISTINCT!

A PL/SQL team inside Oracle made excellent use of nested tables and MULTISET operators in SQL, blending data in tables with procedurally-generated datasets (nested tables).  All was going well when they hit the dreaded: ORA-04030: out of process memory when trying to allocate 2032 bytes  They asked for my help.  The error occurred on this SELECT: SELECT  *    FROM header_tab trx    WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)       AND ((trx.column_ntab MULTISET             EXCEPT DISTINCT generated_ntab2) IS EMPTY) The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality. We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of wo...

How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today: In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit? First I give the quick answer, then I provide support for that answer Quick Answer Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption. Test to see if that's fast enough (likely will be for many cases). If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.  Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification. Don't put anything in the collection you don't need. [from Giulio Dottorini] Remember: each session that runs this code will use that amount of memory. Background When you use BULK COLLECT, you retrieve more than row with each fetch, ...

PL/SQL 101: Three ways to get error message/stack in PL/SQL

The PL/SQL Challenge quiz for 10 September - 16 September 2016 explored the different ways you can obtain the error message / stack in PL/SQL. Note: an error stack is a sequence of multiple error messages that can occur when an exception is propagated and re-raised through several layers of nested blocks. The three ways are: SQLERRM - The original, traditional and (oddly enough) not currently recommended function to get the current error message. Not recommended because the next two options avoid a problem which you are unlikely  to run into: the error stack will be truncated at 512 bytes, and you might lose some error information. DBMS_UTILITY.FORMAT_ERROR_STACK - Returns the error message / stack, and will not truncate your string like SQLERRM will. UTL_CALL_STACK API - Added in Oracle Database 12c, the UTL_CALL_STACK package offers a comprehensive API into the execution call stack, the error stack and the error backtrace.  Note: check out this LiveSQL script if...