Skip to main content

Best Type of Collection for FORALL?

I recently received this question in my In Box:

Is FORALL faster with Associative Arrays or Nested Tables? Oracle 12.2 documentation says: "The most efficient way to pass collections to and from the database server is to use associative arrays with the FORALL statement or BULK COLLECT clause." And a blog post claims Associative Arrays with "indices of" option is fastest in 10.2. Just wondering if you have noticed any differences and if so, how much faster Associative Arrays are in 12.2 than Nested Tables?

Quick Answer

  1. There is no significant difference I can see in the performance based on different collection types (with the limited tests I have run).
  2. Don't trust performance tests run on very old versions of Oracle Database (e.g,, 10.2).
  3. Use the documentation as a starting, not ending, point of your exploration.
  4. Try it yourself! Writing and running the code will teach you more than reading the doc or my blog post.
If you would like to read further, I will show you the code I used in my tests.

You can run these yourself on LiveSQL. I encourage you to build your own tests and post them on LiveSQL as well. I will be happy to add links in this blog post if you send them to me.

I create a table into which I will insert rows. I then create a package to help me calculate elapsed time of my test code.
CREATE TABLE parts
(
   partnum    NUMBER,
   partname   VARCHAR2 (15)
)
/

CREATE OR REPLACE PACKAGE tmr
IS
   PROCEDURE start_timer;

   PROCEDURE show_elapsed_time (message_in IN VARCHAR2);
END;
/

CREATE OR REPLACE PACKAGE BODY tmr
IS
   l_start   INTEGER;

   PROCEDURE start_timer
   IS
   BEGIN
      l_start := DBMS_UTILITY.get_cpu_time;
   END start_timer;

   PROCEDURE show_elapsed_time (message_in IN VARCHAR2)
   IS
   BEGIN
      DBMS_OUTPUT.put_line (
            CASE
               WHEN message_in IS NULL THEN 'Completed in:'
               ELSE '"' || message_in || '" completed in: '
            END
         || (DBMS_UTILITY.get_cpu_time - l_start)
         || ' cs');

      /* Reset timer */
      start_timer;
   END show_elapsed_time;
END;
/

With Associative Arrays

Otherwise known as "index by tables" because of their use of the INDEX BY clause, these collections have been around in PL/SQL since 7.3, greatly enhanced over the years to offer string indexing and other fun stuff.

In the block below, I populate two collections, once with the part IDs and the other with the part names. I then use FORALL to insert them into the table. I use the tmr subprograms to set the starting time, and then display the number of hundredths of seconds (centi-seconds) that elapsed to do the FORALL inserts.
DECLARE
   PROCEDURE compare_inserting (num IN INTEGER)
   IS
      TYPE numtab IS TABLE OF parts.partnum%TYPE
         INDEX BY PLS_INTEGER;

      TYPE nametab IS TABLE OF parts.partname%TYPE
         INDEX BY PLS_INTEGER;

      pnums       numtab;
      pnames      nametab;
   BEGIN
      FOR indx IN 1 .. num
      LOOP
         pnums (indx) := indx;
         pnames (indx) := 'Part ' || TO_CHAR (indx);
      END LOOP;

      tmr.start_timer;

      FORALL indx IN 1 .. num
         INSERT INTO parts
              VALUES (pnums (indx), pnames (indx));

      tmr.show_elapsed_time ('Inserted ' || num || ' rows');

      ROLLBACK;
   END;
BEGIN
   DBMS_OUTPUT.put_line ('Associative Arrays');
   compare_inserting (1000000);
END;

With Nested Tables

The nested table type was added way back in Oracle8, as part of the object-relational model. The definition of a nested table type looks just like the associative array, without the INDEX BY clause. That's because, officially, nested tables have no index, no order to their elements - they are multisets. But in practice, you can use the index value, which we certainly do in the FORALL statement.

With nested tables, you must initialize the collection using its constructor function, and explicitly extend the collection to make room for new elements.
DECLARE
   PROCEDURE compare_inserting (num IN INTEGER)
   IS
      TYPE numtab IS TABLE OF parts.partnum%TYPE;

      TYPE nametab IS TABLE OF parts.partname%TYPE;

      pnums       numtab := numtab();
      pnames      nametab := nametab();
   BEGIN
      pnums.extend (num);
      pnames.extend (num);
      
      FOR indx IN 1 .. num
      LOOP
         pnums (indx) := indx;
         pnames (indx) := 'Part ' || TO_CHAR (indx);
      END LOOP;

      tmr.start_timer;

      FORALL indx IN 1 .. num
         INSERT INTO parts
              VALUES (pnums (indx), pnames (indx));

      tmr.show_elapsed_time ('Inserted ' || num || ' rows');

      ROLLBACK;
   END;
BEGIN
   DBMS_OUTPUT.put_line ('Nested Table');
   compare_inserting (1000000);
END;

With Varrays

Finally, varrays - a shortening of "varying arrays." Which is kind of funny, because varrays are the least varying of all the collection types. With varrays, you have to declare the maximum number of elements allowed in the collection up front.

With varrays, like nested tables, you must initialize the collection using its constructor function, and explicitly extend the collection to make room for new elements.
DECLARE
   PROCEDURE compare_inserting (num IN INTEGER)
   IS
      TYPE numtab IS VARRAY (1000000) OF parts.partnum%TYPE;

      TYPE nametab IS VARRAY (1000000) OF parts.partname%TYPE;

      pnums       numtab := numtab();
      pnames      nametab := nametab();
   BEGIN
      pnums.extend (num);
      pnames.extend (num);
      
      FOR indx IN 1 .. num
      LOOP
         pnums (indx) := indx;
         pnames (indx) := 'Part ' || TO_CHAR (indx);
      END LOOP;

      tmr.start_timer;

      FORALL indx IN 1 .. num
         INSERT INTO parts
              VALUES (pnums (indx), pnames (indx));

      tmr.show_elapsed_time ('Inserted ' || num || ' rows');

      ROLLBACK;
   END;
BEGIN
   DBMS_OUTPUT.put_line ('Varrays');
   compare_inserting (1000000);
END;

Results

I ran each of the blocks three times, with the results you see below. I conclude that there is no significant difference in performance based on the type of collection you choose.
Associative Arrays
"Inserted 1000000 rows" completed in: 84 cs
"Inserted 1000000 rows" completed in: 78 cs
"Inserted 1000000 rows" completed in: 77 cs

Nested Table
"Inserted 1000000 rows" completed in: 78 cs
"Inserted 1000000 rows" completed in: 76 cs
"Inserted 1000000 rows" completed in: 77 cs

Varrays
"Inserted 1000000 rows" completed in: 81 cs
"Inserted 1000000 rows" completed in: 77 cs
"Inserted 1000000 rows" completed in: 77 cs
Now, obviously, this is not a comprehensive test. Will things go more slowly with nested tables for updates instead of inserts? Are associative arrays faster with small sets of data and nested tables better with super-duper large datasets? But it seems rather unlikely that these factors would affect overall performance, so I am not going to build scripts for those.

I do, however, encourage each of you to try this yourself. Feel free to download and modify my LiveSQL script. Test some variations, let me know what you discover.

In the meantime, I suggest that you pick the collection type that allows you to most easily build and manage the collections you will feed into the FORALL statement. Associative arrays are particularly good with sparse collections (not every index value filled between first and last). Nested tables offer lots of features (not connected to FORALL) for set-oriented management of collection contents. Varrays: well, honestly, I doubt you'll ever use a array. It's intended more for usage as nested columns in relational tables, offering some performance advantages there.

Comments

Popular posts from this blog

Quick Guide to User-Defined Types in Oracle PL/SQL

A Twitter follower recently asked for more information on user-defined types in the PL/SQL language, and I figured the best way to answer is to offer up this blog post. PL/SQL is a strongly-typed language . Before you can work with a variable or constant, it must be declared with a type (yes, PL/SQL also supports lots of implicit conversions from one type to another, but still, everything must be declared with a type). PL/SQL offers a wide array of pre-defined data types , both in the language natively (such as VARCHAR2, PLS_INTEGER, BOOLEAN, etc.) and in a variety of supplied packages (e.g., the NUMBER_TABLE collection type in the DBMS_SQL package). Data types in PL/SQL can be scalars, such as strings and numbers, or composite (consisting of one or more scalars), such as record types, collection types and object types. You can't really declare your own "user-defined" scalars, though you can define subtypes  from those scalars, which can be very helpful from the p

The differences between deterministic and result cache features

 EVERY once in a while, a developer gets in touch with a question like this: I am confused about the exact difference between deterministic and result_cache. Do they have different application use cases? I have used deterministic feature in many functions which retrieve data from some lookup tables. Is it essential to replace these 'deterministic' key words with 'result_cache'?  So I thought I'd write a post about the differences between these two features. But first, let's make sure we all understand what it means for a function to be  deterministic. From Wikipedia : In computer science, a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states.  Another way of putting this is that a deterministic subprogram (procedure or function) has no side-effects. If you pass a certain set of arguments for the parameters, you will always get

How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today: In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit? First I give the quick answer, then I provide support for that answer Quick Answer Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption. Test to see if that's fast enough (likely will be for many cases). If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.  Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification. Don't put anything in the collection you don't need. [from Giulio Dottorini] Remember: each session that runs this code will use that amount of memory. Background When you use BULK COLLECT, you retrieve more than row with each fetch,