Thursday, April 7, 2016

Tips for avoiding the "ORA-04030: out of process memory..." error

Process Global Area (PGA) memory is utilized to store and manage session-specific data (compared to the System Global Area or SGA, which manages cross-session memory; lots more details here). If a session uses too much PGA, it can fail with the "ORA-04030: out of process memory when trying to allocate..." error. PL/SQL offers several techniques for reducing PGA consumption, including pipelined table functions, the LIMIT clause of BULK COLLECT, and the NOCOPY hint.

The 2015 Oracle PL/SQL Championship at the PL/SQL Challenge offered a quiz on this topic. You can take the entire championship to test your own knowledge by registering at this quiz website and then clicking on the Tests tab.

In this post, I point how techniques for PGA reduction using the code from that quiz.

I have a table with lots of data:
CREATE TABLE plch_data (
   data_id     INTEGER PRIMARY KEY,
   data_name   VARCHAR2 (1000) UNIQUE)
/

BEGIN
   INSERT INTO plch_data (data_id, data_name)
          SELECT LEVEL, 'Big Data ' || LEVEL FROM DUAL
      CONNECT BY LEVEL < 100000;
   COMMIT;
END;   
/
I have a package that works with this table:
CREATE OR REPLACE PACKAGE plch_pkg
IS
   TYPE data_t IS TABLE OF 
         plch_data%ROWTYPE INDEX BY PLS_INTEGER;
   TYPE names_t IS TABLE OF plch_data.data_name%TYPE;

   PROCEDURE update_lots_of_data;
   FUNCTION return_lots_of_data RETURN names_t;
   PROCEDURE show_lots_of_data;

   FUNCTION one_data (data_id_in IN INTEGER) 
      RETURN plch_data%ROWTYPE;

   PROCEDURE change_data (
         data_in IN data_t, data_out OUT data_t);
END;
/

CREATE OR REPLACE PACKAGE BODY plch_pkg
IS
   PROCEDURE update_lots_of_data IS
      l_data   data_t;
   BEGIN
      SELECT * BULK COLLECT INTO l_data FROM plch_data;

      FORALL indx IN 1 .. l_data.COUNT
         UPDATE plch_data SET data_name = UPPER (data_name)
          WHERE data_id = l_data (indx).data_id;
   END;

   FUNCTION return_lots_of_data RETURN names_t IS
      l_data names_t;
   BEGIN
      SELECT data_name
        BULK COLLECT INTO l_data FROM plch_data;
      RETURN l_data;
   END;

   PROCEDURE show_lots_of_data IS
      CURSOR data_cur IS SELECT * FROM plch_data;
      rec   data_cur%ROWTYPE;
   BEGIN
      OPEN data_cur;
      LOOP
         FETCH data_cur INTO rec;
         EXIT WHEN data_cur%NOTFOUND;
         DBMS_OUTPUT.put_line (rec.data_name);
      END LOOP;
      CLOSE data_cur;
   END;

   FUNCTION one_data (data_id_in IN INTEGER) 
      RETURN plch_data%ROWTYPE 
   IS
      l_data   data_t;
   BEGIN
      SELECT * BULK COLLECT INTO l_data FROM plch_data
        ORDER BY data_id;
      RETURN l_data (data_id_in);
   END;

   PROCEDURE change_data (
      data_in IN data_t, data_out OUT data_t) IS
   BEGIN
      FOR indx IN 1 .. data_in.COUNT
      LOOP
         IF MOD (indx, 3) = 0 THEN
            data_out (indx).data_name := 
               UPPER (data_in (indx).data_name);
         END IF;
      END LOOP;
   END;
END;
/
And here's what I know about the application behavior:
  1. All subprograms are used extensively and repeatedly by 100s of simultaneous connections.
  2. All rows in plch_data are repeatedly fetched 1000s of times per minute through calls to plch_pkg.one_data.
  3. Existing rows in plch_data are changed on average once per hour. Rows are never inserted or deleted and primary key values are never changed.
  4. The return_lots_of_data function is only invoked from within the TABLE operator in a SELECT's FROM clause.
Oh, and I know something else as well: Users complain that they often get this error:
ORA-04030: out of process memory when trying to allocate 40932 bytes
So the question for me is now: how can I change the above code to minimize or completely avoid the chance of ORA-04030 rearing its ugly head?

The first step is to identify where and how PGA is being consumed in my code. Generally, whenever your program assigns a value to a variable or constant, it uses up some PGA memory. Also generally, the biggest consumers of PGAs are your biggest data structures: first and foremost, collections, secondarily, CLOBs, records, object types, etc.

This leads me to focus on the following possible culprits of PGA consumption:

1. The l_data variable in update_lots_of_data
2. The l_data variable in return_lots_of_data
3. The l_data variable in one_data
4. The two collection parameters of change_data

So let's go exploring the possible kinds of changes we can make to the plch_pkg ("plch" is the standard prefix we use for all PL/SQL Challenge quiz objects), and determine if they will help avoid the ORA-04030 error.

How about if I change update_lots_of_data to:
   PROCEDURE update_lots_of_data
   IS
   BEGIN
      UPDATE plch_data
         SET data_name = UPPER (data_name);
   END;
Yep, that is sure to reduce PGA consumption: I no longer declare and populate a collection in my session. And if you can take this approach (a "pure" SQL solution), it is generally the best way to go - not only as regards PGA consumption, but also performance and maintainability (less and simpler code).

You may, however, run into a different error: 
ORA-01555: snapshot too old: rollback segment number too small
That's the problem sometimes with SQL: you can express what you want to do very elegantly and often with a very small amount of code - but your database is not necessarily configured to execute it successfully.

Of course, you can ask your DBA to give you a larger rollback segment and maybe she will even say yes. If not, though, you have to fall back on incremental commit processing. Check out my LiveSQL script on this topic.

[Note: actually, if the "pure" SQL solution hits this error, so may the original bulk processing version.  Besides a larger rollback segment, the only real workaround is to commit, which I demonstrate for bulk processing below.]

Regardless, for the purposes of this post on ORA-04030, the shift away from PL/SQL collections to "pure" SQL will help with PGA memory consumption.

OK, how about if we change update_lots_of_data to:
   PROCEDURE update_lots_of_data
   IS
      CURSOR data_cur
      IS
         SELECT * FROM plch_data;

      l_data   data_t;
   BEGIN
      OPEN data_cur;

      LOOP
         FETCH data_cur BULK COLLECT INTO l_data
           LIMIT 250;

         FORALL indx IN 1 .. l_data.COUNT
            UPDATE plch_data
               SET data_name = UPPER (data_name)
             WHERE data_id = l_data (indx).data_id;

         EXIT WHEN data_cur%NOTFOUND;
      END LOOP;

      CLOSE data_cur;        
   END;
Will that help with PGA consumption? Absolutely! I am still relying on BULK COLLECT and FORALL, but now I use the BULK COLLECT's LIMIT clause to return at most 250 rows with each fetch.

In addition, if I place a commit after each FORALL statement, I can avoid the rollback segment too small error. Again, check out that LiveSQL script for more details.

Moving on, let's take a closer look at return_lots_of_data. As noted above, it is only invoked from within the TABLE operator in a SELECT's FROM clause. It is, in other words, used exclusively as a table function.

[If you are not too familiar with table functions and pipelined table functions, check out my series.]

Well, how about if we change that function as follows:
   FUNCTION return_lots_of_data RETURN names_t PIPELINED
   IS
      CURSOR data_cur
      IS
         SELECT data_name FROM plch_data;

      l_data   names_t;
      l_return names_t;
   BEGIN
      OPEN data_cur;

      LOOP
         FETCH data_cur BULK COLLECT INTO l_data LIMIT 1000;

         FOR indx IN 1 .. l_data.COUNT
         LOOP
             PIPE ROW (l_data (indx));
         END LOOP;

         EXIT WHEN data_cur%NOTFOUND;
      END LOOP;

      CLOSE data_cur; 

      RETURN;
   END; 
So now I've changed the table function to a pipelined table function, which alters very dramatically how data is accumulated and returned to the calling query.

With a non-pipelined table function, I populate and return a collection, thereby consuming PGA memory. With a pipelined table function, I "pipe" a "row" out from the function directly to the SELECT, which can consume and use that row of data, asynchronously to the completion of the function. In fact, when the function is done, it simply returns control and no data, as you can see from the unusual "RETURN;" statement.

A switch to pipelining will definitely have a dramatic negative effect on PGA consumption!

Finally let's take a look at the one_data function:
   FUNCTION one_data (data_id_in IN INTEGER)
      RETURN plch_data%ROWTYPE
   IS
      l_data   data_t;
   BEGIN
      SELECT *
       BULK COLLECT INTO l_data
       FROM plch_data;

      RETURN l_data (data_id_in);
   END;
This is a really, REALLY awful implementation of a one-row lookup function. I am supposed to return one row for the primary key, and to do it, I load all the rows in the table into a collection, and then return the row located at that primary key value. That's not ridiculous solely because of the unnecessary memory consumption, but also because this logic requires that the primary key is always equal to the generated index value generated by the BULK COLLECT process.

Talk about nasty!

So the very first thing to do is get rid of that collection! This function could "fall back" on the most basic and pretty good implementation:
   FUNCTION one_data (data_id_in IN INTEGER)
      RETURN plch_data%ROWTYPE
   IS
      l_data   plch_data%ROWTYPE;
   BEGIN
      SELECT *
        INTO l_data
       FROM plch_data
      WHERE data_id = data_id_in;

      RETURN l_data;
   END;
That will certainly reduce PGA consumption. But from a performance standpoint is that the best we can do? Remember that earlier I wrote that:
  • All rows in plch_data are repeatedly fetched 1000s of times per minute through calls to plch_pkg.one_data.
  • Existing rows in plch_data are changed on average once per hour. Rows are never inserted or deleted and primary key values are never changed.
Whenever you see this combination (the same rows fetched with much higher frequency than they are changed), you should think: Function Result Cache!

And, indeed, we can both minimize PGA consumption and greatly speed up performance of the execution of the function (across all users) by changing the one_data function to:
   FUNCTION one_data (data_id_in IN INTEGER)
      RETURN plch_data%ROWTYPE
      RESULT_CACHE
   IS
      l_return plch_data%ROWTYPE;
   BEGIN
      SELECT * INTO l_return
        FROM plch_data
       WHERE data_id = data_id_in;

      RETURN l_return;
   END;
The Function Result Cache is a really wonderful feature of PL/SQL, added in 11.1 (and also available in SQL itself through use of the /*+ RESULT_CACHE */ hint. After all, a SELECT really is "just like" a function!). For more information about this feature, check out my article in Oracle Magazine.

I hope this gave you some ideas for managing PGA in your code, and patterns of code to look for.

1 comment:

  1. Your blog has given me that thing which I never expect to get from all over the websites. Nice post guys!

    ReplyDelete