Tuesday, April 26, 2016

Minimize context switches and unnecessary PL/SQL code: an example from the PL/SQL Challenge

On the PL/SQL Challenge, when you click on a link to play a quiz, you are taken to the "launch" page. We give you an opportunity to review assumptions and instructions, and then press the Start button when you are ready (your score is based in part on the time it takes you to answer).

However, if you've taken that particular quiz before, and there have been no changes to assumptions or instructions, the launch page just gets in the way.

So I decided to streamline the flow on our site as follows:

1. If a person has never taken this quiz before, go to the launch page.

2. Otherwise, if assumptions or instructions have changed since the last playing of the quiz, go to the launch page.

3. Otherwise, go straight to the Play Quiz page.

I figured the way to do this is to build a function that will be invoked from Oracle Application Express. Here is a first pass, using the top-down design technique, at implementing the function.
CREATE OR REPLACE PACKAGE qdb_player_mgr
IS
   FUNCTION can_skip_launch_page (comp_event_id_in   IN INTEGER,
                                  user_id_in         IN INTEGER)
      RETURN BOOLEAN;
END qdb_player_mgr;
/

CREATE OR REPLACE PACKAGE BODY qdb_player_mgr
IS
   FUNCTION can_skip_launch_page (comp_event_id_in   IN INTEGER,
                                  user_id_in         IN INTEGER)
      RETURN BOOLEAN
   IS
      l_can_skip          BOOLEAN;
      l_last_changed_on   DATE;
   BEGIN
      l_can_skip := competition_answered (comp_event_id_in, user_id_in);

      IF l_can_skip
      THEN
         l_last_changed_on :=
            GREATEST (last_assumption_change (comp_event_id_in),
                      last_instruction_change (comp_event_id_in));
         l_can_skip :=
            l_last_changed_on <=
               most_recent_answer (comp_event_id_in, user_id_in);
      END IF;

      RETURN l_can_skip;
   END;
END qdb_player_mgr;
/
That's easy to understand, right? So the next step would be to implement each of those functions, which I can describe as follows:
  • competition_answered: use the comp_event_id to get its parent competition ID. See if the specified user answered any events in this competition.
  • last_assumption_change: use the comp_event_id to get its parent competition ID. Get the last-changed date for assumptions for that competition.
  • last_instruction_change: use the comp_event_id to get its parent competition ID. Get the last-changed date for instructions for that competition.
  • most_recent_answer: use the comp_event_id to get its parent competition ID. Get the maximum start date for any answers by this user for any events in this competition.
I hope that you are now saying: "No, Steven, don't do it! Don't write a separate function for each of these conditions!"

But, but, why not? I so looooove to write PL/SQL functions!

Even without really understanding our data model, the first hint of a problem with this approach should come with the repetition of "use the comp_event_id to get its parent competition ID." Why would I want to have to do that over and over again?

The other, more general problem with this approach is that it looks like I will execute four SELECT statements en route to coming up with my return value. That's four context switches between the PL/SQL and SQL engines. 

For best performance, I would like to minimize context switches, and beyond that avoid unnecessary SQL statements. In other words, when looking at requirements for a new program like this, I should first ask: "How much information can I get in a single SELECT?"

Turns out, an awful lot. In the implementation below, I join my competition events table to my parent competitions table, so I can get the information for that competition once. Then I use it in a scalar subquery to get the latest answer submitted by that user across all events in the competition.
   FUNCTION can_skip_launch_page (comp_event_id_in   IN INTEGER,
                                  user_id_in         IN INTEGER)
      RETURN BOOLEAN
   IS
      l_can_skip          BOOLEAN;

      CURSOR answer_info_cur
      IS
         SELECT c.instr_last_changed_on,
                c.assump_last_changed_on,
                (SELECT MAX (start_date)
                   FROM qdb_compev_answers eva, qdb_comp_events ce
                  WHERE     eva.user_id = can_skip_launch_page.user_id_in
                        AND eva.comp_event_id = ce.comp_event_id
                        AND ce.competition_id = c.competition_id)
                   last_taken_on
           FROM qdb_comp_events ce, qdb_competitions c
          WHERE     ce.comp_event_id = can_skip_launch_page.comp_event_id_in
                AND ce.competition_id = c.competition_id;

      answer_info_r       answer_info_cur%ROWTYPE;
   BEGIN
      OPEN answer_info_cur;
      FETCH answer_info_cur INTO answer_info_r;
      CLOSE answer_info_cur;

      l_can_skip := 
            answer_info_r.last_taken_on >=
               GREATEST (answer_info_r.instr_last_changed_on,
                         answer_info_r.assump_last_changed_on);

      /* If no answer, then last_taken_on is NULL, so "can skip" might
         be NULL at this point, and that means "cannot skip." */
      RETURN NVL (l_can_skip, FALSE);
   END;
I could have used a SELECT-INTO for this single-row query, but then I would have had to declare a record type to match the cursor's SELECT list, or declare individual variables for each expression returned. With an explicit cursor, I can simply declare a record using %ROWTYPE.

This second solution is not as easy to read as the first implementation - unless you are comfortable with SQL and the data model of the application at hand (which obviously you would be, if it was your code!). 

But if you are going to write applications on top of a relational database, you should be comfortable - very comfortable - with SQL.

Thursday, April 7, 2016

Tips for avoiding the "ORA-04030: out of process memory..." error

Program Global Area (PGA) memory is used to store and manage session-specific data (in contrast to the System Global Area, or SGA, which manages memory shared across sessions; lots more details here). If a session uses too much PGA, it can fail with the "ORA-04030: out of process memory when trying to allocate..." error. PL/SQL offers several techniques for reducing PGA consumption, including pipelined table functions, the LIMIT clause of BULK COLLECT, and the NOCOPY hint.

The 2015 Oracle PL/SQL Championship at the PL/SQL Challenge offered a quiz on this topic. You can take the entire championship to test your own knowledge by registering at this quiz website and then clicking on the Tests tab.

In this post, I show techniques for PGA reduction, using the code from that quiz.

I have a table with lots of data:
CREATE TABLE plch_data (
   data_id     INTEGER PRIMARY KEY,
   data_name   VARCHAR2 (1000) UNIQUE)
/

BEGIN
   INSERT INTO plch_data (data_id, data_name)
          SELECT LEVEL, 'Big Data ' || LEVEL FROM DUAL
      CONNECT BY LEVEL < 100000;
   COMMIT;
END;   
/
I have a package that works with this table:
CREATE OR REPLACE PACKAGE plch_pkg
IS
   TYPE data_t IS TABLE OF 
         plch_data%ROWTYPE INDEX BY PLS_INTEGER;
   TYPE names_t IS TABLE OF plch_data.data_name%TYPE;

   PROCEDURE update_lots_of_data;
   FUNCTION return_lots_of_data RETURN names_t;
   PROCEDURE show_lots_of_data;

   FUNCTION one_data (data_id_in IN INTEGER) 
      RETURN plch_data%ROWTYPE;

   PROCEDURE change_data (
         data_in IN data_t, data_out OUT data_t);
END;
/

CREATE OR REPLACE PACKAGE BODY plch_pkg
IS
   PROCEDURE update_lots_of_data IS
      l_data   data_t;
   BEGIN
      SELECT * BULK COLLECT INTO l_data FROM plch_data;

      FORALL indx IN 1 .. l_data.COUNT
         UPDATE plch_data SET data_name = UPPER (data_name)
          WHERE data_id = l_data (indx).data_id;
   END;

   FUNCTION return_lots_of_data RETURN names_t IS
      l_data names_t;
   BEGIN
      SELECT data_name
        BULK COLLECT INTO l_data FROM plch_data;
      RETURN l_data;
   END;

   PROCEDURE show_lots_of_data IS
      CURSOR data_cur IS SELECT * FROM plch_data;
      rec   data_cur%ROWTYPE;
   BEGIN
      OPEN data_cur;
      LOOP
         FETCH data_cur INTO rec;
         EXIT WHEN data_cur%NOTFOUND;
         DBMS_OUTPUT.put_line (rec.data_name);
      END LOOP;
      CLOSE data_cur;
   END;

   FUNCTION one_data (data_id_in IN INTEGER) 
      RETURN plch_data%ROWTYPE 
   IS
      l_data   data_t;
   BEGIN
      SELECT * BULK COLLECT INTO l_data FROM plch_data
        ORDER BY data_id;
      RETURN l_data (data_id_in);
   END;

   PROCEDURE change_data (
      data_in IN data_t, data_out OUT data_t) IS
   BEGIN
      FOR indx IN 1 .. data_in.COUNT
      LOOP
         IF MOD (indx, 3) = 0 THEN
            data_out (indx).data_name := 
               UPPER (data_in (indx).data_name);
         END IF;
      END LOOP;
   END;
END;
/
And here's what I know about the application behavior:
  1. All subprograms are used extensively and repeatedly by 100s of simultaneous connections.
  2. All rows in plch_data are repeatedly fetched 1000s of times per minute through calls to plch_pkg.one_data.
  3. Existing rows in plch_data are changed on average once per hour. Rows are never inserted or deleted and primary key values are never changed.
  4. The return_lots_of_data function is only invoked from within the TABLE operator in a SELECT's FROM clause.
Oh, and I know something else as well: Users complain that they often get this error:
ORA-04030: out of process memory when trying to allocate 40932 bytes
So the question for me is now: how can I change the above code to minimize or completely avoid the chance of ORA-04030 rearing its ugly head?

The first step is to identify where and how PGA is being consumed in my code. Generally, whenever your program assigns a value to a variable or constant, it uses up some PGA memory. Also generally, the biggest consumers of PGA are your biggest data structures: first and foremost, collections; secondarily, CLOBs, records, object types, and so on.
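Before changing anything, it helps to measure. Here is a quick way to check your own session's current and peak PGA consumption (a diagnostic sketch; it assumes your schema is allowed to query the v$ views):
SELECT sn.name, st.value
  FROM v$mystat st, v$statname sn
 WHERE st.statistic# = sn.statistic#
   AND sn.name IN ('session pga memory', 'session pga memory max');
Run it before and after exercising a subprogram, and compare the numbers.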

This leads me to focus on the following possible culprits of PGA consumption:

1. The l_data variable in update_lots_of_data
2. The l_data variable in return_lots_of_data
3. The l_data variable in one_data
4. The two collection parameters of change_data

So let's go exploring the possible kinds of changes we can make to the plch_pkg ("plch" is the standard prefix we use for all PL/SQL Challenge quiz objects), and determine if they will help avoid the ORA-04030 error.

How about if I change update_lots_of_data to:
   PROCEDURE update_lots_of_data
   IS
   BEGIN
      UPDATE plch_data
         SET data_name = UPPER (data_name);
   END;
Yep, that is sure to reduce PGA consumption: I no longer declare and populate a collection in my session. And if you can take this approach (a "pure" SQL solution), it is generally the best way to go - not only as regards PGA consumption, but also performance and maintainability (less and simpler code).

You may, however, run into a different error: 
ORA-01555: snapshot too old: rollback segment number too small
That's the problem sometimes with SQL: you can express what you want to do very elegantly and often with a very small amount of code - but your database is not necessarily configured to execute it successfully.

Of course, you can ask your DBA to give you a larger rollback segment and maybe she will even say yes. If not, though, you have to fall back on incremental commit processing. Check out my LiveSQL script on this topic.

[Note: actually, if the "pure" SQL solution hits this error, so may the original bulk processing version. Besides a larger rollback segment, the only real workaround is to commit incrementally, which I demonstrate for bulk processing below.]

Regardless, for the purposes of this post on ORA-04030, the shift away from PL/SQL collections to "pure" SQL will help with PGA memory consumption.

OK, how about if we change update_lots_of_data to:
   PROCEDURE update_lots_of_data
   IS
      CURSOR data_cur
      IS
         SELECT * FROM plch_data;

      l_data   data_t;
   BEGIN
      OPEN data_cur;

      LOOP
         FETCH data_cur BULK COLLECT INTO l_data
           LIMIT 250;

         FORALL indx IN 1 .. l_data.COUNT
            UPDATE plch_data
               SET data_name = UPPER (data_name)
             WHERE data_id = l_data (indx).data_id;

         EXIT WHEN data_cur%NOTFOUND;
      END LOOP;

      CLOSE data_cur;        
   END;
Will that help with PGA consumption? Absolutely! I am still relying on BULK COLLECT and FORALL, but now I use the BULK COLLECT's LIMIT clause to return at most 250 rows with each fetch.

In addition, if I place a commit after each FORALL statement, I can avoid the rollback segment too small error. Again, check out that LiveSQL script for more details.
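To be concrete, here is a sketch of that incremental-commit variation (note the trade-off: committing inside the loop gives up statement-level atomicity, so the procedure must be safe to re-run if it fails partway through):
   PROCEDURE update_lots_of_data
   IS
      CURSOR data_cur
      IS
         SELECT * FROM plch_data;

      l_data   data_t;
   BEGIN
      OPEN data_cur;

      LOOP
         FETCH data_cur BULK COLLECT INTO l_data
           LIMIT 250;

         FORALL indx IN 1 .. l_data.COUNT
            UPDATE plch_data
               SET data_name = UPPER (data_name)
             WHERE data_id = l_data (indx).data_id;

         COMMIT;   -- release undo after each batch

         EXIT WHEN data_cur%NOTFOUND;
      END LOOP;

      CLOSE data_cur;
   END;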

Moving on, let's take a closer look at return_lots_of_data. As noted above, it is only invoked from within the TABLE operator in a SELECT's FROM clause. It is, in other words, used exclusively as a table function.

[If you are not too familiar with table functions and pipelined table functions, check out my series.]

Well, how about if we change that function as follows (adding the PIPELINED keyword to its declaration in the package specification as well):
   FUNCTION return_lots_of_data RETURN names_t PIPELINED
   IS
      CURSOR data_cur
      IS
         SELECT data_name FROM plch_data;

      l_data   names_t;
      l_return names_t;
   BEGIN
      OPEN data_cur;

      LOOP
         FETCH data_cur BULK COLLECT INTO l_data LIMIT 1000;

         FOR indx IN 1 .. l_data.COUNT
         LOOP
             PIPE ROW (l_data (indx));
         END LOOP;

         EXIT WHEN data_cur%NOTFOUND;
      END LOOP;

      CLOSE data_cur; 

      RETURN;
   END; 
So now I've changed the table function to a pipelined table function, which dramatically alters how data is accumulated and returned to the calling query.

With a non-pipelined table function, I populate and return a collection, thereby consuming PGA memory. With a pipelined table function, I "pipe" a "row" out from the function directly to the SELECT, which can consume and use that row of data, asynchronously to the completion of the function. In fact, when the function is done, it simply returns control and no data, as you can see from the unusual "RETURN;" statement.
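Since names_t is a nested table of strings, the consuming query sees a one-column rowset and refers to each piped value as COLUMN_VALUE. The invocation looks something like this:
SELECT COLUMN_VALUE AS data_name
  FROM TABLE (plch_pkg.return_lots_of_data ());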

A switch to pipelining will definitely have a dramatic effect, slashing PGA consumption!

Finally let's take a look at the one_data function:
   FUNCTION one_data (data_id_in IN INTEGER)
      RETURN plch_data%ROWTYPE
   IS
      l_data   data_t;
   BEGIN
      SELECT *
       BULK COLLECT INTO l_data
       FROM plch_data;

      RETURN l_data (data_id_in);
   END;
This is a really, REALLY awful implementation of a one-row lookup function. I am supposed to return the row for a given primary key, and to do it, I load all the rows in the table into a collection, and then return the row located at that primary key value. It's ridiculous not only because of the unnecessary memory consumption, but also because this logic requires that the primary key always equal the index value generated by the BULK COLLECT process (which simply fills the collection sequentially from 1).

Talk about nasty!

So the very first thing to do is get rid of that collection! This function could "fall back" on the most basic and pretty good implementation:
   FUNCTION one_data (data_id_in IN INTEGER)
      RETURN plch_data%ROWTYPE
   IS
      l_data   plch_data%ROWTYPE;
   BEGIN
      SELECT *
        INTO l_data
       FROM plch_data
      WHERE data_id = data_id_in;

      RETURN l_data;
   END;
That will certainly reduce PGA consumption. But from a performance standpoint, is that the best we can do? Remember that earlier I wrote that:
  • All rows in plch_data are repeatedly fetched 1000s of times per minute through calls to plch_pkg.one_data.
  • Existing rows in plch_data are changed on average once per hour. Rows are never inserted or deleted and primary key values are never changed.
Whenever you see this combination (the same rows fetched with much higher frequency than they are changed), you should think: Function Result Cache!

And, indeed, we can both minimize PGA consumption and greatly speed up execution of the function (across all users) by changing the one_data function as follows (note that RESULT_CACHE must be added to the function's declaration in both the package specification and the body):
   FUNCTION one_data (data_id_in IN INTEGER)
      RETURN plch_data%ROWTYPE
      RESULT_CACHE
   IS
      l_return plch_data%ROWTYPE;
   BEGIN
      SELECT * INTO l_return
        FROM plch_data
       WHERE data_id = data_id_in;

      RETURN l_return;
   END;
The Function Result Cache is a really wonderful feature of PL/SQL, added in 11.1, and also available in SQL itself through the /*+ RESULT_CACHE */ hint (after all, a SELECT really is "just like" a function!). For more information about this feature, check out my article in Oracle Magazine.
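Here, for instance, is what the SQL flavor of result caching looks like against our table (a sketch; repeated executions are served from the cache until plch_data changes):
SELECT /*+ RESULT_CACHE */ data_name
  FROM plch_data
 WHERE data_id = 100;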

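And what about culprit #4, the two collection parameters of change_data? IN parameters are always passed by reference, but OUT and IN OUT parameters are, by default, passed by value - which means the data_out collection may be copied. The NOCOPY hint asks the compiler to pass it by reference instead. Here is a sketch of that change (remember: NOCOPY is only a request, which the compiler is free to ignore, and the declaration in the package specification must be changed to match):
   PROCEDURE change_data (
      data_in IN data_t, data_out OUT NOCOPY data_t) IS
   BEGIN
      FOR indx IN 1 .. data_in.COUNT
      LOOP
         IF MOD (indx, 3) = 0 THEN
            data_out (indx).data_name :=
               UPPER (data_in (indx).data_name);
         END IF;
      END LOOP;
   END;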
I hope this gave you some ideas for managing PGA in your code, and patterns of code to look for.

Monday, April 4, 2016

Don't Return Home Without the RETURNING Clause

[That title is, by the way, a fairly lame play on the American Express "Don't leave home without it." tagline. Sorry, can't help myself. SF]

The RETURNING clause allows you to retrieve values of columns (and expressions based on columns) that were modified by an insert, delete or update. Without RETURNING you would have to run a SELECT statement after the DML statement is completed to obtain the values of the changed columns. The RETURNING clause can return multiple rows of data, in which case you will use the RETURNING BULK COLLECT INTO form. You can also call aggregate functions in the RETURNING clause to obtain sums, counts and so on of columns in multiple rows changed by the DML statement.

I explore the feature below. You can execute the same code on LiveSQL.

Set Up Some Data

First, let's create some data with which to play.

CREATE TABLE parts
(
   part_number    INTEGER
 , part_name   VARCHAR2 (100)
)
/

BEGIN
   INSERT INTO parts VALUES (1, 'Mouse');
   INSERT INTO parts VALUES (100, 'Keyboard');
   INSERT INTO parts VALUES (500, 'Monitor');
   COMMIT;
END;
/

CREATE TABLE employees
(
   employee_id   INTEGER
 , last_name     VARCHAR2 (100)
 , salary        NUMBER
)
/

BEGIN
   INSERT INTO employees VALUES (100, 'Gutseriev', 1000);
   INSERT INTO employees VALUES (200, 'Ellison', 2000);
   INSERT INTO employees VALUES (400, 'Gates', 3000);
   INSERT INTO employees VALUES (500, 'Buffet', 4000);
   INSERT INTO employees VALUES (600, 'Slim', 5000);
   INSERT INTO employees VALUES (700, 'Arnault', 6000);
   COMMIT;
END;
/
The Basics of RETURNING

Suppose that after I update a part name, I need to find out which row was changed. Here's one way I could do this:
DECLARE
   l_num   PLS_INTEGER;
BEGIN
   UPDATE parts
      SET part_name = UPPER (part_name)
    WHERE part_name LIKE 'K%';

   SELECT part_number
     INTO l_num
     FROM parts
    WHERE part_name = UPPER (part_name);

   DBMS_OUTPUT.put_line (l_num);
END;
/
This solution issues the update and then, in a separate SQL statement, retrieves the part number of the row that was just modified - but only by reproducing the logic ("part_name = UPPER (part_name)") in the WHERE clause. This means that I have introduced repetition into my code, as well as inefficiency (an extra context switch). It is logically equivalent to using the RETURNING clause, but definitely inferior to it. And keep in mind that if you use a SELECT after your DML statement to determine if the correct changes were made, you need to be very careful about how you specify the WHERE clause of your query, to be sure that you identify the same rows that were (possibly) changed.

Now take a look at the next block.
DECLARE
   l_num   PLS_INTEGER;
BEGIN
      UPDATE parts
         SET part_name = UPPER (part_name)
       WHERE part_name LIKE 'K%'
   RETURNING part_number
        INTO l_num;

   DBMS_OUTPUT.put_line (l_num);
END;
/
Don't do an unnecessary SELECT simply to see/verify the impact of a non-query DML statement! Just add RETURNING to the statement and get information back from that single context switch between PL/SQL and SQL. Note that this RETURNING INTO only works because the WHERE clause identifies a single row for changing.

RETURNING with Multiple Rows Changed

Now suppose that I am (or could be) updating more than one row with my DML statement. For example, I will simply remove the WHERE clause from the above block. Let's see what happens when I execute it:
DECLARE
   l_num   PLS_INTEGER;
BEGIN
      UPDATE parts
         SET part_name = UPPER (part_name)
   RETURNING part_number
        INTO l_num;

   DBMS_OUTPUT.put_line (l_num);
END;
/
Oh no! I get an error:
ORA-01422: exact fetch returns more than requested number of rows
Exact fetch? Why is it talking about an exact fetch? Isn't this the same error I get when I do a SELECT-INTO that returns more than one row (that is, the TOO_MANY_ROWS exception)? Yes! And that's because RETURNING-INTO acts just like a SELECT-INTO. It expects just one row of information to be returned. So...if you are expecting more than one row, do the same thing you would do with SELECT-INTO: add BULK COLLECT!
DECLARE
   l_part_numbers   DBMS_SQL.number_table;
BEGIN
      UPDATE parts
         SET part_name = part_name || '1'
   RETURNING part_number
        BULK COLLECT INTO l_part_numbers;

   FOR indx IN 1 .. l_part_numbers.COUNT
   LOOP
      DBMS_OUTPUT.put_line (l_part_numbers (indx));
   END LOOP;
END;
/
RETURNING a Record's Worth

OK, so what if I want to return a whole record's worth of information? Can I use the ROW keyword?
DECLARE
   l_part   parts%ROWTYPE;
BEGIN
      UPDATE parts
         SET part_number = -1 * part_number
           , part_name = UPPER (part_name)
       WHERE part_number = 1
   RETURNING ROW      /* WILL NOT WORK */
        INTO l_part;

   DBMS_OUTPUT.put_line (l_part.part_name);
END;
/
No! You can "UPDATE table_name SET ROW =" to perform a record-level update, but you cannot use the ROW keyword in the same way in a RETURNING clause. Sorry, you must list each column, compatible in number and type with the fields of the "receiving" record, as you see below.
DECLARE
   l_part   parts%ROWTYPE;
BEGIN
      UPDATE parts
         SET part_number = -1 * part_number
           , part_name = UPPER (part_name)
       WHERE part_number = 1
   RETURNING part_number, part_name
        INTO l_part;

   DBMS_OUTPUT.put_line (l_part.part_name);
END;
/
Aggregate Functions and RETURNING

What if I want to perform some operation on the data returned by the RETURNING clause? Well, fall back on first principles. From Tom Kyte: "Do it in SQL if possible." Suppose I need to get the total of salaries changed by my update statement. I could execute a SELECT after the UPDATE:
DECLARE
   l_total   INTEGER;
BEGIN
   UPDATE employees
      SET salary = salary * 2
    WHERE INSTR (last_name, 'e') > 0;

   SELECT SUM (salary)
     INTO l_total
     FROM employees
    WHERE INSTR (last_name, 'e') > 0;

   DBMS_OUTPUT.put_line (l_total);
END;
/
Not necessary! You can execute aggregate functions right inside the RETURNING clause!
DECLARE
   l_total   INTEGER;
BEGIN
      UPDATE employees
         SET salary = salary * 2
       WHERE INSTR (last_name, 'e') > 0
   RETURNING SUM (salary)
        INTO l_total;

   DBMS_OUTPUT.put_line (l_total);
END;
/
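One last note: RETURNING is not just for updates - it works with INSERT and DELETE as well. Here is a sketch that deletes rows and collects the removed names in a single round trip (with a ROLLBACK at the end so the demonstration data stays intact):
DECLARE
   l_names   DBMS_SQL.varchar2_table;
BEGIN
      DELETE FROM employees
       WHERE salary > 5000
   RETURNING last_name
        BULK COLLECT INTO l_names;

   FOR indx IN 1 .. l_names.COUNT
   LOOP
      DBMS_OUTPUT.put_line (l_names (indx));
   END LOOP;

   ROLLBACK;
END;
/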

Friday, April 1, 2016

Announcing PLSQL.js, a Javascript Framework Hiding Squiggles Behind PL/SQL APIs

[April Fools Joke]

Today, the Oracle Database Developer Advocates team announces the release of PLSQL.js, a PL/SQL framework for JavaScript developers, delivering all the power and flexibility of JavaScript through simple, declarative APIs written in the best database programming language in the world.

“The first key advantage to PLSQL.js is that you don’t have to write those little squiggle thingies,” notes Steven Feuerstein, well-known author and trainer on Oracle PL/SQL, who designed the bulk of PLSQL.js. “We really didn’t see the point. Why not use regular English words and the kind of punctuation everybody was already used to, like underscores and dots? Why do we always have to change things?”


Oracle’s Chief Healthification Officer, Jean Frutesandveggies, adds that PLSQL.js is also an attempt to help young application developers deal with a growing epidemic of JavaScript Fatigue.

"While PLSQL.js appears to be yet-another-framework that just compounds the problem," explains Feuerstein, "it's really not. PLSQL.js is, in fact, the JS framework to end all JS frameworks. Definitively. Until version 2 comes out, that is, which will be a complete rewrite. Maybe using TypeScript."

“Writing code,” points out Frutesandveggies, “is very hard and stressful work. It doesn’t help to write in a language like JavaScript, in which developers are expected to constantly change their frameworks, tools, and general outlook on life. The bottom line? When you have to React to Yeoman who Plop and Babel from an overly Angular point of view, well, you are bound to Relay into Motorcycle trouble. Sure, you can take aspirin for the ensuing headache, but we recommend, instead, that you simply switch once and for all to PLSQL.js.”

PLSQL.js will be released as open source under the MYOB (Mind Your Own Business) license on GritHub. Users will be allowed to pull but not push, and never commit, to ensure that the framework remains stable and free of squiggles.