Tuesday, April 26, 2016

Minimize context switches and unnecessary PL/SQL code: an example from the PL/SQL Challenge

On the PL/SQL Challenge, when you click on a link to play a quiz, you are taken to the "launch" page. We give you an opportunity to review assumptions and instructions, and then press the Start button when you are ready (your score is based in part on the time it takes you to answer).

However, if you've taken that particular quiz before, and there have been no changes to assumptions or instructions, the launch page just gets in the way.

So I decided to streamline the flow on our site as follows:

1. If a person has never taken this quiz before, go to the launch page.

2. Otherwise, if assumptions or instructions have changed since the last playing of the quiz, go to the launch page.

3. Otherwise, go straight to the Play Quiz page.

I figured the way to do this is build a function that will be invoked from Oracle Application Express. Here is a first pass, using the top-down design technique, at implementing the function.

CREATE OR REPLACE PACKAGE qdb_player_mgr
IS
   FUNCTION can_skip_launch_page (comp_event_id_in   IN INTEGER,
                                  user_id_in         IN INTEGER)
      RETURN BOOLEAN;
END qdb_player_mgr;
/

CREATE OR REPLACE PACKAGE BODY qdb_player_mgr
IS
   FUNCTION can_skip_launch_page (comp_event_id_in   IN INTEGER,
                                  user_id_in         IN INTEGER)
      RETURN BOOLEAN
   IS
      l_can_skip          BOOLEAN;
      l_last_changed_on   DATE;
   BEGIN
      l_can_skip := NOT competition_answered (comp_event_id_in, user_id_in);

      IF l_can_skip
      THEN
         l_last_changed_on :=
            GREATEST (last_assumption_change (comp_event_id_in),
                      last_instruction_change (comp_event_id_in));
         l_can_skip :=
            l_last_changed_on <=
               most_recent_answer (comp_event_id_in, user_id_in);
      END IF;

      RETURN l_can_skip;
   END;
END qdb_player_mgr;
/

That's easy to understand, right? So the next step would be to implement each of the functions, each of which I can describe as follows:
  • competition_answered: use the comp_event_id to get it's parent competition ID. See if the specified user answered any events in this competition.
  • last_assumption_change: use the comp_event_id to get it's parent competition ID. Get the last-changed date for assumptions for that competition.
  • last_instruction_change: use the comp_event_id to get it's parent competition ID. Get the last-changed date for instructions for that competition.
  • most_recent_answer: use the comp_event_id to get it's parent competition ID. Get the maximum start date for any answers by this user for any events in this competition.
I hope that you are now saying: "No, Steven, don't do it! Don't write a separate function for each of these conditions!"

But, but, why not? I so looooove to write PL/SQL functions!

Even without really understanding our data model, the first hint of a problem with this approach should come with the repetition of "Use the comp_event_id to get it's parent competition ID."  Why would I want to have to do that over and over again?

The other, more general problem with this approach is that it looks like I will execute four SELECT statements en route to coming up with my return value. That's four context switches between the PL/SQL and SQL engines. 

For best performance, I would like to minimize context switches, and beyond that avoid unnecessary SQL statements. In other words, when looking at requirements for a new program like this, I should first ask: "How much information can I get in a single SELECT?"

Turns out, an awful lot.  In the implementation below, I join my competition events table to my parent competitions table, so I can get the information for that competition once. Then I use it in the single value subquery to get the latest answer submitted for that user across all events in the competition.

   FUNCTION can_skip_launch_page (comp_event_id_in   IN INTEGER,
                                  user_id_in         IN INTEGER)
      RETURN BOOLEAN
   IS
      l_can_skip          BOOLEAN;

      CURSOR answer_info_cur
      IS
         SELECT c.instr_last_changed_on,
                c.assump_last_changed_on,
                (SELECT MAX (start_date)
                   FROM qdb_compev_answers eva, qdb_comp_events ce
                  WHERE     eva.user_id = can_skip_launch_page .user_id_in
                        AND eva.comp_event_id = ce.comp_event_id
                        AND ce.competition_id = c.competition_id)
                   last_taken_on
           FROM qdb_comp_events ce, qdb_competitions c
          WHERE     ce.comp_event_id = can_skip_launch_page .comp_event_id_in
                AND ce.competition_id = c.competition_id;

      answer_info_r       answer_info_cur%ROWTYPE;
   BEGIN
      OPEN answer_info_cur;
      FETCH answer_info_cur INTO answer_info_r;
      CLOSE answer_info_cur;

      l_can_skip := 
            answer_info_r.last_taken_on >=
               GREATEST (answer_info_r.instr_last_changed_on,
                         answer_info_r.assump_last_changed_on);

      /* If no answer, then last_taken_on is NULL, so "can skip" might
         be NULL at this point, and that means "cannot skip." */
      RETURN NVL (l_can_skip, FALSE);
   END;

I could have used a SELECT-INTO for this single-row query, but then I would have had to declare a record type to match the cursor 's SELECT list or declared individual variables for each expression returned. With an explicit cursor, I can simply declare a record using %ROWTYPE.

This second solution is not as easy to read as the first implementation - unless you are comfortable with SQL and the data model of the application at hand (which obviously you would be, if it was your code!). 

But if you are going to write applications on top of a relational database, you should be comfortable - very comfortable - with SQL.

3 comments:

  1. Great example.

    I remember first embracing the concept of PL/SQL table APIs back in '95 when I read your pioneering book. I took it too far though, over-engineering and designing functions for all sorts of queries and operations I thought the app devs might need. When I tried to use them myself, it left me ill to see how inefficient and chatty the code was compared to a well-written DML statement. It also peppered the schemas with tons of unused code that became more difficult to clean out as time wore on.

    In the end, neither the table APIs, or my plethora of "clever" functions, ended up being what the applications needed. They needed custom, efficient SQL statements that satisfied unique requirements, as you've done here. The pursuit of a re-usable data layer API has left me more prone to create nothing but the common framework up front (security, logging, messages, email, etc.), only crafting PL/SQL routines as needed by the JSON-fluent middle tier services.

    ReplyDelete
  2. Thanks, Bill, very interesting. But you still really just want to have one "way" for any consumer of the table to, say, insert a row into the table, right?

    ReplyDelete
    Replies
    1. Yes, sorta. One published, authorized PL/SQL insert proc to be used by OLTP apps for transactional tables (and less volatile tables the app is allowed to administer). We also use various technical and cultural measures to ensure enthusiastic developers aren't going around the API to do their own thing.

      But the Dev DBAs sometimes have to write their own inserts, because we have backend jobs, loads, AQ processes, web service calls, and sync processes that bring data in bulk and need custom SQL the simple insert proc doesn't satisfy.

      So publicly, exposed to the developers, yes, one way to insert new. Privately, we might have two or three.

      I think the overall point of the post is great wisdom, that sometimes the goodness and readability of PL/SQL functions can backfire and be much more efficiently handled with well-designed, set-oriented SQL.

      Delete