Thursday, November 3, 2016

PL/SQL 101: Less code is better - avoid unnecessarily complex algorithms

With programmers new to PL/SQL (and SQL), it is not uncommon to find that they overcomplicate things, writing more code than is necessary, and putting too much logic into PL/SQL.  That problem can then be compounded by accidentally getting the "right answer" based on inadequate testing and test data.

So it is always good to be reminded: 
Do as much work as you can in SQL, and then finish up in PL/SQL. 
Where "finish up" means do whatever is appropriate within the database, and then make that available to whatever language is being used to write the UI!

A recent quiz on the PL/SQL Challenge explored this topic, and I offer it below as a learning exercise via blog post.

Suppose I have a table and dataset as follows:

CREATE TABLE plch_animals
(
   animal_id     INTEGER PRIMARY KEY,
   animal_name   VARCHAR2 (100)
)
/

BEGIN
   INSERT INTO plch_animals (animal_id, animal_name)
        VALUES (1, 'Bonobo');

   INSERT INTO plch_animals (animal_id, animal_name)
        VALUES (2, 'Cockatoo');

   INSERT INTO plch_animals (animal_id, animal_name)
        VALUES (3, 'Spider');

   COMMIT;
END;
/

And I need to write a program that produces the following output:

Animals in Alphabetical Order
Bonobo
Cockatoo
Spider

Let's take a look at some ways to achieve this objective (some of them distinctly sub-optimal, holding off the best till last):

#1. Accidental Success Through Bad Test Data

DECLARE
   l_count   INTEGER;
   l_name    plch_animals.animal_name%TYPE;
BEGIN
   DBMS_OUTPUT.put_line ('Animals in Alphabetical Order');

   SELECT COUNT (*) INTO l_count FROM plch_animals;

   FOR indx IN 1 .. l_count
   LOOP
      SELECT animal_name
        INTO l_name
        FROM plch_animals
       WHERE animal_id = indx;

      DBMS_OUTPUT.put_line (l_name);
   END LOOP;
END;
/

You see the desired output, but only because the animal IDs happen to ascend in exactly the same sequence as an alphabetical ordering of the animal names.(see line in blue).

You certainly do not want to ever assume this will be the case with real data. Even if you look at the data today and can confirm that pattern. That's just for today. A warning sign that this code is problematic is the SELECT COUNT(*).

You should generally not need to do a separate query just to control the number of iterations of a loop. A cursor FOR loop will usually take care of that for you.

#2. Too Much SQL, Not Enough Data

DECLARE
   TYPE animal_ids_t IS TABLE OF plch_animals.animal_id%TYPE;

   l_animal_ids   animal_ids_t;
   l_name         plch_animals.animal_name%TYPE;
BEGIN
   DBMS_OUTPUT.put_line ('Animals in Alphabetical Order');

   SELECT animal_id
     BULK COLLECT INTO l_animal_ids
     FROM plch_animals;

   FOR indx IN 1 .. l_animal_ids.COUNT
   LOOP
      SELECT animal_name
        INTO l_name
        FROM plch_animals
       WHERE animal_id = indx;

      DBMS_OUTPUT.put_line (l_name);
   END LOOP;
END;
/

Oh my. This choice is similar to #1, in terms of the basic, flawed assumptions (the animal IDs happen to ascend in exactly the same sequence as an alphabetical ordering of the animal names).

But it's even worse because I fetch all the animal IDs via BULK COLLECT into an array, then when I loop through the array, I execute a single row fetch to get the name.

And then it's even worse worse because the only reason this works is that the ID values happen to align with names in alphabetical order. So the results "look" OK when the test is run with this pathetic set of test data, but in the real world? Ugh.

That's a waste of code, PGA memory and CPU cycles.

#3. Having Too Much Fun With Collections

DECLARE
   TYPE animal_ids_t IS TABLE OF plch_animals.animal_id%TYPE
      INDEX BY plch_animals.animal_name%TYPE;

   l_animal_ids   animal_ids_t;
   l_index        plch_animals.animal_name%TYPE;
   l_name         plch_animals.animal_name%TYPE;
BEGIN
   DBMS_OUTPUT.put_line ('Animals in Alphabetical Order');

   FOR rec IN (  SELECT *
                   FROM plch_animals
               ORDER BY animal_name DESC)
   LOOP
      l_animal_ids (rec.animal_name) := rec.animal_id;
   END LOOP;

   l_index := l_animal_ids.FIRST;

   WHILE l_index IS NOT NULL
   LOOP
      DBMS_OUTPUT.put_line (l_index);

      l_index := l_animal_ids.NEXT (l_index);
   END LOOP;
END;
/

I very much enjoy collections (PL/SQL's version of arrays) and use them all over my code. And I especially appreciate string-indexed associative arrays, like I use above. But, really, there is such a thing as too much.

I use a cursor FOR loop to grab the rows from the animals table, and load the ID into my array, using the name as the index value. Then I use a WHILE loop to iterate through that array, simply displaying the index value. I never even use the ID!

It gets the job done, and sure performance will be fine unless you are looking at a very large volume, but this code is downright bewildering.

#4. SQL Simple

SELECT animal_name FROM plch_animals
 ORDER BY animal_name

Doesn't get much simpler than that. And if you need it inside PL/SQL....

#5. PL/SQL Simple

BEGIN
   DBMS_OUTPUT.put_line ('Animals in Alphabetical Order');

   FOR rec IN (SELECT animal_name FROM plch_animals
                ORDER BY animal_name)
   LOOP
      DBMS_OUTPUT.put_line (rec.animal_name);
   END LOOP;
END;

Lessons Learned

1. Let SQL do the heavy-lifting, as much as possible (not that there was anything very "heavy" to lift in this exercise!)

2. Don't over-complicate matters.

3. Make sure your test data has enough volume and variety to truly exercise your algorithm.

4. If you find yourself thinking "Does it have to be this complicated?", almost certainly the answer is a resounding "No!" and you should take a step back, challenge your assumptions, and see how you can simplify your code.

And don't forget:

a. Follow me on Twitter: @sfonplsql
b. Subscribe to my YouTube channel: PracticallyPerfectPLSQL
c. Check out the PL/SQL home page: oracle.com/plsql

11 comments:

  1. And the converse: simplicity = beauty!

    ReplyDelete
  2. #4 is not quite correct, Header is missing from output ;-)

    ReplyDelete
  3. Right, Herald, I guess it should be something more like:

    SELECT animal_name "Animals in Alphabetical Order"
    FROM plch_animals
    ORDER BY animal_name

    ReplyDelete
  4. Hi Steven, this morning i am drinking coffee and reading this blog again and i think this statement should be extended to: Do as much work as you can in SQL, and then finish up in PL/SQL. => Do as much work as you can in less SQL statements as you can in your code, and then finish up in PL/SQL. Part with Java and external C I will skip :D Also English is not my native language so i believe this sentence can be more simplify. :)

    ReplyDelete
    Replies
    1. Excellent thoughts, Neven. What do you think of this:

      Do as much as you can in the smallest possible number of SQL statements, then wrap those in a PL/SQL API.

      What do you think of that?

      Delete
    2. That is it :) simple and clean as code should be

      Delete
  5. For real world scenarios, I often prefere the number 3) example. It separates the data fetch process from the data manipulation/interpretation part. As soon as the logic becomes more complex, this is a pattern that helps tremendously to write good modularized code.

    ReplyDelete
    Replies
    1. Very interesting, Sven. I'd love to hear what others think of this.

      Delete
    2. My first gut reaction was seeing the "As soon as" and thinking "YAGNI".

      My deeper reaction is that this really depends a lot on whether you subscribe to Tom Kyte's long-standing advice to do as much as possible in SQL. I've usually found that this means the PL/SQL-required part is actually quite small, and putting it inside the cursor loop makes eminent sense.

      If you lean more toward doing the heavy lifting in PL/SQL and use the query as just a way to get data out of tables, this start to look more appealing.

      Delete
  6. And now we come to old topic, we all agree that this is best approach, less, simple etc. But many times I believe PL/SQL - SQL developers come to problem as i do. I cant write it in one SQL statement i need two. I am not sure will i use this SQL statements individually in future as part of some others stored procedures. Anyway should i separate them in two different stored procedure or functions or keep them in one stored procedure or function?
    Questions is, should we wrap all sql statements individually in stored procedures or functions?

    ReplyDelete