Tuesday, March 8, 2016

I Love Backups and APIs, a.k.a., Test Delete Processes VERY THOROUGHLY

The PL/SQL Challenge offers weekly quizzes on SQL, PL/SQL, Database Design, Application Express, and Logic. A week on this website starts on Saturday at 00:00:00  UTC and ends on Friday at 23:59:59 UTC.

So early in the morning on Saturday, we run weekly processes to rank the past week's quizzes, and do a whole lot of cleaning-up.

Therein lies the tale of this post. But before getting into the story, a little "ancient history."

In 1978-1980, while I attended the University of Rochester, I took three "101" software classes (Algol, Lisp and something else). That is the extent of my formal training as a software developer. Move forward in time 36 years or so, and here I am: a highly regarded PL/SQL programmer, trainer, author.

Still, though, essentially self-taught and definitely not a software engineer.

I have the highest regard for engineers in general, and software engineers in particular. They combine deep knowledge with a strong sense of discipline. The result, often though of course not always, is a product (be it a bridge or an application) that is relatively bug-free and works well for a long time.

My sense is that most software developers are not engineers. But I don't need to speak about other developers. I can speak with absolute clarity on this topic just by looking in the mirror as I type.

I sometimes shock my audiences by telling them I am an "amateur" developer and I imagine they don't believe me. But then sometimes I do something that drives the point home all too well.

Like this past Saturday. I was informed by some players on the PL/SQL Challenge that the results for last week's quizzes were not showing up. This is usually due to an error during the refresh of the ranking materialized views. I took a look and found that the problem was much worse: all of the answers to last week's quizzes were gone. As in deleted.

"How could this be?" I shouted silently. Well, you know how you get this feeling sometimes in your stomach, that feeling that says "You know how this could have happened, don't you, Steven?"

It didn't take me long to track it down. I kinda remembered working on cleanup code that would remove answers submitted by an admin on the site (an admin cannot compete in these quizzes). I kinda also remembered that I hadn't ever really finished it, but that was OK, because the logic was in a subprogram that wasn't invoked.

I kinda also remembered how bad my memory is. So I looked at the code and found:

1. The subprogram was invoked by the main process. Uh-oh.

2. The WHERE clause on the cleanup routine was awful, just awful. Instead of saying "Remove any answers submitted by a domain admin.", it said in effect "Remove any answers submitted for questions that were written by a domain admin."

Most of our quizzes are written by the admins in their respective domains.

In other words: bye, bye, quizzes!

Sweat beaded on my brow. My fingers shook. A whole week's worth of quiz activity gone? Awful to contemplate (though, of course, not the end of the world).

I immediately ran a flashback query but sadly too much time had already passed since the delete. I could not recover the data myself.

Fortunately (and not the least bit surprisingly), our operational IT staff was able to restore the data and in the end "no harm done." And that's because they use Oracle DataGuard to create a standby database synched to the production database. They then used our wonderful Flashback feature to go back to the desired point in time, and then copy the three depleted tables to another instance for the recovery process. So nice....

But I was (and am) mortified at how unprofessional I was in this situation. Obiously, I didn't test this enhancement. My excuse is that I thought I wasn't done with it and had kept it out of the execution flow. Still I made changes to the package and didn't verify that the code was disabled.

But my deeper shame was a realization as to how much I had veered from one of my favorite and most important best practices:

Hide your SQL behind an API.
a.k.a., Stop Writing So Much SQL!

In other words, from my application code (in this case, the Application Express interface and even many of the "behind the scenes" packages), I should not execute any non-query DML directly (inserts, updates, deletes). Instead, I should call predefined procedures to get the job done.

[For lots more detail on this approach, check out Bryn Llewellyn's Why Use PL/SQL?]

For example, I already had in place a "remove answer" procedure that copied the answers to an archive table before doing the actual delete.

   PROCEDURE remove_answer (comp_event_id_in    IN INTEGER,
                            user_id_in          IN INTEGER,
                            reverse_points_in   IN BOOLEAN DEFAULT TRUE)
   IS
   BEGIN
      /* Copy rows to archive table */
      INSERT INTO qdb_compev_answers_d ...

      /* Now do the delete */
      DELETE FROM qdb_compev_answers eva
            WHERE     eva.comp_event_id = comp_event_id_in
                  AND eva.user_id = user_id_in;
   END;

But I hadn't been careful about always using it. I found more than one place in my code that executed a DELETE FROM qdb_compev_answers statement directly. - and therefore not copying the rows to the archive first. Including the horrible, no-good, very bad statement with the wrong where clause.

So after I posted in my request to IT to PLEASE HELP ME ASAP!, I searched across all my package bodies for "delete from qdb_compev_answers" and replaced them with calls to qdb_player_mgr.remove_answer. And, yes, I also searched my APEX application code to make sure I wasn't an even worse programmer than I already felt myself to be, by sticking non-query DML in that level of the stack.

At least I can report with some, ahem, shred of self-dignity, that the APEX application did not do any deletes against the answers table that did not go through the API.

So in the end, all data was restored. All players were happy. And my code quality improved. Even the IT support team was pleased to exercise and validate their recovery processes.

But did I learn my lesson? Will I be more careful with my code? Will I be more rigorous about using my SQL API?

Yes, yes! Lesson learned! I promise!

* Funny story about engineers:  Got on a crowded plane recently, with a seat all the way in the back. Made it to 29D and was about to sit in it, when I noticed a fellow right behind me. He said: "I'm in 29F." We looked towards the window. There was already somebody in 29F. He looked back at us. After a moment, I shrugged and asked him: "Are you sure you're supposed to be in 29F?" He looked up and said: "I'm a mechanical engineer. I think I know how to find the right seat on an airplane." Um, OK. Thirty seconds later, he looked up again, this time sheepishly: "Huh. I'm supposed to be in 30F. Sorry about that."  Moral of the story: Even engineers get it wrong sometimes.

2 comments:

  1. Hello Steven,

    Glad to hear that our week was saved :):)

    The moral of this story is only one, as far as I can see:

    NEVER, EVER do anything in a PRODUCTION system, without having TESTED it first in a NON-PRODUCTION environment !

    And this is ALWAYS TRUE, regardless of whether you do use API only or direct DML, or any other coding paradigm.

    Methodology can only help to make things more easily manageable, testable, maintainable, and so on.

    But it cannot help to replace testing.

    I personally do believe in one's own testing, more than I believe in tests performed by others, and this is my eternal controversy with the real world that I am living in ...
    At our company there exists an excessively thick QA staff, who are checking each and every scenario included in their check lists,
    and, in spite of this, bugs do make their ways into production systems far too often ...

    But, the mere existence of QA department kind of "replaced" the self-checks that were traditionally performed
    by the developers themselves, in the (I say, old, good) times when no one has even thought at all of employing a QA team ...


    I like very much the idea that most software developers are not engineers :):)

    For some reason, for already many years it sounds kind of "degrading yourself" if you tell somebody
    that you are (just) a developer ... and some years ago this was still called a "programmer" ...

    Now everybody is a "software engineer" ... how else ?

    I do believe that studying any discipline does help in developing one's brain into something "organized" and capable of supervising organized activities ... but, for that matter, developing software is not less an ART than it is an engineer activity.

    In my opinion, the quality of the software that one produces does depend mainly on his/her deepest personality characteristics as a human being, and much less on his/her vicinity with one engineering field or another.

    A good developer loves what he is doing and loves to test himself and his software creation, loves to be able to be proud of his own product, while one who is (just) an engineer and not also an "artist soul" usually writes code just because he is being paid well for doing it, and leaves the testing to others, who are "less engineers" ...


    Cheers & Best Regards,
    Iudith

    ReplyDelete
  2. 1. I'm an amateur too.
    2. I've done the very similar, believing I had covered all my bases, just forgetting that *one* small detail.
    3. We're all human. Yay!

    ReplyDelete