Skip to main content

I Love Backups and APIs, a.k.a., Test Delete Processes VERY THOROUGHLY

The PL/SQL Challenge offers weekly quizzes on SQL, PL/SQL, Database Design, Application Express, and Logic. A week on this website starts on Saturday at 00:00:00  UTC and ends on Friday at 23:59:59 UTC.

So early in the morning on Saturday, we run weekly processes to rank the past week's quizzes, and do a whole lot of cleaning-up.

Therein lies the tale of this post. But before getting into the story, a little "ancient history."

In 1978-1980, while I attended the University of Rochester, I took three "101" software classes (Algol, Lisp and something else). That is the extent of my formal training as a software developer. Move forward in time 36 years or so, and here I am: a highly regarded PL/SQL programmer, trainer, author.

Still, though, essentially self-taught and definitely not a software engineer.

I have the highest regard for engineers in general, and software engineers in particular. They combine deep knowledge with a strong sense of discipline. The result, often though of course not always, is a product (be it a bridge or an application) that is relatively bug-free and works well for a long time.

My sense is that most software developers are not engineers. But I don't need to speak about other developers. I can speak with absolute clarity on this topic just by looking in the mirror as I type.

I sometimes shock my audiences by telling them I am an "amateur" developer and I imagine they don't believe me. But then sometimes I do something that drives the point home all too well.

Like this past Saturday. I was informed by some players on the PL/SQL Challenge that the results for last week's quizzes were not showing up. This is usually due to an error during the refresh of the ranking materialized views. I took a look and found that the problem was much worse: all of the answers to last week's quizzes were gone. As in deleted.

"How could this be?" I shouted silently. Well, you know how you get this feeling sometimes in your stomach, that feeling that says "You know how this could have happened, don't you, Steven?"

It didn't take me long to track it down. I kinda remembered working on cleanup code that would remove answers submitted by an admin on the site (an admin cannot compete in these quizzes). I kinda also remembered that I hadn't ever really finished it, but that was OK, because the logic was in a subprogram that wasn't invoked.

I kinda also remembered how bad my memory is. So I looked at the code and found:

1. The subprogram was invoked by the main process. Uh-oh.

2. The WHERE clause on the cleanup routine was awful, just awful. Instead of saying "Remove any answers submitted by a domain admin.", it said in effect "Remove any answers submitted for questions that were written by a domain admin."

Most of our quizzes are written by the admins in their respective domains.

In other words: bye, bye, quizzes!

Sweat beaded on my brow. My fingers shook. A whole week's worth of quiz activity gone? Awful to contemplate (though, of course, not the end of the world).

I immediately ran a flashback query but sadly too much time had already passed since the delete. I could not recover the data myself.

Fortunately (and not the least bit surprisingly), our operational IT staff was able to restore the data and in the end "no harm done." And that's because they use Oracle DataGuard to create a standby database synched to the production database. They then used our wonderful Flashback feature to go back to the desired point in time, and then copy the three depleted tables to another instance for the recovery process. So nice....

But I was (and am) mortified at how unprofessional I was in this situation. Obiously, I didn't test this enhancement. My excuse is that I thought I wasn't done with it and had kept it out of the execution flow. Still I made changes to the package and didn't verify that the code was disabled.

But my deeper shame was a realization as to how much I had veered from one of my favorite and most important best practices:

Hide your SQL behind an API.
a.k.a., Stop Writing So Much SQL!

In other words, from my application code (in this case, the Application Express interface and even many of the "behind the scenes" packages), I should not execute any non-query DML directly (inserts, updates, deletes). Instead, I should call predefined procedures to get the job done.

[For lots more detail on this approach, check out Bryn Llewellyn's Why Use PL/SQL?]

For example, I already had in place a "remove answer" procedure that copied the answers to an archive table before doing the actual delete.

   PROCEDURE remove_answer (comp_event_id_in    IN INTEGER,
                            user_id_in          IN INTEGER,
                            reverse_points_in   IN BOOLEAN DEFAULT TRUE)
   IS
   BEGIN
      /* Copy rows to archive table */
      INSERT INTO qdb_compev_answers_d ...

      /* Now do the delete */
      DELETE FROM qdb_compev_answers eva
            WHERE     eva.comp_event_id = comp_event_id_in
                  AND eva.user_id = user_id_in;
   END;

But I hadn't been careful about always using it. I found more than one place in my code that executed a DELETE FROM qdb_compev_answers statement directly. - and therefore not copying the rows to the archive first. Including the horrible, no-good, very bad statement with the wrong where clause.

So after I posted in my request to IT to PLEASE HELP ME ASAP!, I searched across all my package bodies for "delete from qdb_compev_answers" and replaced them with calls to qdb_player_mgr.remove_answer. And, yes, I also searched my APEX application code to make sure I wasn't an even worse programmer than I already felt myself to be, by sticking non-query DML in that level of the stack.

At least I can report with some, ahem, shred of self-dignity, that the APEX application did not do any deletes against the answers table that did not go through the API.

So in the end, all data was restored. All players were happy. And my code quality improved. Even the IT support team was pleased to exercise and validate their recovery processes.

But did I learn my lesson? Will I be more careful with my code? Will I be more rigorous about using my SQL API?

Yes, yes! Lesson learned! I promise!

* Funny story about engineers:  Got on a crowded plane recently, with a seat all the way in the back. Made it to 29D and was about to sit in it, when I noticed a fellow right behind me. He said: "I'm in 29F." We looked towards the window. There was already somebody in 29F. He looked back at us. After a moment, I shrugged and asked him: "Are you sure you're supposed to be in 29F?" He looked up and said: "I'm a mechanical engineer. I think I know how to find the right seat on an airplane." Um, OK. Thirty seconds later, he looked up again, this time sheepishly: "Huh. I'm supposed to be in 30F. Sorry about that."  Moral of the story: Even engineers get it wrong sometimes.

Comments

  1. Hello Steven,

    Glad to hear that our week was saved :):)

    The moral of this story is only one, as far as I can see:

    NEVER, EVER do anything in a PRODUCTION system, without having TESTED it first in a NON-PRODUCTION environment !

    And this is ALWAYS TRUE, regardless of whether you do use API only or direct DML, or any other coding paradigm.

    Methodology can only help to make things more easily manageable, testable, maintainable, and so on.

    But it cannot help to replace testing.

    I personally do believe in one's own testing, more than I believe in tests performed by others, and this is my eternal controversy with the real world that I am living in ...
    At our company there exists an excessively thick QA staff, who are checking each and every scenario included in their check lists,
    and, in spite of this, bugs do make their ways into production systems far too often ...

    But, the mere existence of QA department kind of "replaced" the self-checks that were traditionally performed
    by the developers themselves, in the (I say, old, good) times when no one has even thought at all of employing a QA team ...


    I like very much the idea that most software developers are not engineers :):)

    For some reason, for already many years it sounds kind of "degrading yourself" if you tell somebody
    that you are (just) a developer ... and some years ago this was still called a "programmer" ...

    Now everybody is a "software engineer" ... how else ?

    I do believe that studying any discipline does help in developing one's brain into something "organized" and capable of supervising organized activities ... but, for that matter, developing software is not less an ART than it is an engineer activity.

    In my opinion, the quality of the software that one produces does depend mainly on his/her deepest personality characteristics as a human being, and much less on his/her vicinity with one engineering field or another.

    A good developer loves what he is doing and loves to test himself and his software creation, loves to be able to be proud of his own product, while one who is (just) an engineer and not also an "artist soul" usually writes code just because he is being paid well for doing it, and leaves the testing to others, who are "less engineers" ...


    Cheers & Best Regards,
    Iudith

    ReplyDelete
  2. 1. I'm an amateur too.
    2. I've done the very similar, believing I had covered all my bases, just forgetting that *one* small detail.
    3. We're all human. Yay!

    ReplyDelete

Post a Comment

Popular posts from this blog

Running out of PGA memory with MULTISET ops? Watch out for DISTINCT!

A PL/SQL team inside Oracle made excellent use of nested tables and MULTISET operators in SQL, blending data in tables with procedurally-generated datasets (nested tables).  All was going well when they hit the dreaded: ORA-04030: out of process memory when trying to allocate 2032 bytes  They asked for my help.  The error occurred on this SELECT: SELECT  *    FROM header_tab trx    WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)       AND ((trx.column_ntab MULTISET             EXCEPT DISTINCT generated_ntab2) IS EMPTY) The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality. We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of wo...

How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today: In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit? First I give the quick answer, then I provide support for that answer Quick Answer Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption. Test to see if that's fast enough (likely will be for many cases). If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.  Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification. Don't put anything in the collection you don't need. [from Giulio Dottorini] Remember: each session that runs this code will use that amount of memory. Background When you use BULK COLLECT, you retrieve more than row with each fetch, ...

PL/SQL 101: Three ways to get error message/stack in PL/SQL

The PL/SQL Challenge quiz for 10 September - 16 September 2016 explored the different ways you can obtain the error message / stack in PL/SQL. Note: an error stack is a sequence of multiple error messages that can occur when an exception is propagated and re-raised through several layers of nested blocks. The three ways are: SQLERRM - The original, traditional and (oddly enough) not currently recommended function to get the current error message. Not recommended because the next two options avoid a problem which you are unlikely  to run into: the error stack will be truncated at 512 bytes, and you might lose some error information. DBMS_UTILITY.FORMAT_ERROR_STACK - Returns the error message / stack, and will not truncate your string like SQLERRM will. UTL_CALL_STACK API - Added in Oracle Database 12c, the UTL_CALL_STACK package offers a comprehensive API into the execution call stack, the error stack and the error backtrace.  Note: check out this LiveSQL script if...