So early in the morning on Saturday, we run weekly processes to rank the past week's quizzes, and do a whole lot of cleaning-up.
Therein lies the tale of this post. But before getting into the story, a little "ancient history."
In 1978-1980, while I attended the University of Rochester, I took three "101" software classes (Algol, Lisp and something else). That is the extent of my formal training as a software developer. Move forward in time 36 years or so, and here I am: a highly regarded PL/SQL programmer, trainer, author.
Still, though, essentially self-taught and definitely not a software engineer.
I have the highest regard for engineers in general, and software engineers in particular. They combine deep knowledge with a strong sense of discipline. The result, often though of course not always, is a product (be it a bridge or an application) that is relatively bug-free and works well for a long time.
My sense is that most software developers are not engineers. But I don't need to speak about other developers. I can speak with absolute clarity on this topic just by looking in the mirror as I type.
I sometimes shock my audiences by telling them I am an "amateur" developer and I imagine they don't believe me. But then sometimes I do something that drives the point home all too well.
Like this past Saturday. I was informed by some players on the PL/SQL Challenge that the results for last week's quizzes were not showing up. This is usually due to an error during the refresh of the ranking materialized views. I took a look and found that the problem was much worse: all of the answers to last week's quizzes were gone. As in deleted.
"How could this be?" I shouted silently. Well, you know how you get this feeling sometimes in your stomach, that feeling that says "You know how this could have happened, don't you, Steven?"
It didn't take me long to track it down. I kinda remembered working on cleanup code that would remove answers submitted by an admin on the site (an admin cannot compete in these quizzes). I kinda also remembered that I hadn't ever really finished it, but that was OK, because the logic was in a subprogram that wasn't invoked.
I kinda also remembered how bad my memory is. So I looked at the code and found:
1. The subprogram was invoked by the main process. Uh-oh.
2. The WHERE clause on the cleanup routine was awful, just awful. Instead of saying "Remove any answers submitted by a domain admin.", it said in effect "Remove any answers submitted for questions that were written by a domain admin."
Most of our quizzes are written by the admins in their respective domains.
In other words: bye, bye, quizzes!
Sweat beaded on my brow. My fingers shook. A whole week's worth of quiz activity gone? Awful to contemplate (though, of course, not the end of the world).
I immediately ran a flashback query but sadly too much time had already passed since the delete. I could not recover the data myself.
Fortunately (and not the least bit surprisingly), our operational IT staff was able to restore the data and in the end "no harm done." And that's because they use Oracle DataGuard to create a standby database synched to the production database. They then used our wonderful Flashback feature to go back to the desired point in time, and then copy the three depleted tables to another instance for the recovery process. So nice....
But I was (and am) mortified at how unprofessional I was in this situation. Obiously, I didn't test this enhancement. My excuse is that I thought I wasn't done with it and had kept it out of the execution flow. Still I made changes to the package and didn't verify that the code was disabled.
But my deeper shame was a realization as to how much I had veered from one of my favorite and most important best practices:
Hide your SQL behind an API.
a.k.a., Stop Writing So Much SQL!
a.k.a., Stop Writing So Much SQL!
In other words, from my application code (in this case, the Application Express interface and even many of the "behind the scenes" packages), I should not execute any non-query DML directly (inserts, updates, deletes). Instead, I should call predefined procedures to get the job done.
[For lots more detail on this approach, check out Bryn Llewellyn's Why Use PL/SQL?]
For example, I already had in place a "remove answer" procedure that copied the answers to an archive table before doing the actual delete.
PROCEDURE remove_answer (comp_event_id_in IN INTEGER,
user_id_in IN INTEGER,
reverse_points_in IN BOOLEAN DEFAULT TRUE)
/* Copy rows to archive table */
INSERT INTO qdb_compev_answers_d ...
/* Now do the delete */
DELETE FROM qdb_compev_answers eva
WHERE eva.comp_event_id = comp_event_id_in
AND eva.user_id = user_id_in;
But I hadn't been careful about always using it. I found more than one place in my code that executed a DELETE FROM qdb_compev_answers statement directly. - and therefore not copying the rows to the archive first. Including the horrible, no-good, very bad statement with the wrong where clause.
So after I posted in my request to IT to PLEASE HELP ME ASAP!, I searched across all my package bodies for "delete from qdb_compev_answers" and replaced them with calls to qdb_player_mgr.remove_answer. And, yes, I also searched my APEX application code to make sure I wasn't an even worse programmer than I already felt myself to be, by sticking non-query DML in that level of the stack.
At least I can report with some, ahem, shred of self-dignity, that the APEX application did not do any deletes against the answers table that did not go through the API.
So in the end, all data was restored. All players were happy. And my code quality improved. Even the IT support team was pleased to exercise and validate their recovery processes.
But did I learn my lesson? Will I be more careful with my code? Will I be more rigorous about using my SQL API?
Yes, yes! Lesson learned! I promise!
* Funny story about engineers: Got on a crowded plane recently, with a seat all the way in the back. Made it to 29D and was about to sit in it, when I noticed a fellow right behind me. He said: "I'm in 29F." We looked towards the window. There was already somebody in 29F. He looked back at us. After a moment, I shrugged and asked him: "Are you sure you're supposed to be in 29F?" He looked up and said: "I'm a mechanical engineer. I think I know how to find the right seat on an airplane." Um, OK. Thirty seconds later, he looked up again, this time sheepishly: "Huh. I'm supposed to be in 30F. Sorry about that." Moral of the story: Even engineers get it wrong sometimes.