Tuesday, November 25, 2014

Feedback on my Planning for Trouble Oracle Magazine article

Received feedback today from James S on my latest Oracle Magazine article, Planning for Trouble. I will respond here (there is no mechanism for comments/discussion at the Oracle Magazine page). If you haven't yet read it, I suggest you do so before proceeding with this post.

Comment 1. This is regarding exception handling.  Let’s say the client calls procedure A, and A calls B, B calls C. Are you suggesting we add a “When others” + error logging logic in each level? Does it mean one error will be logged three times? (because it is captured and re-raised each time).

Good point, James. I make the argument in the article that you should trap the exception as close as possible to where it was raised, so that you can log the values of local variables that may help you diagnose the problem. 

[Here's reinforcement of that point (and other great points about doing effective execution tracing) - they call it contextual logging: http://www.stackify.com/smarter-errors-logs-putting-data-work

But if you apply this rule to A calls B calls C, and C raises an exception, then you could well have three rows inserted into your error log. 

So, first, ask yourself: is this a problem? Generally more info is better than less, but of course you can overwhelmed by too much data.

If you decide that you really don't want that, then you could get "fancy" and in your logging routine set a flag in a package that indicates "already logged". Then in outer blocks, your logger could check the flag and only insert into the log table if it's not already logged.

Does anyone do this now?

Comment 2. I don’t think there’s an issue with the Miguel way (only capture at level A). If you don’t know what will happen, just let it happen. The outer layer or the client can log it and hide it and display a friendly information on the UI.

Well, I guess I was not able to convince James of the value of logging local variable values. What do you think? Is this valuable to you? Am I missing something?

Comment 3: And I don’t think “ORA-00942: table or view does not exist” is a good example. If you try to access a non-existing table your code won’t compile at all (unless you use dynamic sql). And I think Oracle should tell us what is missing. Prior to 10g when you insert a string longer than column width it doesn’t  tell you which column goes wrong! Now the information makes more sense, it not only tells you the column name but also the column width and actual length of value.

Well, your "unless" certainly indicates it could be a runtime issue for logging, and we certainly do write lots of dynamic SQL, these days. As for "I think Oracle should tell us what is missing", I can only tell you that Oracle is following well-accepted guidelines for keeping databases secure. That's specifically for this error. Obviously, as you point out, in other cases, Oracle does give us very excellent context-specific information in the error message. 

Comment 4: Regarding the “assignment in the declaration section”: do we have a choice for constants? It has to be in the declaration right?

You are correct. If you are declaring a constant, it must be done in the declaration section. So you need to assess the risk: am I assigning a value that could have wildly different sizes or format? If so and if I need to trap the exception inside that block and log the problem, then I need to remove the CONSTANT keyword. if you are assigning a literal, then you should consider it part of your job as a disciplined professional to READ YOUR CODE and make sure you are not making a blooper as in:

c_number CONSTANT NUMBER(3) := 1234;

20 comments:

  1. Re: A calls B which calls C which raises an exception.

    This happens often in Oracle Forms. Unhandled exceptions - those without an exception handler in forms, cause the form to display a very unfriendly and not particularly useful error message along the the lines of "FRM-40735 raised unhandled exception ORA-01403" and since trigger names in forms are not unique to the form this doesn’t even narrow down where to look (and in the days before Oracle exceptions where easily searchable online, meant we had to scurry around looking through printed Oracle documentation. At the same time Forms had a built in and default exception "form_trigger_failure". Raising this exception causes forms to stop whatever it is doing and in general try to return control to the user. But if you trapped form_trigger_failure and logged it, it tended to come up as user defined exception 100501 and not as form_trigger_failure. And if you didn't re-raise it, the program wouldn't stop and would likely raise even more errors or even go into an infinite loop.

    My solution was to add something like this at the end of all program units and triggers in forms:

    exception
    when form_trigger_failure then
    raise;
    when others then

    raise form_trigger_failure;
    end;

    This approach can be adapted to pl/sql on the database. Create a package spec containing an exception that will be used through the application similar to form_trigger_failure. Then each time you write an exception handler first trap for the form_trigger_failure. If it is, it means the error was already logged and now we are just propagating back out to stop program flow. if the exception is not form_trigger_failure then log the error (likely to a column in a table or perhaps to some text file and finally raise form_trigger_failure.

    ReplyDelete
  2. That's a great idea, sorek! Now why didn't I think of that? :-)

    ReplyDelete
  3. Hi Steven,

    I agree with you that we should trap the exception as close as possible to where it was raised.

    For example, in procedure C I have to select the name for an employee, and I will do this:

    BEGIN
    SELECT emp_name INTO v_name FROM emp WHERE emp_id = v_emp_id;
    EXCEPTION
    WHEN NO_DATA_FOUND THEN
    RAISE_APPLICATION_ERROR(-20001,'employee id not found: '||v_emp_id);
    END;
    .......

    But I don't add an exception handling at the end of procedure C to capture WHEN OTHERS. I don't do the fancy "already logged" flag either, this unhandled error will go to the client side. When the client logs the error message, I have the error stack, I have the driving parameters (provided by the client) for procedure A to reproduce this error, and I have the most valuable information that I need.

    If I add a WHEN OTHERS in procedure C, I can't tell where the error comes from, so I have to log all local variables. That would be a lot of code in the exception handling section. I don't mind having three error log entries, but adding this WHEN OTHERS handling in each layer of code is a lot of work to do. If my code has good instrumentation, I simply turn it on and reproduce it to get the local variable values.

    Regarding ORA-00942, I never thought of the security issue. Thanks for pointing it out.

    I don't do a lot of dynamnic SQL in my work, only in these cases:
    1. building a dynamic query based on the combination of input parameters
    2. running DDL
    3. running SQL of the same patterns on different tables
    4. running some generic tool scripts e.g. extracting data

    And I believe using static SQL whenever possible is a best practice, isn't it?

    From James Su

    ReplyDelete
  4. James, I like your point: if you have instrumentation in place that logs parameter values, you don't have to save those when you log an error. But I doubt you could expect to have enough instrumentation all through your code so that you've traced states of local variables just in time - right before the error occurred.

    Dynamic vs. Static SQL: we should only use dynamic SQL when we NEED to. And you provide a great list of when you would need to.

    ReplyDelete
  5. I have to admit it, even when instrumentation is in place, I always find the most important thing not logged. Then I have to change and recompile the code to add the one I need.

    But the same thing happens to exception handling, you can't always log everything. Consider this examnple:

    CREATE TABLE parts (partnum NUMBER, partname VARCHAR2 (15))
    /

    INSERT INTO parts VALUES(1,'Mouse');
    INSERT INTO parts VALUES(2,'Keyboard');


    CREATE OR REPLACE PROCEDURE C
    AS
    v1 VARCHAR2(3);
    v2 VARCHAR2(3);
    v3 VARCHAR2(3);
    v4 VARCHAR2(3);
    v5 VARCHAR2(3);
    BEGIN
    ------ other codes here
    FOR v_cur IN (SELECT * FROM parts)
    LOOP
    v1 := v_cur.partname; -------- this will trigger an exception
    --------- other codes here
    END LOOP;
    ------ other codes here

    EXCEPTION
    WHEN OTHERS THEN
    --------- You can't log v_cur.partname here! You can log v1,v2,v3,v4,v5 but that doesn't help
    --------- code to insert error log
    END;
    /

    Another interesting thing I found:

    SELECT ABCD FROM parts;
    SELECT ABCD FROM parts
    *
    ERROR at line 1:
    ORA-00904: "ABCD": invalid identifier

    So in ORA-00904 Oracle does tell you which column name is invalid. Do we need to worry about security here?


    From James Su

    ReplyDelete
  6. I found this article by Tom:

    http://www.oracle.com/technetwork/issue-archive/2011/11-nov/o61asktom-512015.html

    ....
    Why You Really Want to Let Exceptions Propagate
    ....... The fact is that most exceptions should never be caught in PL/SQL, or if they are, they should be immediately reraised..........

    Maybe you can invite Tom to discuss here?

    ReplyDelete
  7. Some more interesting finding:

    https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1270688200346183091

    ......
    When 11g was under development, the PLSQL group asked me if there was anything I'd like to have added to PLSQL...

    I requested that WHEN OTHERS be removed from the language. They would not comply :) however they did make it a WARNING - a warning that if you have a when others that does not raise the error - you almost certainly have a bug in the developed code. ......

    Looks like Tom strongly dislike "WHEN OTHERS" :)

    --- From James Su

    ReplyDelete
  8. I have invited Tom to chime in. He may well feel that he has said all he needs to say in AskTom and at Oracle Magazine. In which case, I will put words in his mouth. :-)

    ReplyDelete
  9. What Tom hates is WHEN OTHERS without RAISE. But even with RAISE, is it necessary to capture exceptions at each level of procedure calls? This topic is a little different and I think he might be interested in.

    James Su

    ReplyDelete
  10. My preference is to catch things at the highest level. If A calls B calls C and C fails, logging the inputs to C isn't good enough. It was the call to A that failed - we need that entire state. We need the inputs to A (which will give us the inputs to B and to C ultimately). A is the only thing that has all of the information needed to reproduce the issue. We need to debug starting from A.

    This is the only time pretty much that you should use an autonomous transaction (to log the error).

    begin
    A;
    exception
    when others then
    log_error_via_autonomous_transaction( ........ program state ..... );
    RAISE;
    end;

    if you catch the exception in C, you would need to catch it in B, and in A as well - because you truly need the state that A built up in order to reproduce (or stand a chance of reproducing) the issue. Don't forget - A might have set a global, B might have updated a row/inserted a row/deleted a row/etc...

    If you catch it in A, you have the "environment" (eg: the cgi-environment for example, the user) and the inputs necessary to cause the cascade of events that ultimately cause C to fail.

    I'd rather not have to put a when others log-error; raise; in every single procedure.

    when others should be used (in my opinion) in two cases:

    a) you have allocated a resource such as a file handle and need to make sure you clean it up:

    begin
    f := utl_file.fopen
    ...
    utl_file.fclose(f);

    exception
    when others
    then
    if f is open, close it
    RAISE;
    end;

    and the begin/end block should be around the fopen/fclose (eg: not necessarily at the bottom of the procedure - you want as little code as humanly possible in between the begin/end block

    b) at the top level procedure - the invoker of everything - to save the state of that transaction (in my world view, the act of calling A is to process the transaction "A"). To get all of the inputs/environment/state necessary to reproduce the cascading set of events that cause C to fail...

    ReplyDelete
  11. Thanks for your response Tom.
    Can PL/SQL do some improvement to cleanup by itself when we forget to close files? Otherwise we might end up with orphan file handles when the variable is out of scope. This annoying thing doesn’t happen to cursors since PL/SQL will take care of them in case we forget to close cursors.
    I like the SQL injection example you demonstrated in your book. Can you show an example of how it can be abused when the table name is revealed in the ORA-00942 error?

    James Su

    ReplyDelete
  12. Hi all!
    Interesting debate, one of the always "hot topic" within my team, for the last years using Tom's like approach.
    "Can PL/SQL do some improvement..." Steven now it's time to call/invite Bryn! :-)

    Regards,
    Nuno O.

    ReplyDelete
  13. I just recall this utl_file.fclose_all which can be used when you decide to return the control back to the client. So no need to worry if the procedure forgets to capture when others and close files.

    James Su

    ReplyDelete
  14. Let's take a closer look at the three-procedure example:

    ---------- code list1: capture exception at each level ----------

    CREATE OR REPLACE PROCEDURE C
    AS
    v_cnt NUMBER;
    BEGIN
    EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM NON_EXISTING' INTO v_cnt;
    EXCEPTION
    WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('log error in C: '||DBMS_UTILITY.FORMAT_ERROR_BACKTRACE||SQLERRM);
    RAISE;
    END;
    /

    CREATE OR REPLACE PROCEDURE B
    AS
    BEGIN
    C;
    EXCEPTION
    WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('log error in B: '||DBMS_UTILITY.FORMAT_ERROR_BACKTRACE||SQLERRM);
    RAISE;
    END;
    /

    CREATE OR REPLACE PROCEDURE A
    AS
    BEGIN
    B;
    EXCEPTION
    WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('log error in A: '||DBMS_UTILITY.FORMAT_ERROR_BACKTRACE||SQLERRM);
    RAISE;
    END;
    /


    jsu@JSU> EXEC A;
    log error in C: ORA-06512: at "JSU.C", line 5
    ORA-00942: table or view does not exist
    log error in B: ORA-06512: at "JSU.C", line 9
    ORA-06512: at "JSU.B", line 4
    ORA-00942: table or view does not exist
    log error in A: ORA-06512: at "JSU.B", line 8
    ORA-06512: at "JSU.A", line 4
    ORA-00942: table or view does not exist
    BEGIN A; END;

    *
    ERROR at line 1:
    ORA-00942: table or view does not exist
    ORA-06512: at "JSU.A", line 8
    ORA-06512: at line 1

    So you end up with three records in the error log table. Your client sees:
    ORA-06512: at "JSU.A", line 8
    In order to get the whole error stack, you need to analyze the error log table to find the links between those three records.

    ---------- code list2: capture exception only at the top level ----------

    CREATE OR REPLACE PROCEDURE C2
    AS
    v_cnt NUMBER;
    BEGIN
    EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM NON_EXISTING' INTO v_cnt;
    END;
    /

    CREATE OR REPLACE PROCEDURE B2
    AS
    BEGIN
    C2;
    END;
    /

    CREATE OR REPLACE PROCEDURE A2
    AS
    BEGIN
    B2;
    EXCEPTION
    WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('log error in A2: '||DBMS_UTILITY.FORMAT_ERROR_BACKTRACE||SQLERRM);
    RAISE;
    END;
    /

    jsu@JSU> EXEC A2;
    log error in A2: ORA-06512: at "JSU.C2", line 5
    ORA-06512: at "JSU.B2", line 4
    ORA-06512: at "JSU.A2", line 4
    ORA-00942: table or view does not exist
    BEGIN A2; END;

    *
    ERROR at line 1:
    ORA-00942: table or view does not exist
    ORA-06512: at "JSU.A2", line 8
    ORA-06512: at line 1


    In this case, you have only one record in the error log table, and it contains the whole error stack. You check only one record and you can trace all the way up to procedure C2.
    Isn't this more convenient?

    ReplyDelete
  15. One quick answer then off to Thanksgiving with family. We need to separate both issues and features. It is definitely more convenient to just call backtrace at the top level - and that is sufficient, because it doesn't matter where you call backtrace in your stack - it will take you back to the line number of the raise (of course, if you re-raise along the way, you lose the original).

    But that's NOT the issue I was addressing. My concern is when A calls B calls C, if you do NOT trap and log in C, you lose the context in C.

    Sure, JUST logging in C is not sufficient. If you have local state in A and B and C, you need to trap and log in each of those as you propagate out from the inner raised error.

    Finally, convenience is the smallest concern here, is it not? To me, the idea would be to gather as much info as possible, top to bottom, bottom to top, in a way that does not impact production performance, so that when something goes wrong, you've got the best chance of understanding and fixing it - quickly.

    ReplyDelete
    Replies
    1. If you log the inputs of A, you have what you need. A might have done something to the session state, B might have done something to the session state. If you just have C's inputs - it isn't enough to reproduce the issue (the session state isn't set up, C might happily run with those inputs now - or it might fail in some other fashion because you don't have the session state set up). If you log at the level of A, B and C - then you would ignore C's inputs anyway - you'd be re-running A to reproduce the issue.


      If you log C's inputs and see they are grossly "wrong" (as in they should never have been those values) - you now need to know what B's inputs were to figure out how B could go so wrong. Which means you need to understand the context of A at that point.


      It is more than "convenience". Logging at every level just scatters everything you need all over the place and you have to put it all back together again. You need to be able to piece your error log together in such a way that you can build the entire error stack back (or you could just log the error stack once, have it all together, with the set of inputs to A necessary to reproduce the issue...)

      Delete
  16. Yes in procedure C you got the chance to log all the userful information locally. But I doubt anyone would write a big piece of code in the exception section. Usually one just calls a generic error log procedure with SQLERRM. And not all variables are available in this section (for example loop iterator). With code list2, we do lose the context in C, but that context can be obtained once you know how to reproduce it.

    Happy Thanksgiving!


    From James Su

    ReplyDelete
  17. Great discussion thanks Steven and Tom.

    Is an outcome of this somewhere in the middle of Steven's advice and Tom's advice?
    Should exceptions propagate as per Tom's advice but at the same time the code needs to be designed such that it covers Steven’s advice of knowing state at the lowest level (for example C as used in this discussion)?
    Or in other words our code design needs to lend itself to the lowest level state being derivable based on the caller's state (and each level in between; tips for this are in Steven's Planning for Trouble Oracle Magazine article)?

    Thanks
    Grant

    ReplyDelete
  18. James and Tom, I get that in theory having inputs from A lets you reproduce the flow all the way down to C. But that is one heck of a theory. It assumes that the state of tables can also be reproduced from those inputs. It also assumes that a developer or a team is set up to easily run that segment of the app to reproduce the behavior.

    It sounds like that works and is feasible for you. For me, when I am in the heat of battle (that is, trying to diagnose and fix a "live" bug), it is terribly helpful for me to have those "inner" state values immediately available.

    As for "write a big piece of code in the exception section", I am not sure what you mean. You need a good logging utility with an API that lets you add values of local variables (or whatever you want to save) without having to write lots of code, but that should, again, be part of the high quality logger you are using.

    ReplyDelete
  19. Hi Steven,
    I agree that sometimes it's very hard to reproduce an issue in another environment. But in order to proove your fix works, you need to reproduce the issue first. You just can't get around this step. It's possible to do quick hot fix on production without reproducing in other environments but this try and fail approach is risky.
    By "write a big piece of code in the exception section" I mean you capture all the local variables one by one in the exception. Yes there are better ways. You can collect the values in the execution section using your logger (I do this for input parameters, see the log_api package I published on your website:
    http://www.plsqlchallenge.com/pls/apex/f?p=10000:2039:6083112531269::NO:2039:P2039_RESOURCE_ID:3400:
    )
    I then store these values in the session (package variables), which means it's globally available, which means if I capture the exception only at level A, I still have a record of the local variable values in procedure C. I can log this information without the exception handling in procedure C.
    I hope you can show a solid example demonstrating the benefit of capturing exceptions at every level. Thanks!

    James Su

    ReplyDelete