I recently heard a very interesting story about a use for LOG ERRORS: namely, to verify the correctness (and improve the performance) of a problematic SQL statement.
I thought you might enjoy hearing about it.
Greg Belliveau of Digital Motorworks buttonholed me at ODTUG's Kscope14 conference with this report:
As part of a data mart refresh, we had an insert statement that refreshed a FACT table, and over the years had gotten to the point where it took well over 3 hours to complete.
There was a NOT IN clause in the statement, and we were pretty sure that was the cause of the degrading performance. Why was it there? Our best guess was so that the statement would not fail in case someone ran the refresh for a date range that had already been run...even though there was code that checked for and prevented that from happening.
[Steven: In other words, programmer's insurance. "Just in case, let's do this." That's always problematic in software. Is that a real "case"? Will it ever happen? What are the consequences of including this insurance? Will we document it so that others coming along later can figure out why this weird code is here? Usually not.]
Not wanting to lose the "safety net" that was in place, we decided to try the LOG ERRORS feature you had mentioned in your session at Oracle Open World in 2010. We removed the NOT IN clause and added LOG ERRORS. The insert statement then ran (and continues to run to this day) in roughly one minute (down from three hours!).
Oh, and there's never been a single row inserted in the error table!
Nice, nice, very nice.
It's always a bit scary to mess with code that's been around for years and (since it's been around for years) is a gnarly bunch of logic. SQL, with its set orientation, can be even more off-putting than, say, a PL/SQL function.
So, of course, the solution is to build a comprehensive automated regression test script so that you can compare before and after results.
Yes, but almost no one will (ever) do that. So we do what we can.
And in this case, Greg and his team came up with a very creative solution:
We are pretty sure NOT IN is causing the performance problem, but we also cannot afford to remove the clause and have the statement fail.
So we'll take it out, but add LOG ERRORS to ensure that all inserts are at least attempted.
Then we can check the error log table afterwards to see if the NOT IN really was excluding data that would have caused errors.
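Here is a minimal sketch of that pattern. All the names are made up for illustration (sales_fact for the FACT table, sales_staging for the source, err$_sales_fact for the error log table); Greg's actual statement was, of course, rather more involved.

   -- One-time setup: create the error log table for the target table
   -- (by default it is named ERR$_SALES_FACT).
   -- Table and column names here are hypothetical.
   BEGIN
      DBMS_ERRLOG.create_error_log (dml_table_name => 'SALES_FACT');
   END;
   /

   -- Before: the "safety net" version, with the costly NOT IN.
   -- INSERT INTO sales_fact (sale_date, store_id, amount)
   --    SELECT s.sale_date, s.store_id, s.amount
   --      FROM sales_staging s
   --     WHERE s.sale_date NOT IN (SELECT f.sale_date FROM sales_fact f);

   -- After: no NOT IN. Any row that violates a constraint is written to
   -- the error log table, and the statement itself keeps going.
   INSERT INTO sales_fact (sale_date, store_id, amount)
      SELECT s.sale_date, s.store_id, s.amount
        FROM sales_staging s
      LOG ERRORS INTO err$_sales_fact ('fact refresh')
      REJECT LIMIT UNLIMITED;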
In Greg's case, the answer was "Nope". So far as they can tell, the NOT IN was unnecessary.
OK, but maybe it will be necessary a week or month or year from now. What then?
Well, the bad data that should have been excluded will still be excluded (those rows will be rejected by the insert and logged), and then the job can check the log table and issue whatever kind of report (or alarm) is needed.
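That post-run check could be as simple as the block below (again using the made-up names and error tag from the sketch above; swap in whatever alerting mechanism your job already uses).

   -- err$_sales_fact and the 'fact refresh' tag are the hypothetical
   -- names from the earlier sketch.
   DECLARE
      l_rejected   PLS_INTEGER;
   BEGIN
      SELECT COUNT (*)
        INTO l_rejected
        FROM err$_sales_fact
       WHERE ora_err_tag$ = 'fact refresh';

      IF l_rejected > 0
      THEN
         -- Replace with your own reporting or alerting mechanism.
         raise_application_error (
            -20001,
            l_rejected || ' row(s) rejected during the fact table refresh.');
      END IF;
   END;
   /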
So LOG ERRORS as part regression test, part performance monster.
Nice work, Greg.
Nice work, Oracle.
Hi Steven.
I've recently discovered that Data Pump uses LOG ERRORS under the hood to implement the feature of "skip constraint errors" during import: http://www.db-oriented.com/2014/07/19/impdp-which-rows-failed
It's nice that Oracle uses its own features internally.
Thanks,
Oren.
Hello Steven & Oren,
Very nice, thanks to Oren for his comment :) :)
What a small world ... to find Oren's post on a blog that we both follow ... and we also know each other from the last Israeli Oracle Week 2013 ... the only thing missing was having Steven join us there, am I right, Oren?
Regarding the automatic error logging performed during IMPDP ... I read the post on your blog and saw the nice trick for temporarily preserving the error logging table ... but, I wonder, why doesn't Oracle always preserve this table as the default behavior? In fact, it is NOT of much use to fill a table with data and then just drop it before anyone has the chance to use that data ...
I wonder whether other tricks are possible, like, for example, using a DDL event trigger to catch the creation of the ERRDP$ table, and then creating another "shadow" table for it and a trigger to copy the data from ERRDP$ into the shadow table, which would remain ...
I cannot but express my hope that we all will meet one day :) :)
Thanks a lot & Best Regards,
Iudith