Skip to main content

Deterministic functions, caching, and worries about consistent data

A developer contacted me with the following questions last week:

We have created a function that returns a single row column value form a query. When we call this function with the same input values it takes to long to return. Example:
select max (det_function('A2')) from dual connect by rownum <= 1000000
But when we change the function to a deterministic function the statement returns really fast. The only thing where we are unsure is what happens when the tables has changed to which the statement of the function selects? Do we need a to commit this table to bring oracle to re-execute the statement in the function and not use the cache or what should we do to get a consistent return value?
FUNCTION det_function (v_in_id VARCHAR2) RETURN NUMBER DETERMINISTIC
AS
   v_ident   NUMBER;
BEGIN
   SELECT VALUE INTO v_ident
     FROM my_table
    WHERE id = v_in_id;

   RETURN v_ident;
EXCEPTION
   WHEN VALUE_ERROR OR NO_DATA_FOUND THEN RETURN -1;
END;
A function is deterministic if the value returned by the function is determined entirely by its input(s).

The following function, for example, is deterministic:
FUNCTION betwnstr (
   string_in      IN   VARCHAR2
 , start_in       IN   INTEGER
 , end_in         IN   INTEGER
)
   RETURN VARCHAR2
IS
BEGIN
   RETURN (SUBSTR (string_in, start_in, end_in - start_in + 1));
END betwnstr;
You can also quickly see, I hope, that any function that contains SQL (like the first function defined above) cannot possibly be deterministic: it depends on the contents of one or more tables to do its job, and those datasets are not passed as IN parameters.

Does that mean the compiler will complain? No! But it does mean that you could create real problems for yourself if you are not careful about your use of this keyword.

So the rule should be: Only add the DETERMINISTIC keyword to truly deterministic functions.

Why? Why should it matter? Because under certain circumstances (such as the one identified by the developer above), Oracle Database will not execute your function, but instead simply use a previously cached return value.

Within the scope of a single server call (e.g., execution of a PL/SQL block), Oracle Database will keep track of input and return values for your deterministic functions. If in that same server call, you pass the same input values to the function, the database engine may choose to not actually execute the function, but instead simply pass back the previously-computed return value (for those same inputs).

That's why this developer saw such a great leap forward in performance.

Once that SELECT statement finishes, though, memory for the cache is released. When and if that same query is run again, the engine will start rebuilding and using that cache.

While that statement executing, though, no matter what sort of changes are made to the table, no matter if a commit is issued or not, those changes will not be visible to the statement that called the function.

That's why I will repeat The Rule again:

Only add the DETERMINISTIC keyword to truly deterministic functions.

If your function contains a SELECT statement and you want to call it from a SELECT statement, the best thing to do is take the SQL out of the function and "merge" it into your SQL - in other words, no user-defined functions. Just SQL.

Rob van Wijk offers lots more details on the behavior and performance of deterministic functions here. You will also be well-served to read Bryn Llewellyn's in-depth exploration of How to write a safe result-cached function.

Rather than repeat all those findings, I will simply conclude with:

1. Use the DETERMINISTIC function primarily as a way to document to future developers that your function is currently free of side effects, and should stay that way.

2. If you are looking for ways to improve the performance of functions executed inside SQL, learn more about the UDF pragma (new in Oracle Database 12c Release 1).

3. See if the function result cache feature (also explored in Bryn's blog post) might be applicable to your situation.

4. Do not call user-defined functions from SQL statements that in turn contain SQL statements (or at least do so with extreme caution). That SQL inside the function is not part of the same read-consistent image as the data set identified by the "outer" SQL.

Comments

  1. It would be useful to have some way of identifying functions as being "deterministic within a transaction." For functions based on slow-changing data.

    I'm sure it's not at all uncommon to have functions which return values based on database tables, but where those values are not expected to change often (if at all). I've often seen a "system parameter" table to hold settings, and a function written to access them easily. These functions are liable to be mislabelled as deterministic, even though they're not, because it's usually OK to treat them that way. Of course that's a bad idea because of the 1% of times when it causes real problems.

    It'd be good to have a way to deal with this without lying to the database about a function being deterministic.

    ReplyDelete
  2. Just use the function result cache?

    ReplyDelete
  3. Why can't we use the combination of deterministic and result_cache relies on <> together.
    What would be the impact of using both of them together ?

    ReplyDelete
  4. I am intrigued as to whether a table function can be DETERMINISTIC. For example imagine a table function TFN_A(p1 IN NUMBER, p2 IN NUMBER, p3 IN NUMBER) that executes a cursor using p1,p2,p3 as filter values and then pipes out the result set. If it is called more than once in the same transaction with identical parameter values for p1,p2,p3 then Oracles point in time consistency would ensure that the same data would have to be returned on each call. So if declared as DETERMINISTIC would the rowset of the first invocation be buffered? and then used on subsequent invocations?

    ReplyDelete
    Replies
    1. That's a lovely fantasy. :-)

      Yes, you can use a function declared as deterministic as a table function.

      And you probably already can guess that it doesn't work as you describe.

      As to whether or not it might do that in the future.....well....you might want to submit the idea to give it a chance!

      https://community.oracle.com/tech/apps-infra/categories/database-ideas-ideas

      Delete

Post a Comment

Popular posts from this blog

Running out of PGA memory with MULTISET ops? Watch out for DISTINCT!

A PL/SQL team inside Oracle made excellent use of nested tables and MULTISET operators in SQL, blending data in tables with procedurally-generated datasets (nested tables).  All was going well when they hit the dreaded: ORA-04030: out of process memory when trying to allocate 2032 bytes  They asked for my help.  The error occurred on this SELECT: SELECT  *    FROM header_tab trx    WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)       AND ((trx.column_ntab MULTISET             EXCEPT DISTINCT generated_ntab2) IS EMPTY) The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality. We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of wo...

How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today: In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit? First I give the quick answer, then I provide support for that answer Quick Answer Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption. Test to see if that's fast enough (likely will be for many cases). If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.  Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification. Don't put anything in the collection you don't need. [from Giulio Dottorini] Remember: each session that runs this code will use that amount of memory. Background When you use BULK COLLECT, you retrieve more than row with each fetch, ...

PL/SQL 101: Three ways to get error message/stack in PL/SQL

The PL/SQL Challenge quiz for 10 September - 16 September 2016 explored the different ways you can obtain the error message / stack in PL/SQL. Note: an error stack is a sequence of multiple error messages that can occur when an exception is propagated and re-raised through several layers of nested blocks. The three ways are: SQLERRM - The original, traditional and (oddly enough) not currently recommended function to get the current error message. Not recommended because the next two options avoid a problem which you are unlikely  to run into: the error stack will be truncated at 512 bytes, and you might lose some error information. DBMS_UTILITY.FORMAT_ERROR_STACK - Returns the error message / stack, and will not truncate your string like SQLERRM will. UTL_CALL_STACK API - Added in Oracle Database 12c, the UTL_CALL_STACK package offers a comprehensive API into the execution call stack, the error stack and the error backtrace.  Note: check out this LiveSQL script if...