Skip to main content

Is it time to always declare string variables as CLOBs and forget about VARCHAR2?

I received this email from a longtime Oracle Database developer:
Why shouldn't we always use clobs instead of varchar2 in PL/SQL? Memory will be allocated once used so there should not be any overhead always to use clobs instead of a varchar2. There must be a reason, however I'm unable to come up with a explanation. If it is true,  you might say the varchar2 is deprecated, ;-)
I had to laugh a bit when I saw this - not because I thought his question was funny or silly, but because it reminded me of what I sometimes say in my trainings.

I ask students:

Why don't we always declare all string variables as:

my_variable VARCHAR2(32767);

That way, we don't have to worry about a VALUE_ERROR exception if the string gets "too long". And since memory is allocated dynamically (for lengths above 4000), it won't cause excessive memory consumption.

And my answer(s) to this question are:

  1. This declaration - especially appearing all over the place - will raise questions in a developer's mind, including: "Will my strings get that large? Really?" and "What was wrong with the person who wrote this code?" Your code should answer not raise questions.
  2. You are assuming the maximum length of a VARCHAR2 will always be 32767. Maybe someday the maximum length will go up. Maybe someday there will be no difference between VARCHAR2 and CLOBs!
And now a developer wants to know: is there a difference? Maybe it's time to go all-out-CLOBBERS!

Or maybe not. It's true that you can now use many of the typical string functions on CLOBs, such as SUBSTR. You are no longer forced to use DBMS_LOB (though that can still be very helpful). But is there really no difference "under the covers"? Is there no penalty to be paid by switching to CLOBs?

Connor McDonald, a member of my Developer Advocates team and one half of the Ask Tom Answer Team, offers these demonstrations of some of the big differences between VARCHAR2 and CLOB:

SQL> create or replace
  2  procedure p2 is
  3   c clob;
  4  begin
  5    for i in 1 .. 1000000 loop
  6      c := rpad(i,100);
  7    end loop;
  8  end;
  9  /

Procedure created.

SQL> set timing on
SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.07

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.06

SQL> create or replace
  2  procedure p2 is
  3   c varchar2(200);
  4  begin
  5    for i in 1 .. 1000000 loop
  6      c := rpad(i,100);
  7    end loop;
  8  end;
  9  /

Procedure created.

SQL> set timing on
SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.29

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.30

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.30

And similarly, there's some significant memory allocation differences:

SQL> create or replace
  2  procedure p2a is
  3    type t is table of clob index by pls_integer;
  4    c t;
  5  begin
  6    for i in 1 .. 100000 loop
  7      c(i) := 'qweqweqweqweqw';
  8    end loop;
  9  end;
10  /

Procedure created.

SQL> create or replace
  2  procedure p2b is
  3    type t is table of varchar2(50) index by pls_integer;
  4    c t;
  5  begin
  6    for i in 1 .. 100000 loop
  7      c(i) := 'qweqweqweqweqw';
  8    end loop;
  9  end;
10  /

Procedure created.

SQL> conn mcdonac/*****
Connected.

SQL> exec p2a

PL/SQL procedure successfully completed.

SQL> @stat
Enter value for stat_prefix: pga

SID_STAT# NAME                                                    VALUE
---------- ----------------------------------- ----------
        37 session pga memory                  2360216
        38 session pga memory max              22610840

SQL> conn mcdonac/*****
Connected.

SQL> exec p2b

PL/SQL procedure successfully completed.

SQL> @stat
Enter value for stat_prefix: pga

SID_STAT# NAME                                                    VALUE
---------- ----------------------------------- ----------
        37 session pga memory                  918424
             38 session pga memory max              10486680 

So I would say the answer is pretty clear: only use CLOBs when you are working with strings whose length are greater than 32767.

The time to declare VARCHAR2s "deprecated" has not yet arrived!

And I bet my readers can offer lots of other reasons/demonstrations of the advantages of sticking to VARCHAR2. Well? Well? :-)

Comments

  1. Steven:

    To my mind, the best reason for not using varchar2(max) or clobs everywhere is the the length limit on the variable is really a constraint. A large percentage of the time in PL/SQL code, the varchar2 variable is, ultimately, going to be inserted into a column (hopefully) with a constrained length, so the variable declaration should match the column definition.

    John

    ReplyDelete
  2. OCI applications require an additional call to fetch the CLOB value. So there is a client side,and network IO performance impact as well.

    ReplyDelete
  3. The key thing to remember about CLOB variables is that they are NOT really variables. Although, as far as i understand, Oracle PL/SQL engine "cheats" and treats CLOBs below 32K as pseudo-VARCHAR2s.

    Under normal conditions, temporary LOBs should be internally stored in TEMP tablespace. As a result, all LOB operations cause physical IO. To illustrate it let's take your original example and slightly change:

    create or replace procedure misha.p2c is
    type t is table of clob index by pls_integer;
    c t;
    begin
    for i in 1 .. 100 loop
    c(i) := lpad('A',32767,'A');
    end loop;
    end;
    /

    create or replace procedure misha.p2d is
    type t is table of clob index by pls_integer;
    c t;
    begin
    for i in 1 .. 100 loop
    c(i) := lpad('A',32767,'A')||'Z';
    end loop;
    end;
    /

    P2C puts to CLOB exactly 32K, while P2D puts 32K+1. You would be amazed to see the difference in stats if you run these prosedures! For example:

    Type Name P2C P2D Diff
    ----- --------------------- ------------ ------------
    STAT lob writes 1 100 99
    STAT db block gets 42 4,200 4,158

    These 100 LOB writes in P2D definitely prove that LOBs behave differently below and above 32K limit. But back to your original test - I have a feeling that even small LOBs try to "mimic" Varchar2, still there is an overhead of maintaining two-tier LOB structure (i.e. LOB index + LOB data). And that would cause doubling of PGA utilization. But it is just my wild guess, need to run more tests.

    ReplyDelete
  4. While errors are painful... there is a valid survival reason for the existence of pain. Pain lets us know that something is wrong.

    If I have a variable that should be storing Y or N, then declaring it as a VARCHAR2(1) makes sense. If a string with a length greater than one is placed in there, my code will return an error. It *should* return an error because something has gone wrong. Having garbage data get placed in that variable without generating an error is not helpful.

    ReplyDelete
  5. Another aspect is that working with CLOBs and database links, tends to give all kind of problems.

    ReplyDelete

Post a Comment

Popular posts from this blog

Running out of PGA memory with MULTISET ops? Watch out for DISTINCT!

A PL/SQL team inside Oracle made excellent use of nested tables and MULTISET operators in SQL, blending data in tables with procedurally-generated datasets (nested tables).  All was going well when they hit the dreaded: ORA-04030: out of process memory when trying to allocate 2032 bytes  They asked for my help.  The error occurred on this SELECT: SELECT  *    FROM header_tab trx    WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)       AND ((trx.column_ntab MULTISET             EXCEPT DISTINCT generated_ntab2) IS EMPTY) The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality. We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of wo...

How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today: In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit? First I give the quick answer, then I provide support for that answer Quick Answer Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption. Test to see if that's fast enough (likely will be for many cases). If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.  Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification. Don't put anything in the collection you don't need. [from Giulio Dottorini] Remember: each session that runs this code will use that amount of memory. Background When you use BULK COLLECT, you retrieve more than row with each fetch, ...

PL/SQL 101: Three ways to get error message/stack in PL/SQL

The PL/SQL Challenge quiz for 10 September - 16 September 2016 explored the different ways you can obtain the error message / stack in PL/SQL. Note: an error stack is a sequence of multiple error messages that can occur when an exception is propagated and re-raised through several layers of nested blocks. The three ways are: SQLERRM - The original, traditional and (oddly enough) not currently recommended function to get the current error message. Not recommended because the next two options avoid a problem which you are unlikely  to run into: the error stack will be truncated at 512 bytes, and you might lose some error information. DBMS_UTILITY.FORMAT_ERROR_STACK - Returns the error message / stack, and will not truncate your string like SQLERRM will. UTL_CALL_STACK API - Added in Oracle Database 12c, the UTL_CALL_STACK package offers a comprehensive API into the execution call stack, the error stack and the error backtrace.  Note: check out this LiveSQL script if...