Skip to main content

Is it time to always declare string variables as CLOBs and forget about VARCHAR2?

I received this email from a longtime Oracle Database developer:
Why shouldn't we always use clobs instead of varchar2 in PL/SQL? Memory will be allocated once used so there should not be any overhead always to use clobs instead of a varchar2. There must be a reason, however I'm unable to come up with a explanation. If it is true,  you might say the varchar2 is deprecated, ;-)
I had to laugh a bit when I saw this - not because I thought his question was funny or silly, but because it reminded me of what I sometimes say in my trainings.

I ask students:

Why don't we always declare all string variables as:

my_variable VARCHAR2(32767);

That way, we don't have to worry about a VALUE_ERROR exception if the string gets "too long". And since memory is allocated dynamically (for lengths above 4000), it won't cause excessive memory consumption.

And my answer(s) to this question are:

  1. This declaration - especially appearing all over the place - will raise questions in a developer's mind, including: "Will my strings get that large? Really?" and "What was wrong with the person who wrote this code?" Your code should answer not raise questions.
  2. You are assuming the maximum length of a VARCHAR2 will always be 32767. Maybe someday the maximum length will go up. Maybe someday there will be no difference between VARCHAR2 and CLOBs!
And now a developer wants to know: is there a difference? Maybe it's time to go all-out-CLOBBERS!

Or maybe not. It's true that you can now use many of the typical string functions on CLOBs, such as SUBSTR. You are no longer forced to use DBMS_LOB (though that can still be very helpful). But is there really no difference "under the covers"? Is there no penalty to be paid by switching to CLOBs?

Connor McDonald, a member of my Developer Advocates team and one half of the Ask Tom Answer Team, offers these demonstrations of some of the big differences between VARCHAR2 and CLOB:

SQL> create or replace
  2  procedure p2 is
  3   c clob;
  4  begin
  5    for i in 1 .. 1000000 loop
  6      c := rpad(i,100);
  7    end loop;
  8  end;
  9  /

Procedure created.

SQL> set timing on
SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.07

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.06

SQL> create or replace
  2  procedure p2 is
  3   c varchar2(200);
  4  begin
  5    for i in 1 .. 1000000 loop
  6      c := rpad(i,100);
  7    end loop;
  8  end;
  9  /

Procedure created.

SQL> set timing on
SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.29

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.30

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.30

And similarly, there's some significant memory allocation differences:

SQL> create or replace
  2  procedure p2a is
  3    type t is table of clob index by pls_integer;
  4    c t;
  5  begin
  6    for i in 1 .. 100000 loop
  7      c(i) := 'qweqweqweqweqw';
  8    end loop;
  9  end;
10  /

Procedure created.

SQL> create or replace
  2  procedure p2b is
  3    type t is table of varchar2(50) index by pls_integer;
  4    c t;
  5  begin
  6    for i in 1 .. 100000 loop
  7      c(i) := 'qweqweqweqweqw';
  8    end loop;
  9  end;
10  /

Procedure created.

SQL> conn mcdonac/*****
Connected.

SQL> exec p2a

PL/SQL procedure successfully completed.

SQL> @stat
Enter value for stat_prefix: pga

SID_STAT# NAME                                                    VALUE
---------- ----------------------------------- ----------
        37 session pga memory                  2360216
        38 session pga memory max              22610840

SQL> conn mcdonac/*****
Connected.

SQL> exec p2b

PL/SQL procedure successfully completed.

SQL> @stat
Enter value for stat_prefix: pga

SID_STAT# NAME                                                    VALUE
---------- ----------------------------------- ----------
        37 session pga memory                  918424
             38 session pga memory max              10486680 

So I would say the answer is pretty clear: only use CLOBs when you are working with strings whose length are greater than 32767.

The time to declare VARCHAR2s "deprecated" has not yet arrived!

And I bet my readers can offer lots of other reasons/demonstrations of the advantages of sticking to VARCHAR2. Well? Well? :-)

Comments

  1. Steven:

    To my mind, the best reason for not using varchar2(max) or clobs everywhere is the the length limit on the variable is really a constraint. A large percentage of the time in PL/SQL code, the varchar2 variable is, ultimately, going to be inserted into a column (hopefully) with a constrained length, so the variable declaration should match the column definition.

    John

    ReplyDelete
  2. OCI applications require an additional call to fetch the CLOB value. So there is a client side,and network IO performance impact as well.

    ReplyDelete
  3. The key thing to remember about CLOB variables is that they are NOT really variables. Although, as far as i understand, Oracle PL/SQL engine "cheats" and treats CLOBs below 32K as pseudo-VARCHAR2s.

    Under normal conditions, temporary LOBs should be internally stored in TEMP tablespace. As a result, all LOB operations cause physical IO. To illustrate it let's take your original example and slightly change:

    create or replace procedure misha.p2c is
    type t is table of clob index by pls_integer;
    c t;
    begin
    for i in 1 .. 100 loop
    c(i) := lpad('A',32767,'A');
    end loop;
    end;
    /

    create or replace procedure misha.p2d is
    type t is table of clob index by pls_integer;
    c t;
    begin
    for i in 1 .. 100 loop
    c(i) := lpad('A',32767,'A')||'Z';
    end loop;
    end;
    /

    P2C puts to CLOB exactly 32K, while P2D puts 32K+1. You would be amazed to see the difference in stats if you run these prosedures! For example:

    Type Name P2C P2D Diff
    ----- --------------------- ------------ ------------
    STAT lob writes 1 100 99
    STAT db block gets 42 4,200 4,158

    These 100 LOB writes in P2D definitely prove that LOBs behave differently below and above 32K limit. But back to your original test - I have a feeling that even small LOBs try to "mimic" Varchar2, still there is an overhead of maintaining two-tier LOB structure (i.e. LOB index + LOB data). And that would cause doubling of PGA utilization. But it is just my wild guess, need to run more tests.

    ReplyDelete
  4. While errors are painful... there is a valid survival reason for the existence of pain. Pain lets us know that something is wrong.

    If I have a variable that should be storing Y or N, then declaring it as a VARCHAR2(1) makes sense. If a string with a length greater than one is placed in there, my code will return an error. It *should* return an error because something has gone wrong. Having garbage data get placed in that variable without generating an error is not helpful.

    ReplyDelete
  5. Another aspect is that working with CLOBs and database links, tends to give all kind of problems.

    ReplyDelete

Post a Comment

Popular posts from this blog

Running out of PGA memory with MULTISET ops? Watch out for DISTINCT!

A PL/SQL team inside Oracle made excellent use of nested tables and MULTISET operators in SQL, blending data in tables with procedurally-generated datasets (nested tables).  All was going well when they hit the dreaded: ORA-04030: out of process memory when trying to allocate 2032 bytes  They asked for my help.  The error occurred on this SELECT: SELECT  *    FROM header_tab trx    WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)       AND ((trx.column_ntab MULTISET             EXCEPT DISTINCT generated_ntab2) IS EMPTY) The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality. We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of work, revamping algorithms, ensuring correctness, you know the score. Then my eyes snagge

How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today: In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit? First I give the quick answer, then I provide support for that answer Quick Answer Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption. Test to see if that's fast enough (likely will be for many cases). If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.  Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification. Don't put anything in the collection you don't need. [from Giulio Dottorini] Remember: each session that runs this code will use that amount of memory. Background When you use BULK COLLECT, you retrieve more than row with each fetch,

Quick Guide to User-Defined Types in Oracle PL/SQL

A Twitter follower recently asked for more information on user-defined types in the PL/SQL language, and I figured the best way to answer is to offer up this blog post. PL/SQL is a strongly-typed language . Before you can work with a variable or constant, it must be declared with a type (yes, PL/SQL also supports lots of implicit conversions from one type to another, but still, everything must be declared with a type). PL/SQL offers a wide array of pre-defined data types , both in the language natively (such as VARCHAR2, PLS_INTEGER, BOOLEAN, etc.) and in a variety of supplied packages (e.g., the NUMBER_TABLE collection type in the DBMS_SQL package). Data types in PL/SQL can be scalars, such as strings and numbers, or composite (consisting of one or more scalars), such as record types, collection types and object types. You can't really declare your own "user-defined" scalars, though you can define subtypes  from those scalars, which can be very helpful from the p