Tuesday, August 9, 2016

Is it time to always declare string variables as CLOBs and forget about VARCHAR2?

I received this email from a longtime Oracle Database developer:
Why shouldn't we always use clobs instead of varchar2 in PL/SQL? Memory will be allocated once used so there should not be any overhead always to use clobs instead of a varchar2. There must be a reason, however I'm unable to come up with a explanation. If it is true,  you might say the varchar2 is deprecated, ;-)
I had to laugh a bit when I saw this - not because I thought his question was funny or silly, but because it reminded me of what I sometimes say in my trainings.

I ask students:

Why don't we always declare all string variables as:

my_variable VARCHAR2(32767);

That way, we don't have to worry about a VALUE_ERROR exception if the string gets "too long". And since memory is allocated dynamically (for lengths above 4000), it won't cause excessive memory consumption.

And my answer(s) to this question are:

  1. This declaration - especially appearing all over the place - will raise questions in a developer's mind, including: "Will my strings get that large? Really?" and "What was wrong with the person who wrote this code?" Your code should answer not raise questions.
  2. You are assuming the maximum length of a VARCHAR2 will always be 32767. Maybe someday the maximum length will go up. Maybe someday there will be no difference between VARCHAR2 and CLOBs!
And now a developer wants to know: is there a difference? Maybe it's time to go all-out-CLOBBERS!

Or maybe not. It's true that you can now use many of the typical string functions on CLOBs, such as SUBSTR. You are no longer forced to use DBMS_LOB (though that can still be very helpful). But is there really no difference "under the covers"? Is there no penalty to be paid by switching to CLOBs?

Connor McDonald, a member of my Developer Advocates team and one half of the Ask Tom Answer Team, offers these demonstrations of some of the big differences between VARCHAR2 and CLOB:

SQL> create or replace
  2  procedure p2 is
  3   c clob;
  4  begin
  5    for i in 1 .. 1000000 loop
  6      c := rpad(i,100);
  7    end loop;
  8  end;
  9  /

Procedure created.

SQL> set timing on
SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.07

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.06

SQL> create or replace
  2  procedure p2 is
  3   c varchar2(200);
  4  begin
  5    for i in 1 .. 1000000 loop
  6      c := rpad(i,100);
  7    end loop;
  8  end;
  9  /

Procedure created.

SQL> set timing on
SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.29

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.30

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.30

And similarly, there's some significant memory allocation differences:

SQL> create or replace
  2  procedure p2a is
  3    type t is table of clob index by pls_integer;
  4    c t;
  5  begin
  6    for i in 1 .. 100000 loop
  7      c(i) := 'qweqweqweqweqw';
  8    end loop;
  9  end;
10  /

Procedure created.

SQL> create or replace
  2  procedure p2b is
  3    type t is table of varchar2(50) index by pls_integer;
  4    c t;
  5  begin
  6    for i in 1 .. 100000 loop
  7      c(i) := 'qweqweqweqweqw';
  8    end loop;
  9  end;
10  /

Procedure created.

SQL> conn mcdonac/*****
Connected.

SQL> exec p2a

PL/SQL procedure successfully completed.

SQL> @stat
Enter value for stat_prefix: pga

SID_STAT# NAME                                                    VALUE
---------- ----------------------------------- ----------
        37 session pga memory                  2360216
        38 session pga memory max              22610840

SQL> conn mcdonac/*****
Connected.

SQL> exec p2b

PL/SQL procedure successfully completed.

SQL> @stat
Enter value for stat_prefix: pga

SID_STAT# NAME                                                    VALUE
---------- ----------------------------------- ----------
        37 session pga memory                  918424
             38 session pga memory max              10486680 

So I would say the answer is pretty clear: only use CLOBs when you are working with strings whose length are greater than 32767.

The time to declare VARCHAR2s "deprecated" has not yet arrived!

And I bet my readers can offer lots of other reasons/demonstrations of the advantages of sticking to VARCHAR2. Well? Well? :-)

5 comments:

  1. Steven:

    To my mind, the best reason for not using varchar2(max) or clobs everywhere is the the length limit on the variable is really a constraint. A large percentage of the time in PL/SQL code, the varchar2 variable is, ultimately, going to be inserted into a column (hopefully) with a constrained length, so the variable declaration should match the column definition.

    John

    ReplyDelete
  2. OCI applications require an additional call to fetch the CLOB value. So there is a client side,and network IO performance impact as well.

    ReplyDelete
  3. The key thing to remember about CLOB variables is that they are NOT really variables. Although, as far as i understand, Oracle PL/SQL engine "cheats" and treats CLOBs below 32K as pseudo-VARCHAR2s.

    Under normal conditions, temporary LOBs should be internally stored in TEMP tablespace. As a result, all LOB operations cause physical IO. To illustrate it let's take your original example and slightly change:

    create or replace procedure misha.p2c is
    type t is table of clob index by pls_integer;
    c t;
    begin
    for i in 1 .. 100 loop
    c(i) := lpad('A',32767,'A');
    end loop;
    end;
    /

    create or replace procedure misha.p2d is
    type t is table of clob index by pls_integer;
    c t;
    begin
    for i in 1 .. 100 loop
    c(i) := lpad('A',32767,'A')||'Z';
    end loop;
    end;
    /

    P2C puts to CLOB exactly 32K, while P2D puts 32K+1. You would be amazed to see the difference in stats if you run these prosedures! For example:

    Type Name P2C P2D Diff
    ----- --------------------- ------------ ------------
    STAT lob writes 1 100 99
    STAT db block gets 42 4,200 4,158

    These 100 LOB writes in P2D definitely prove that LOBs behave differently below and above 32K limit. But back to your original test - I have a feeling that even small LOBs try to "mimic" Varchar2, still there is an overhead of maintaining two-tier LOB structure (i.e. LOB index + LOB data). And that would cause doubling of PGA utilization. But it is just my wild guess, need to run more tests.

    ReplyDelete
  4. While errors are painful... there is a valid survival reason for the existence of pain. Pain lets us know that something is wrong.

    If I have a variable that should be storing Y or N, then declaring it as a VARCHAR2(1) makes sense. If a string with a length greater than one is placed in there, my code will return an error. It *should* return an error because something has gone wrong. Having garbage data get placed in that variable without generating an error is not helpful.

    ReplyDelete
  5. Another aspect is that working with CLOBs and database links, tends to give all kind of problems.

    ReplyDelete