Skip to main content

Is it time to always declare string variables as CLOBs and forget about VARCHAR2?

I received this email from a longtime Oracle Database developer:
Why shouldn't we always use clobs instead of varchar2 in PL/SQL? Memory will be allocated once used so there should not be any overhead always to use clobs instead of a varchar2. There must be a reason, however I'm unable to come up with a explanation. If it is true,  you might say the varchar2 is deprecated, ;-)
I had to laugh a bit when I saw this - not because I thought his question was funny or silly, but because it reminded me of what I sometimes say in my trainings.

I ask students:

Why don't we always declare all string variables as:

my_variable VARCHAR2(32767);

That way, we don't have to worry about a VALUE_ERROR exception if the string gets "too long". And since memory is allocated dynamically (for lengths above 4000), it won't cause excessive memory consumption.

And my answer(s) to this question are:

  1. This declaration - especially appearing all over the place - will raise questions in a developer's mind, including: "Will my strings get that large? Really?" and "What was wrong with the person who wrote this code?" Your code should answer not raise questions.
  2. You are assuming the maximum length of a VARCHAR2 will always be 32767. Maybe someday the maximum length will go up. Maybe someday there will be no difference between VARCHAR2 and CLOBs!
And now a developer wants to know: is there a difference? Maybe it's time to go all-out-CLOBBERS!

Or maybe not. It's true that you can now use many of the typical string functions on CLOBs, such as SUBSTR. You are no longer forced to use DBMS_LOB (though that can still be very helpful). But is there really no difference "under the covers"? Is there no penalty to be paid by switching to CLOBs?

Connor McDonald, a member of my Developer Advocates team and one half of the Ask Tom Answer Team, offers these demonstrations of some of the big differences between VARCHAR2 and CLOB:

SQL> create or replace
  2  procedure p2 is
  3   c clob;
  4  begin
  5    for i in 1 .. 1000000 loop
  6      c := rpad(i,100);
  7    end loop;
  8  end;
  9  /

Procedure created.

SQL> set timing on
SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.07

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.06

SQL> create or replace
  2  procedure p2 is
  3   c varchar2(200);
  4  begin
  5    for i in 1 .. 1000000 loop
  6      c := rpad(i,100);
  7    end loop;
  8  end;
  9  /

Procedure created.

SQL> set timing on
SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.29

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.30

SQL> exec p2

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.30

And similarly, there's some significant memory allocation differences:

SQL> create or replace
  2  procedure p2a is
  3    type t is table of clob index by pls_integer;
  4    c t;
  5  begin
  6    for i in 1 .. 100000 loop
  7      c(i) := 'qweqweqweqweqw';
  8    end loop;
  9  end;
10  /

Procedure created.

SQL> create or replace
  2  procedure p2b is
  3    type t is table of varchar2(50) index by pls_integer;
  4    c t;
  5  begin
  6    for i in 1 .. 100000 loop
  7      c(i) := 'qweqweqweqweqw';
  8    end loop;
  9  end;
10  /

Procedure created.

SQL> conn mcdonac/*****
Connected.

SQL> exec p2a

PL/SQL procedure successfully completed.

SQL> @stat
Enter value for stat_prefix: pga

SID_STAT# NAME                                                    VALUE
---------- ----------------------------------- ----------
        37 session pga memory                  2360216
        38 session pga memory max              22610840

SQL> conn mcdonac/*****
Connected.

SQL> exec p2b

PL/SQL procedure successfully completed.

SQL> @stat
Enter value for stat_prefix: pga

SID_STAT# NAME                                                    VALUE
---------- ----------------------------------- ----------
        37 session pga memory                  918424
             38 session pga memory max              10486680 

So I would say the answer is pretty clear: only use CLOBs when you are working with strings whose length are greater than 32767.

The time to declare VARCHAR2s "deprecated" has not yet arrived!

And I bet my readers can offer lots of other reasons/demonstrations of the advantages of sticking to VARCHAR2. Well? Well? :-)

Comments

  1. Steven:

    To my mind, the best reason for not using varchar2(max) or clobs everywhere is the the length limit on the variable is really a constraint. A large percentage of the time in PL/SQL code, the varchar2 variable is, ultimately, going to be inserted into a column (hopefully) with a constrained length, so the variable declaration should match the column definition.

    John

    ReplyDelete
  2. OCI applications require an additional call to fetch the CLOB value. So there is a client side,and network IO performance impact as well.

    ReplyDelete
  3. The key thing to remember about CLOB variables is that they are NOT really variables. Although, as far as i understand, Oracle PL/SQL engine "cheats" and treats CLOBs below 32K as pseudo-VARCHAR2s.

    Under normal conditions, temporary LOBs should be internally stored in TEMP tablespace. As a result, all LOB operations cause physical IO. To illustrate it let's take your original example and slightly change:

    create or replace procedure misha.p2c is
    type t is table of clob index by pls_integer;
    c t;
    begin
    for i in 1 .. 100 loop
    c(i) := lpad('A',32767,'A');
    end loop;
    end;
    /

    create or replace procedure misha.p2d is
    type t is table of clob index by pls_integer;
    c t;
    begin
    for i in 1 .. 100 loop
    c(i) := lpad('A',32767,'A')||'Z';
    end loop;
    end;
    /

    P2C puts to CLOB exactly 32K, while P2D puts 32K+1. You would be amazed to see the difference in stats if you run these prosedures! For example:

    Type Name P2C P2D Diff
    ----- --------------------- ------------ ------------
    STAT lob writes 1 100 99
    STAT db block gets 42 4,200 4,158

    These 100 LOB writes in P2D definitely prove that LOBs behave differently below and above 32K limit. But back to your original test - I have a feeling that even small LOBs try to "mimic" Varchar2, still there is an overhead of maintaining two-tier LOB structure (i.e. LOB index + LOB data). And that would cause doubling of PGA utilization. But it is just my wild guess, need to run more tests.

    ReplyDelete
  4. While errors are painful... there is a valid survival reason for the existence of pain. Pain lets us know that something is wrong.

    If I have a variable that should be storing Y or N, then declaring it as a VARCHAR2(1) makes sense. If a string with a length greater than one is placed in there, my code will return an error. It *should* return an error because something has gone wrong. Having garbage data get placed in that variable without generating an error is not helpful.

    ReplyDelete
  5. Another aspect is that working with CLOBs and database links, tends to give all kind of problems.

    ReplyDelete

Post a Comment

Popular posts from this blog

Quick Guide to User-Defined Types in Oracle PL/SQL

A Twitter follower recently asked for more information on user-defined types in the PL/SQL language, and I figured the best way to answer is to offer up this blog post. PL/SQL is a strongly-typed language . Before you can work with a variable or constant, it must be declared with a type (yes, PL/SQL also supports lots of implicit conversions from one type to another, but still, everything must be declared with a type). PL/SQL offers a wide array of pre-defined data types , both in the language natively (such as VARCHAR2, PLS_INTEGER, BOOLEAN, etc.) and in a variety of supplied packages (e.g., the NUMBER_TABLE collection type in the DBMS_SQL package). Data types in PL/SQL can be scalars, such as strings and numbers, or composite (consisting of one or more scalars), such as record types, collection types and object types. You can't really declare your own "user-defined" scalars, though you can define subtypes  from those scalars, which can be very helpful from the p

The differences between deterministic and result cache features

 EVERY once in a while, a developer gets in touch with a question like this: I am confused about the exact difference between deterministic and result_cache. Do they have different application use cases? I have used deterministic feature in many functions which retrieve data from some lookup tables. Is it essential to replace these 'deterministic' key words with 'result_cache'?  So I thought I'd write a post about the differences between these two features. But first, let's make sure we all understand what it means for a function to be  deterministic. From Wikipedia : In computer science, a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states.  Another way of putting this is that a deterministic subprogram (procedure or function) has no side-effects. If you pass a certain set of arguments for the parameters, you will always get

My two favorite APEX 5 features: Regional Display Selector and Cards

We (the over-sized development team for the PL/SQL Challenge - myself and my son, Eli) have been busy creating a new website on top of the PLCH platform (tables and packages): The Oracle Dev Gym! In a few short months (and just a part time involvement by yours truly), we have leveraged Oracle Application Express 5 to create what I think is an elegant, easy-to-use site that our users will absolutely love.  We plan to initially make the Dev Gym available only for current users of PL/SQL Challenge, so we can get feedback from our loyal user base. We will make the necessary adjustments and then offer it for general availability later this year. Anyway, more on that as the date approaches (the date being June 27, the APEX Open Mic Night at Kscope16 , where I will present it to a packed room of APEX experts). What I want to talk about today are two features of APEX that are making me so happy these days: Regional Display Selector and Cards. Regional Display Sel