Thursday, July 23, 2015

Thoughts on PL/SQL, code sharing, and code generation

[June 2016 update: we have now launched a community code generation project: oddgen - for Oracle Data Dictionary Generator. More information available here. Join the project!]

[While I am sure you will be BLOWN AWAY by what I write below, just in case you are already convinced of the need for a flexible code generator for PL/SQL, skip to "Needed: Flexible Code Generator for PL/SQL"]

If you have spent much time in the PL/SQL world, you will have noticed some unusual aspects:

  • Almost no third party libraries or utilities (prominent exceptions: utPLSQL, Logger, Alexandria Library)
  • Minimal code sharing
  • Most applications built from scratch
The cause(s) of this odd reality (odd in comparison to, say, the way JavaScript apps are built)? Likely several, including:
  • Management concerns about putting "someone else's code" in their database
  • PL/SQL community well established before the days of open source, so no tradition of sharing, of free code, etc.
  • How much code can actually be shared for PL/SQL-based applications?
That last point/question is what I'd like to explore a bit in this post.

It's easy to shake our heads at the "sad state of affairs" when it comes to sharing of code in the PL/SQL world, when it comes especially to thinking about the thousands of PL/SQL teams around the world who, again and again, built an application almost from the ground up with minimal code re-use.

It's easy to conclude: "If only those developers knew better." or "If only managers could be less suspicious of open source." and so on.

But I am thinking more and more lately that the reality of minimal code sharing in our community may more to do with the fundamental nature of PL/SQL (and by extension any other database programming language, such as Transact-SQL).

PL/SQL is not a general purpose programming language. 

It is purpose-built to "extend" the SQL set-oriented, declarative data access language with procedural constructs. 

It is optimized both in terms of performance and syntax to allow programmers to easily (relatively speaking, anyway) build secure, high performance and maintainable applications on top of Oracle Database.

So PL/SQL is tightly tied to SQL, which means it is tightly tied to the tables used in each application. Which means that a large percentage of the code you write/need in PL/SQL will be specific to a given set of tables and columns - and those tables change for each application, usually by a whole lot

From this perspective, it makes more sense that there would be relatively little reuse and code sharing. Reuse is driven by patterns: we write the same code over and over, so, well, why not use the same code over and over, right?

Yes and no.

There's no shortage of patterns in PL/SQL code, but most of them have to do with patterns of SQL operations inside a PL/SQL subprogram. Heck, you could even consider any DML statement as a kind of template or pattern in and of itself (with the infamous "TAPI" - table API - a collection of all such templates). But what varies? A lot: the table names, the column names, the WHERE clause predicates, when you need to fetch all rows for a given value (foreign keys), the conditions under which you want to fetch just one row for a given value (primary keys), etc.

Lots of patterns, but little opportunity for "out of the box" reuse.

I am coming around to seeing that this fundamental reality of PL/SQL as a language designed to serve as an interface layer on top of SQL statements is the biggest "obstacle" to code reuse, and building PL/SQL applications on an extensive foundation/framework of code.

How about you? Do you see a flaw in my thinking?

Because if you do not, you might end up in the same place I find myself:

If code sharing / reuse may not be terribly applicable to PL/SQL, then conversely code generation is way more applicable. 

With code generation, you start from a generic template and then you apply that template to a particular database object, such as a table or view or package. The result is generated code that is specific to that object, but follows the pattern.

tapigen2 offers a fine example of such a code generator. In this utility, Oscar has put the template inside his generator, a part of which you can see here:


CREATE OR REPLACE PACKAGE BODY tapi_gen2
AS
   --Global private variables
   g_spec_template   VARCHAR2 (32767);
   g_body_template   CLOB;
   g_vars            teplsql.t_assoc_array;
   g_unque_key       dbo_name_t;


   PROCEDURE load_spec_template
   AS
   BEGIN
      --Spec Template
      g_spec_template :=
         q'^CREATE OR REPLACE PACKAGE tapi_${table_name}
IS
   /**
   -- # TAPI_${table_name}
   -- Generated by: tapiGen2 - DO NOT MODIFY!
   -- Website: github.com/osalvador/tapiGen2
   -- Created On: ${date}
   -- Created By: ${user}
   */


Then when I run the generator, say against the classic HR employees table, I get this and much more:


create or replace PACKAGE tapi_employees
IS
   /**
   -- # TAPI_employees
   -- Generated by: tapiGen2 - DO NOT MODIFY!
   -- Website: github.com/osalvador/tapiGen2
   -- Created On: 21-JUL-2015 12:56
   -- Created By: HR
   */
   --Scalar/Column types
   SUBTYPE hash_t IS varchar2 (40);   
   SUBTYPE employee_id IS employees.employee_id%TYPE;
   SUBTYPE first_name IS employees.first_name%TYPE;
   SUBTYPE last_name IS employees.last_name%TYPE;
   SUBTYPE email IS employees.email%TYPE;
   SUBTYPE phone_number IS employees.phone_number%TYPE;
   SUBTYPE hire_date IS employees.hire_date%TYPE;
   SUBTYPE job_id IS employees.job_id%TYPE;
   SUBTYPE salary IS employees.salary%TYPE;
   SUBTYPE commission_pct IS employees.commission_pct%TYPE;
   SUBTYPE manager_id IS employees.manager_id%TYPE;
   SUBTYPE department_id IS employees.department_id%TYPE;
   ...   

I have a long and very mixed (in terms of success) history with code generators. I've built several TAPI generators over the years, but reached my pinnacle of code generation craziness with a tool called QNXO (quality in, excellence out - terrible name with nary a single hit in Google, and I still got sued for copyright violation. :-) ). 

Later rebranded as the Quest CodeGen Utility (and no longer available), this utility was a design pattern factory. Built around a Code Generation Markup Language (that I designed, and was therefore kind of hokey), you can define whatever pattern you wanted, for whatever target "language" you wanted (generate a PL/SQL TAPI, a Java class, HTML doc, whatever), then point it at a database object and voila! lots of really cool code.

It also offered a bewildering amount of flexibility (specify your naming standards, for example) that made it hard to use. It didn't very far, but I still think it is a great idea.

And I want to resurrect that idea again, but see if we, the PL/SQL community, can be a much better job implementing it. Because I think that this is:


Needed: Flexible Code Generator for PL/SQL

I am convinced that a code generator with all the right features, with just enough but not too much flexibility, could make a powerful difference in the lives of many PL/SQL developers. You would be able to build your applications faster, relying on high quality code that is, in essence, pre-tested. 

You'd still have to write lots of bespoke, custom code, sure - but a much larger percentage of your code base than ever before could be generated - and re-generated as needed (or simply use the generated code as a starting point and modify from there).

I am also convinced that this effort should from the start be an open source project, primarily driven by PL/SQL developers, not Oracle employees (though I and others would certainly help). 

Here are what I see as the key features of such a utility:
  • (external) Template driven - the template (using what language? that is something to be decided. XML? Angular expressions? A variant on my old CGML? Hmmm....) would be separate from the code generator. 
  • Template libraries - anyone can build their own template, share it with others or keep it to themselves. 
  • Flexible naming conventions - in my experience, if developers cannot specify utilization of their own naming conventions, they will resist using the generated code.
I don't think this will be all that hard to do, especially if it is built incrementally and we avoid scope creep. With the right tempting language, most of the complexity could be "buried" in the templates themselves.

Have I convinced (any of) you? Are you also enthused about code generation and see the potential in a database programming language context? Would you like to help on this project (a little, a lot, doesn't matter - every bit helps!)? I figure we need at least two people who would like to take a leadership role, set up the project, and push it along. I will help, but I think I should take a back seat (but not very far back) for a number of reasons.

Well, I'd love to read your comments below, or send an email to steven.feuerstein@oracle.com. I will compile a list of interested parties and organize a conference call to discuss this further. 






6 comments:

  1. I still use CodeGen from time to time. I think the underlying PL/GEN 'engine' is a good starting point for the Flexible Code Generator (like the name already). The nice thing about CGML and CodeGen was (and is) that you can not only generate PL/SQL code, but in fact any code that is ASCII based. Maybe you have the sourcecode for the front-end of CodeGen lying around somewhere? I absolutely have the source code of the backend lying somewhere on my harddisks (where the most unseamingly things dwell ;-)).
    I know I am interested to help out.

    ReplyDelete
  2. Hi Steven,

    I agree with what you write in your post and the key features of this new tool I think some of them are covered:

    - Template Driven: tapiGen2 uses tePLSQL (https://github.com/osalvador/tePLSQL) another open source project that I created to process templates with any text output (PLSQL, HTML, Java ...) with the advantage of not having to learn any other language, everything is PLSQL.

    - Template Libraries: tePLSQL receives a template as CLOB parameter. Therefore the templates can be seamlessly shared, modified ...

    - Flexible Naming conventions: If processed is a template, naming conventions can be modified directly in the template for each programmer.

    If we refer to a simple scaffolding I think tePLSQL is enough.

    In tapiGen2 I decided to insert the template in the package body to simplify its creation and to share, but perfectly the template can be out of it. tapiGen2 has much own code that must be accessed from the template (get table primary keys, retrieve the names of the columns and types...) and so is a package that uses tePLSQL.

    In short and in my humble opinion I think that introducing too much logic in the template will make it very difficult to change and thus to share. So I think the templates must be shared by their respective packets containing the logic that the template needs, View-Model?

    Or maybe you're thinking of something more sophisticated?

    ReplyDelete
  3. Hello,

    I was searching for such utility, tried some.
    Now, I'm using FMPP (FreeMarker-based file PreProcessor) for my templates, PL/SQL as well as other languages.
    I'va built a simple and stupid UI to call FMPP templates and pass parameters.
    It's quite powerful, extendable (Java), the FreeMarker syntax can sometimes be a bit strange (XML like).

    for example, a package function, enter :
    function do_something(as_var, an_num) return number

    and it generates:
    -- --------------------------------------------------------------------------
    -- do_something
    -- Comment
    --
    -- Args:
    -- as_var IN Description
    -- an_num IN Description
    --
    -- Returns NUMBER:
    -- < 0 Error
    -- = 0 No action
    -- > 0 Success
    -- --------------------------------------------------------------------------
    FUNCTION do_something (
    as_var IN VARCHAR2,
    an_num IN NUMBER
    ) RETURN NUMBER IS
    BEGIN
    RETURN 1;
    END do_something;

    ReplyDelete
  4. Thanks for your comments and interest! I also received a half dozen emails. I will follow up next week with a next step.

    @salva, thanks for clarifying the architecture of tapigen2!

    @patrick, CodeGen and CGML are Dell property. Not that they are utilizing it. I have made an inquiry or two about "liberating" the code as open source for others to use, but haven't gotten anywhere. I don't think there's a future there and honestly CGML (or at least the generator engine) has some serious issues. I know. I built it. :-)

    ReplyDelete
    Replies
    1. CGML is not Dell property, is it? It is also used in PL/GEN and afaik PL/GEN is open source or at least license free. And, since the code is not wrapped we can still use the generator engine as a nice reference instead of reinventing the wheel again. If we can come up with something similar (but not the same) as CGML we have a great starting point. I don't know if you have any docs (other than what's in the code) for us to read though and find the flaws.

      Delete
  5. I will be organizing a conference call next week on this project. If you have not emailed me (and I don't already have your email address), please do get in touch at steven.feuerstein@oracle.com.

    ReplyDelete