Skip to main content

Find duplicate SQL statements with PL/Scope in 12.2

PL/Scope is a compiler tool that gathers information about identifiers (as of 11.1) and SQL statements (as of 12.2) in your PL/SQL code.

You can do all sorts of amazing deep-dive analysis of your code with PL/Scope, answering questions like:
  • Where is a variable assigned a value in a program?
  • What variables are declared inside a given program?
  • Which programs call another program (that is, you can get down to a subprogram in a package)?
  • Find the type of a variable from its declaration.
  • Show where specific columns are referenced
  • Find all program units performing specific DML operations on table (and help you consolidate such statements)
  • Locate all SQL statements containing hints
  • Find all dynamic SQL usages – ideal for getting rid of SQL injection vulnerabilities
  • Show all locations in your code where you commit or rollback
And my latest favorite: Locate multiple appearances of same "canonical" SQL statement.

What does this mean and why does it matter? One great feature of PL/SQL is that the PL/SQL compiler automatically "canonicalizes" all static SQL statements in your code. This means that it upper-cases all keywords, removes extraneous white space, and so on. It standardizes the format of your SQL statements.

This standardization is important because it reduces the number of times that Oracle Database will "hard parse" your SQL statement when it is executed. That's because standardization of format raises the likelihood that SQL statements which "look" different (different case, line breaks, spaces) but are actually the same "under the surface".

So canonicalization of SQL can improve performance.

Now on to another benefit gained from this process: PL/Scope compares all the canonicalized SQL statements and assigns the same SQL_ID to matching statements. Consider the following two statements.

I turn on PL/Scope to gather both identifier and statement information. Then I compile two procedures. Clearly, they were written by two different developers on my team, with their formatting and naming standards. Sigh....welcome to the real world, right?

ALTER SESSION SET plscope_settings='identifiers:all, statements:all'
/

CREATE OR REPLACE PROCEDURE p1 (p_id NUMBER, p_name OUT VARCHAR2)
IS
BEGIN
   SELECT 
          last_name
     INTO 
          p_name
     FROM 
          employees
    WHERE    
          employee_id = p_id;
END;
/

CREATE OR REPLACE PROCEDURE p2 (id_in NUMBER, name_out OUT VARCHAR2)
IS
BEGIN
   SELECT last_name
     INTO name_out
     FROM EMPLOYEES
    WHERE employee_id = id_in;
END;
/

Now let's analyze the PL/Scope data:

  SELECT signature, sql_id, text
    FROM all_statements
   WHERE object_name IN ('P1', 'P2')
ORDER BY line, col
/

517C7D44CC74C7BD752899158B277868 641rpxvq1qu8n 
SELECT LAST_NAME FROM EMPLOYEES WHERE EMPLOYEE_ID = :B1 
      
DFD0209075761780F18552DE6661B4E7 641rpxvq1qu8n 
SELECT LAST_NAME FROM EMPLOYEES WHERE EMPLOYEE_ID = :B1 

Brilliant! The signatures are different (no big surprise there; that's a value generated by PL/Scope that is guaranteed to be unique across all statements and identifiers). But notice that the SQL_IDs are the same - and the SQL statements are the same, too.

There you see the canonicalization at work.

Why I am so excited about this? One of the worst things you can do in your code is repeat stuff: copy/paste algorithms, magic values...and, most damaging of all, SQL statements.

If you repeat the same SQL statements across your application, it is much harder to optimize and maintain that code.

Now, with PL/Scope 12.2, we have an awesome and easy-to-use tool at our disposal to identify all duplicates of SQL. We can then decide which of those should be moved into functions (SELECTs) or procedures (non-query DML), so that the subprogram can be invoked in multiple places, and the SQL can be managed in one place.

Here's a query that tells you precisely where duplication of SQL occurs:

SELECT owner,
       object_name,
       line,
       text
  FROM all_statements
 WHERE sql_id IN (  SELECT sql_id
                      FROM all_statements
                     WHERE sql_id IS NOT NULL
                  GROUP BY sql_id
                    HAVING COUNT (*) > 1)
/

Cool stuff.

Try it out at LiveSQL, our free, 24x7 playground for Oracle Database 12c Release 2 SQL and PL/SQL (and a code library).

And check out the extensive doc on PL/Scope, with lots of examples and insights.

Comments

  1. Hello,
    I read your blog and i found it very interesting for me and i hope you will post like this. I am very thankful to you for these blog.
    Thank You.

    ReplyDelete

Post a Comment

Popular posts from this blog

Running out of PGA memory with MULTISET ops? Watch out for DISTINCT!

A PL/SQL team inside Oracle made excellent use of nested tables and MULTISET operators in SQL, blending data in tables with procedurally-generated datasets (nested tables).  All was going well when they hit the dreaded: ORA-04030: out of process memory when trying to allocate 2032 bytes  They asked for my help.  The error occurred on this SELECT: SELECT  *    FROM header_tab trx    WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)       AND ((trx.column_ntab MULTISET             EXCEPT DISTINCT generated_ntab2) IS EMPTY) The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality. We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of work, revamping algorithms, ensuring correctness, you know the score. Then my eyes snagge

How to Pick the Limit for BULK COLLECT

This question rolled into my In Box today: In the case of using the LIMIT clause of BULK COLLECT, how do we decide what value to use for the limit? First I give the quick answer, then I provide support for that answer Quick Answer Start with 100. That's the default (and only) setting for cursor FOR loop optimizations. It offers a sweet spot of improved performance over row-by-row and not-too-much PGA memory consumption. Test to see if that's fast enough (likely will be for many cases). If not, try higher values until you reach the performance level you need - and you are not consuming too much PGA memory.  Don't hard-code the limit value: make it a parameter to your subprogram or a constant in a package specification. Don't put anything in the collection you don't need. [from Giulio Dottorini] Remember: each session that runs this code will use that amount of memory. Background When you use BULK COLLECT, you retrieve more than row with each fetch,

Quick Guide to User-Defined Types in Oracle PL/SQL

A Twitter follower recently asked for more information on user-defined types in the PL/SQL language, and I figured the best way to answer is to offer up this blog post. PL/SQL is a strongly-typed language . Before you can work with a variable or constant, it must be declared with a type (yes, PL/SQL also supports lots of implicit conversions from one type to another, but still, everything must be declared with a type). PL/SQL offers a wide array of pre-defined data types , both in the language natively (such as VARCHAR2, PLS_INTEGER, BOOLEAN, etc.) and in a variety of supplied packages (e.g., the NUMBER_TABLE collection type in the DBMS_SQL package). Data types in PL/SQL can be scalars, such as strings and numbers, or composite (consisting of one or more scalars), such as record types, collection types and object types. You can't really declare your own "user-defined" scalars, though you can define subtypes  from those scalars, which can be very helpful from the p