The Joy of Low Hanging Fruit, Part 1

I publish an article in most Oracle Magazine issues, focusing on the PL/SQL language. I love writing these articles and I enjoy getting feedback from readers, but I also feel constrained by the, well, constraints of the Oracle Magazine format (both in terms of length of the article and also the way we can present code).

So I have decided to complement my articles (some, not necessarily all) with an extended, serialized treatment on my blog. In this first series, I walk through the refactoring of some code that has performance issues, and provide all the code for the initial, intermediate and final forms of the program here.

I hope you find this format interesting and helpful. Please comment!

Struggling to Keep Up

I recently spent a few days with a team of developers at extremememe.info (all names changed to protect the innocent), an up-and-coming Web 3.0 paradigm shifter that analyzes Internet memes, tracks them to their source, and best of all predicts new, near-future memes.

They are very knowledgeable about Oracle and PL/SQL, but as is the case with many developers they have little time to explore new features in technology. They mostly struggle to keep up with the demands of their users. And even when they know about some great new feature, it can be a challenge to convince management to commit the resources to apply those features to stable production code.

We interspersed training on PL/SQL techniques with reviews of their code, and in the process came across a program that updates the status of all their memes, and runs in a daily batch process. Unfortunately, as their data volumes have grown, the time it takes to complete that process was taking longer and longer, currently approaching twenty-three hours. Given how uncomfortably close that was to a fully day, we decided to take a closer look.

I was both dismayed and delighted with what I found (em_update_status0.sql):

Listing 1. The original em_update_status

1 CREATE OR REPLACE PROCEDURE em_update_status

2 IS

3 CURSOR incoming_cur

4 IS

5 SELECT * FROM em_incoming;

7 l_mention em_mentions%ROWTYPE;

8 l_status em_memes.meme_status%TYPE;

9 BEGIN

10 FOR incoming_r IN incoming_cur

11 LOOP

12 BEGIN

13 SAVEPOINT new_transaction;

15 l_mention := em_memes_pkg.unpacked_incoming (incoming_r);

17 INSERT INTO em_mentions (meme_id,

18 source_name,

19 source_details,

20 occurred_on)

21 VALUES (l_mention.meme_id,

22 l_mention.source_name,

23 l_mention.source_details,

24 l_mention.occurred_on);

26 em_memes_pkg.reset_meme_status (l_mention.meme_id,

27 l_status);

29 IF l_status IS NOT NULL

30 THEN

31 UPDATE em_memes

32 SET meme_status = l_status

33 WHERE meme_id = l_mention.meme_id;

34 END IF;

35 EXCEPTION

36 WHEN OTHERS

37 THEN

38 em_memes_pkg.log_error;

39 ROLLBACK TO new_transaction;

40 END;

41 END LOOP;

42 END;

First I will explain some background on references to tables and code, then I will explain the key lines in this procedure.

It works with the following three tables. The columns of those tables have been simplified greatly for purposes of this article.

They have a table for all the memes of which they are aware and for which they track the status (note that setting the default value of the primary key to the next sequential value is a 12.1 feature; in 11.2 and earlier, you should use a trigger to get the same effect - you will find the trigger code in em_memes_setup.sql).

CREATE TABLE em_memes

(

meme_id INTEGER PRIMARY KEY,

meme_name VARCHAR2 (1000) UNIQUE,

discovered_on DATE,

meme_status VARCHAR2 (100)

)

The meme_status might be "VIRAL", "DORMANT", "HOAX", etc.

They have a table for all the "mentions" or references to a meme:

CREATE SEQUENCE em_mentions_seq

CREATE TABLE em_mentions

(

mention_id INTEGER

DEFAULT em_mentions_seq.NEXTVAL

PRIMARY KEY,

meme_id INTEGER REFERENCES em_memes (meme_id),

source_name VARCHAR2 (100),

source_details CLOB,

occurred_on DATE

)

Note: the ability to specify the next value of a sequence in the DEFAULT clause of a table is new to 12.1.

Mentions are loaded from many different sources and they rely on a staging table to collect together data from all sources:

CREATE TABLE em_incoming

(

meme_name VARCHAR2 (1000),

source_name VARCHAR2 (100),

occurred_on DATE,

mention_details CLOB

)

The developers at extremememe.info also use the following em_memes_pkg package to log errors and to perform some critical proprietary computations, based on the content of those tables:

CREATE OR REPLACE PACKAGE em_memes_pkg

TYPE incoming_t IS TABLE OF em_incoming%ROWTYPE;

TYPE mentions_t IS TABLE OF em_mentions%ROWTYPE;

TYPE meme_ids_t IS TABLE OF em_memes.meme_id%TYPE;

PROCEDURE log_error (

error_code_in IN INTEGER DEFAULT SQLCODE,

error_msg_in IN VARCHAR2

DEFAULT DBMS_UTILITY.format_error_stack);

FUNCTION unpacked_incoming (

incoming_in IN em_incoming%ROWTYPE)

RETURN em_mentions%ROWTYPE;

PROCEDURE reset_meme_status (

meme_id_in IN em_memes.meme_id%TYPE,

new_status_out OUT em_memes.meme_status%TYPE);

END;

For reasons of space, we will not provide the package body (whoops, you caught me! I no longer have constraints on "space", but the real reason I won't show you the body is that it is totally uninteresting, just stubs to make sure the package body will compile), but here is what the elements in the specification provide:

em_memes_pkg .incoming_t	A nested table type that contains rows of data from the em_incoming table
em_memes_pkg .mentions_t	A nested table type that contains rows of data from the em_mentions table
em_memes_pkg .meme_ids_t	A nested table type that contains primary keys from the em_memes table
em_memes_pkg log_error	A generic (and typical) error logging mechanism. Defined as an autonomous transaction, it writes out the basic error information available from the DBMS_UTILITY functions, and more to a log table and then commits just that insert.
em_memes_pkg.unpacked_incoming	A function that converts a row of incoming data (whose mention_details column is an XML document that holds the different formats of source information) into a record that can be inserted into the me_mentions table
em_memes_pkg.reset_meme_status	The "heart and soul" of the extremememe.info proprietary process, it analyzes the contents of the mentions table and determines the status of the meme.

Here is an explanation of the key parts of em_update_status (see Listing 1):

Line(s)	Description
10	For each row in the staging table (em_incoming)…
12, 40	Put the entire body of the loop inside its own nested block, so that any exception can be trapped and logged.
13	Set a savepoint for this new "transaction" (add a mention, update the status).
15	Convert the incoming row to a record that can be inserted into the mentions table.
17-24	Insert a single row into the mentions table.
26-27	Compute the new status for the meme, based on the new mention.
29-34	If the status isn't NULL, update the memes table with that status
35-39	If anything goes wrong, log the error and then erase the effects of either/both the insert and update by rolling back to the savepoint.

We now have all the information we need to analyze and fix the em_update_status procedure.

Before I tell you about my solution (and, preferably, before you read the Oracle Magazine article!), however, I'd like to stop and offer you, dear reader, the opportunity to think through the situation yourself:

1. What do you see as the key problems with the program and perhaps more deeply the table design?

2. What would you do to fix the problem(s)?

Why DBMS_OUTPUT.PUT_LINE should not be in your application code

A database developer recently came across my Bulletproof PL/SQL presentation, which includes this slide. That first item in the list caught his attention: Never put calls to DBMS_OUTPUT.PUT_LINE in your application code. So he sent me an email asking why I would say that. Well, I suppose that is the problem with publishing slide decks. All the explanatory verbiage is missing. I suppose maybe I should do a video. :-) But in the meantime, allow me to explain. First, what does DBMS_OUTPUT.PUT_LINE do? It writes text out to a buffer, and when your current PL/SQL block terminates, the buffer is displayed on your screen. [Note: there can be more to it than that. For example, you could in your own code call DBMS_OUTPUT.GET_LINE(S) to get the contents of the buffer and do something with it, but I will keep things simple right now.] Second, if I am telling you not to use this built-in, how could text from your program be displayed on your screen? Not without a lot o...

Obsessed with Oracle PL/SQL

Search This Blog

The Joy of Low Hanging Fruit, Part 1

Comments

Post a Comment

Popular posts from this blog

Running out of PGA memory with MULTISET ops? Watch out for DISTINCT!

How to Pick the Limit for BULK COLLECT

Why DBMS_OUTPUT.PUT_LINE should not be in your application code