I publish an article in most Oracle Magazine issues,
focusing on the PL/SQL language. I love writing these articles and I enjoy
getting feedback from readers, but I also feel constrained by the, well, constraints of the Oracle Magazine
format (both in terms of length of the article and also the way we can present
code).
So I have decided to complement my articles (some, not necessarily all) with an extended, serialized treatment on my blog. In this first series, I walk through the refactoring of some code that has performance issues, and provide all the code for the initial, intermediate and final forms of the program here.
So I have decided to complement my articles (some, not necessarily all) with an extended, serialized treatment on my blog. In this first series, I walk through the refactoring of some code that has performance issues, and provide all the code for the initial, intermediate and final forms of the program here.
I hope you find this format interesting and helpful. Please comment!
Struggling to Keep Up
I recently spent a few days with a team of developers at
extremememe.info (all names changed to protect the innocent), an up-and-coming
Web 3.0 paradigm shifter that analyzes Internet memes, tracks them to their
source, and best of all predicts new, near-future memes.
They are very knowledgeable about Oracle and PL/SQL, but as
is the case with many developers they have little time to explore new features
in technology. They mostly struggle to keep up with the demands of their users.
And even when they know about some great new feature, it can be a challenge to
convince management to commit the resources to apply those features to stable
production code.
We interspersed training on PL/SQL techniques with reviews
of their code, and in the process came across a program that updates the status
of all their memes, and runs in a daily batch process. Unfortunately, as their
data volumes have grown, the time it takes to complete that process was taking
longer and longer, currently approaching twenty-three hours. Given how
uncomfortably close that was to a fully day, we decided to take a closer look.
I was both dismayed and delighted with what I found (em_update_status0.sql):
Listing 1. The original em_update_status
1
CREATE OR REPLACE PROCEDURE em_update_status
2 IS
3
CURSOR incoming_cur
4
IS
5
SELECT * FROM em_incoming;
6
7
l_mention em_mentions%ROWTYPE;
8
l_status
em_memes.meme_status%TYPE;
9
BEGIN
10
FOR incoming_r IN incoming_cur
11 LOOP
12
BEGIN
13
SAVEPOINT new_transaction;
14
15
l_mention := em_memes_pkg.unpacked_incoming (incoming_r);
16
17
INSERT INTO em_mentions (meme_id,
18
source_name,
19
source_details,
20 occurred_on)
21 VALUES (l_mention.meme_id,
22 l_mention.source_name,
23
l_mention.source_details,
24
l_mention.occurred_on);
25
26
em_memes_pkg.reset_meme_status (l_mention.meme_id,
27
l_status);
28
29
IF l_status IS NOT NULL
30
THEN
31
UPDATE em_memes
32 SET meme_status = l_status
33 WHERE meme_id =
l_mention.meme_id;
34
END IF;
35
EXCEPTION
36
WHEN OTHERS
37
THEN
38
em_memes_pkg.log_error;
39
ROLLBACK TO new_transaction;
40
END;
41
END LOOP;
42 END;
First I will explain some background on references to tables
and code, then I will explain the key lines in this procedure.
It works with the following three tables. The columns of those tables have been simplified greatly
for purposes of this article.
They have a table for all the memes of which they are aware
and for which they track the status (note that setting the default value of the
primary key to the next sequential value is a 12.1 feature; in 11.2 and
earlier, you should use a trigger to get the same effect - you will find the
trigger code in em_memes_setup.sql).
CREATE TABLE em_memes
(
meme_id INTEGER PRIMARY KEY,
meme_name VARCHAR2 (1000) UNIQUE,
discovered_on DATE,
meme_status VARCHAR2 (100)
)
/
The meme_status might be "VIRAL",
"DORMANT", "HOAX", etc.
They have a table for all the "mentions" or
references to a meme:
CREATE SEQUENCE em_mentions_seq
/
CREATE TABLE em_mentions
(
mention_id INTEGER
DEFAULT em_mentions_seq.NEXTVAL
PRIMARY KEY,
meme_id INTEGER REFERENCES em_memes (meme_id),
source_name VARCHAR2 (100),
source_details CLOB,
occurred_on DATE
)
/
Note: the ability to specify the next value of a sequence in
the DEFAULT clause of a table is new to 12.1.
Mentions are loaded from many different sources and they
rely on a staging table to collect together data from all sources:
CREATE TABLE em_incoming
(
meme_name VARCHAR2 (1000),
source_name VARCHAR2 (100),
occurred_on DATE,
mention_details CLOB
)
/
The developers at extremememe.info also use the
following em_memes_pkg package to log errors and to perform some critical
proprietary computations, based on the content of those tables:
CREATE OR REPLACE PACKAGE em_memes_pkg
IS
TYPE incoming_t IS TABLE OF em_incoming%ROWTYPE;
TYPE mentions_t IS TABLE OF em_mentions%ROWTYPE;
TYPE meme_ids_t IS TABLE OF em_memes.meme_id%TYPE;
PROCEDURE log_error (
error_code_in IN INTEGER DEFAULT SQLCODE,
error_msg_in IN VARCHAR2
DEFAULT DBMS_UTILITY.format_error_stack);
FUNCTION unpacked_incoming (
incoming_in IN em_incoming%ROWTYPE)
RETURN em_mentions%ROWTYPE;
PROCEDURE reset_meme_status (
meme_id_in IN
em_memes.meme_id%TYPE,
new_status_out OUT em_memes.meme_status%TYPE);
END;
/
For reasons of space, we will not provide the package body
(whoops, you caught me! I no longer have constraints on "space", but
the real reason I won't show you the body is that it is totally uninteresting,
just stubs to make sure the package body will compile), but here is what the
elements in the specification provide:
em_memes_pkg .incoming_t
|
A nested table type that contains rows of data from the
em_incoming table
|
em_memes_pkg .mentions_t
|
A nested table type that contains rows of data from the
em_mentions table
|
em_memes_pkg .meme_ids_t
|
A nested table type that contains primary keys from the
em_memes table
|
em_memes_pkg log_error
|
A generic (and typical) error logging mechanism. Defined
as an autonomous transaction, it writes out the basic error information
available from the DBMS_UTILITY functions, and more to a log table and then
commits just that insert.
|
em_memes_pkg.unpacked_incoming
|
A function that converts a row of incoming data (whose
mention_details column is an XML document that holds the different formats of
source information) into a record that can be inserted into the me_mentions
table
|
em_memes_pkg.reset_meme_status
|
The "heart and soul" of the extremememe.info
proprietary process, it analyzes the contents of the mentions table and
determines the status of the meme.
|
Here is an explanation of the key parts of em_update_status
(see Listing 1):
Line(s)
|
Description
|
10
|
For each row in the staging table (em_incoming)…
|
12, 40
|
Put the entire body of the loop inside its own nested
block, so that any exception can be trapped and logged.
|
13
|
Set a savepoint for this new "transaction" (add
a mention, update the status).
|
15
|
Convert the incoming row to a record that can be inserted
into the mentions table.
|
17-24
|
Insert a single row into the mentions table.
|
26-27
|
Compute the new status for the meme, based on the new
mention.
|
29-34
|
If the status isn't NULL, update the memes table with that
status
|
35-39
|
If anything goes wrong, log the error and then erase the
effects of either/both the insert and update by rolling back to the
savepoint.
|
We now have all the information we need to analyze and fix
the em_update_status procedure.
Before I tell you about my solution (and, preferably, before you read the Oracle Magazine article!), however, I'd like to
stop and offer you, dear reader, the
opportunity to think through the situation yourself:
1. What do you see as the key problems with the program and
perhaps more deeply the table design?
2. What would you do to fix the problem(s)?
Comments
Post a Comment