Wednesday, July 30, 2014

Hiring for a New Team: Oracle Database Evangelists

I have been given the honor of building a new team at Oracle: Oracle Database Evangelists.

The Oracle Database Evangelists team is responsible for helping users fully leverage their investment in Oracle Database and to promote the use of Oracle Database technologies, including SQL, PL/SQL, Application Express, Oracle REST Data Services, and more.

So what does it mean to be an evangelist? I very much like the following quote:


A developer evangelist is first and foremost a translator. Someone who can explain technology to different audiences to get their support for a certain product or technology. It needs to be someone who is technical but also capable to find the story in a technical message… A good developer evangelist can get techies excited about a product by pointing out the benefits for developers who use the product on an eye-to-eye level." 

Christian Heilmann, Principal Technical Evangelist at Mozilla

I like the focus on "translation", because our team will be first and foremost dedicated to helping our users succeed. And to do that, you often do need to "translate" the product features and the product documentation into how-tis and examples that are accessible and quickly applied. 

Oracle Database Evangelist Job Description

Each Oracle Database Evangelist (ODE) is responsible for the creation of high quality and highly entertaining resources for training and education; channeling user requirements back to Product Management and Product Development; building a vibrant, engaged global user community.

Detailed Description

The ideal ODE candidate:

Is an expert in at least one product area
Can explain things in ways people understand and by which they are inspired
Enjoys performing in front of a crowd – and a camera
Works on a development team (you have to use it or you lose it)
Is easy to get along with; her or his ego fits through the doorway
Is a user evangelist, representing user needs and issues to development teams

Each ODE will:

Hold monthly webinars in their technology focus area
Write and publish through the PL/SQL Challenge a weekly quiz on their technology
Publish daily tips through Twitter and other social media outlets
Attend key Oracle and Oracle User Group conferences, to present, listen and learn
Work closely with Product Management and Product Development to both deepen product understanding and more effectively communicate to users how to use those products.

Location and Travel

The ODE team is distributed geographically; you do not need to work out of Oracle Headquarters to do this job.

You should expect 25% travel, though the amount of travel will be largely up to you. The focus of our team is on building global communities, and this will be done mostly through the Internet, as opposed to via jet planes.

Interested? Excited? Get in touch!

I am! One of my key objectives in returning to Oracle after a 22 year absence is to help PL/SQL developers utilize that language more fully, and to make it easier for all the experts "out there" (outside of Oracle) to contribute their knowledge to the global community.

The ODE team is going to be precisely the vehicle to do that. And along the way, we will utilize the latest multimedia and education technologies to create engaging, entertaining resources that will change the way our core application development technologies for Oracle Database are perceived and used.

If you've been around the Oracle technology community for a while and are looking for a way to contribute more, to do more, to have a greater impact, then please consider the ODE position. If you want to help ensure that languages like SQL and PL/SQL flourish, are fully appreciated and used to their max, if you like to help others do what you have learned to do, then go to Taleo, find requisition number 14000L34 and apply!






Tuesday, July 22, 2014

Using LOG ERRORS as a testing mechanism

I recently heard a very interesting story about a use for LOG ERRORS: namely, to verify the correctness (and improve the performance) of a problematic SQL statement.

I thought you might enjoy hearing about it.

Greg Belliveau of Digital Motorworks buttonholed me at ODTUG's Kscope14 conference with this report:

As part of a data mart refresh, we had an insert statement that refreshed a FACT table, and over the years had gotten to the point where it took well over 3 hours to complete. 

There was a NOT IN clause in the statement, and we were pretty sure that was the cause of the degenerating performance. Why was it there? Our best guess was so that the statement would not fail in case someone ran the refresh for a date range that had already ran...even though there was code that checked and prevented that from happening. 

[Steven: In other words, programmer's insurance. "Just in case, let's do this." That's always problematic in software. Is that a real "case"? Will it ever happen? What are the consequences of including this insurance? Will we document it so that others coming along later can figure out why this weird code is here? Usually not.]

Not wanting to lose the "safety net" that was in place, we decided to try the LOG ERRORS feature you had mentioned in your session at Oracle Open World in 2010. We removed the NOT IN clause and added LOG ERRORS. The insert statement then ran (and continues to run to this day) in roughly one minute (down from three hours!).

Oh, and there's never been a single row inserted in the error table!

Nice, nice, very nice.

It's always a bit scary to mess with code that's been around for years and (since it's been around for years) is a gnarly bunch of logic. SQL, with its set orientation, can be even more off-putting than, say, a PL/SQL function.

So, of course, the solution is to build a comprehensive automated regression test script so that you can compare before and after results.

Yes, but almost no one will (ever) do that. So we do what we can. 

And in this case, Greg and his team came up with a very creative solution:

We are pretty sure NOT IN is causing the performance problem, but we also cannot afford to remove the clause and have the statement fail.

So we'll take it out, but add LOG ERRORS to ensure that all inserts are at least attempted.

Then we can check the error log table afterwards to see if the NOT IN really was excluding data that would have caused errors.

In Greg's case, the answer was "Nope". So far as they can tell, the NOT IN was unnecessary.

OK, but maybe it will be necessary a week or month or year from now. What then?

Well, the bad data that should have been excluded will still be excluded (the insert will fail), and then the job can check the log table and issue whatever kind of report (or alarm) is needed.

So LOG ERRORS as part regression test, part performance monster.

Nice work, Greg.

Nice work, Oracle.


Friday, July 18, 2014

How Has SQL or PL/SQL Changed Your Life?

It's not hard to identify some of the key enabling technologies of the Information and Internet Eras: Windows, Linux, HTTP, HTML and Java all come to mind.

But likely what doesn't come to mind immediately, these days, is:

SQL and PL/SQL

Seriously, how important can these be or have been when there's an entire software movement that puts the word "No" in front of SQL?

Extremely important, it turns out.

The SQL language, with its set-oriented and declarative power, revolutionized the way programmers, DB administrators and at least some end users worked with, and work with, data.

PL/SQL enabled the creation of powerful, effective mission-critical applications that run pretty much everything that modern human society relies on, day to day.

Sadly, we work in an industry that is perhaps more fashion conscious than the fashion industry. itself. We are always driven to get excited about the latest, greatest (or at least newest) thing. And when a technology's been around for 35 years how good could it really be, anymore?

Pretty darn good, when you're talking about SQL.

I get it that SQL and the relational model at least temporarily has been unable to handle the demands of Big Data and unstructured data. I get it that the world has changed a lot and there are some new requirements out there.

I get all that. What I don't get is that these new requirements cover a tiny percentage of use cases. The vast majority of applications, of user requirements related to data, are still handled best with the relational model.

Talk about throwing out the baby with the bathwater!

Well, folks, it's time to fill up the bath with sparkling spring water and put Baby SQL and Baby PL/SQL back in. [That's probably stretching the metaphor too far, but since I love babies so much, I will go with it. Don't believe me? Check out the Flickr page for my granddaughter.]

It's time, in other words, to "fight back", to recognize the incredible value and importance of the technologies with which we work, and of the work we do.

It's time, in short, to celebrate SQL and PL/SQL!

At Oracle Open World 2014, Oracle Technology Network will host the first-ever YesSQL! A celebration of SQL and PL/SQL.

No feature Powerpoints. No demos. Here's the description:

Co-hosted by Tom Kyte and Steven Feuerstein, YesSQL! celebrates SQL, PL/SQL, and the people who both make the technology and use it. At YesSQL!, special guests Andy Mendelsohn, Maria Colgan, Andrew Holdsworth, Graham Wood and others share our stories with you, and invite you to share yours with us, because?.

YesSQL! is an open mic night. Tell us how SQL and PL/SQL - and the Oracle experts who circle the globe sharing their expertise - have affected your life! 

Bottom line: If developing applications against Oracle Database is a big a part of your life, join us for a fun and uplifting evening.

I hope you can join us at the event (you'll be able to sign up for YesSQL! just like for a regular OOW session). 

But if you can't (or even if you can), you can share your story with us, right here (and on the PL/SQL Challenge, in our latest Roundtable discussion).

How has SQL and/or PL/SQL and/or Oracle Database changed your life, personally, professionally or otherwise? We will select some of your stories to read at the YesSQL! event and if you are attending, you can tell the story yourself.

Monday, July 14, 2014

The Joy of Low Hanging Fruit, Part 3: Deferred satisfaction key to great code

In the second post of this "Joy" series, I introduced the primary technique we would use to improve the performance of extremememe.info's daily job: bulk processing with FORALL and BULK COLLECT.

I emphasized that whenever you see a loop that contains non-query DML, you should convert it to use these bulk features.

So we've got some problematic code and we know what features of PL/SQL to use. So now it's time to check the documentation, make sure we've the syntax down, and apply the feature, right?

Change the code around and watch the bits and bytes flow like greased lightning!

Not quite.

Sure, we could have done that. But after some 30 years in software, I have grown leery and wary of excitement. When we get all excited about a new feature or a solution to a problem, we tend to then be in a hurry to apply it. When you hurry, you miss critical information. When you hurry, you don't question your assumptions. When you hurry, you are much more likely to make mistakes.

In fact, I have found that deferred satisfaction (holding off as long as you before writing the actual code) has served me well, saved me from creating bigger problems, and definitely saved me time.
With that in mind, let's continue with the story.

Code Review Identifies a Logic Flaw

I shared my dismay (non-query DML inside loop) and delight (a known fix with greatly increased performance) with the extremememe.info team. They were excited about the potential improvements and were eager to get started. Then, however, I noticed something odd about the algorithm they'd presented. Here, let's show you the relevant part of the code; can you see what might be the problem?

The original em_update_status

  1  CREATE OR REPLACE PROCEDURE em_update_status
  2  IS
  3     CURSOR incoming_cur
  4     IS
  5        SELECT * FROM em_incoming;
  6
  7     l_mention   em_mentions%ROWTYPE;
  8     l_status    em_memes.meme_status%TYPE;
  9  BEGIN
 10     FOR incoming_r IN incoming_cur
 11     LOOP
 12        BEGIN
 13           SAVEPOINT new_transaction;
 14
 15           l_mention := em_memes_pkg.unpacked_incoming (incoming_r);
 16
 17           INSERT INTO em_mentions (meme_id,
 18                                    source_name,
 19                                    source_details,
 20                                    occurred_on)
 21                VALUES (l_mention.meme_id,
 22                        l_mention.source_name,
 23                        l_mention.source_details,
 24                        l_mention.occurred_on);
 25
 26           em_memes_pkg.reset_meme_status (l_mention.meme_id,
 27                                           l_status);
 28
 29           IF l_status IS NOT NULL
 30           THEN
 31              UPDATE em_memes
 32                 SET meme_status = l_status
 33               WHERE meme_id = l_mention.meme_id;
 34           END IF;
 35        EXCEPTION
 36           WHEN OTHERS
 37           THEN
 38              em_memes_pkg.log_error;
 39              ROLLBACK TO new_transaction;
 40        END;
 41     END LOOP;
 42 END;

A brief Q&A session soon clarified matters.

"Can the table of incoming data contain more than one reference to the same meme?"

Heads nodded vigorously. They told me that the more "viral" a meme behaved, the more mentions there would be. On some very popular memes, the daily incoming could contain millions of references to the same meme. That led directly to my second question:

"Then why are you updating the meme status after each insert of a mention?" (see lines 26-34)

Now eyebrows shot up and heads turned to look at each other. 

"Wow," said one developer. "Oh. My. Gosh." said another.

The team lead turned to me. "We never thought of that before. We've been doing an enormous amount of unnecessary processing. That's kind of embarrassing. It sure helps having a fresh pair of eyes looking at our code."

Yes, it does.

Never underestimate the power of your brain to hide the obvious from you. Once you've spent a lot of time writing or working with a program, you don't really see the code anymore. You see what you think the code should be. When someone else looks at it, they come at it without any preconceived notions and see very different aspects of the program.

SO REMEMBER!
If you're the only one who's ever looked at your code,
you can be certain it contains bugs,
and perhaps even substantial errors in your algorithms.*

We quickly came to an agreement that the statuses of the memes should only be done after all mentions were inserted. Furthermore, only those memes with new mentions should be updated, so we would need to keep track of those.

The resulting code might be a bit more complicated, but if written correctly we can make the code easy to understand and maintain. Bottom line, though: we are focused on performance and this could be a big help.

* Which reminds me of one of my favorite sayings: If you look around the room and wonder who the chump** is, you are the chump.

** Chump: a foolish or easily deceived person.

Analyze Current Behavior

Before you can think about making changes to a program, you need to make sure that you understand how the current state of the code works - and what it does when errors occur. If you don't do this, how can you be sure that the new version still meets user requirements? Oh, right. You could build a regression test. I address that in the next section.

For now, let's focus simply on the importance of analyzing and understanding (deeply) your code before embarking on changes. This is especially critical for programs that change the contents of tables.

Here is a list of the behaviors we need to reproduce in the new, bulked-up version of reset_meme_status:
  • If the incoming table is empty, then make no changes.
  • If an error occurs when inserting a row into the mentions table, do not update the meme status.      
  • If an error occurs processing either DML statement, log the error and continue processing.


But wait a minute – that certainly reflects the current program, but not the program we want to build. Remember that we figured out that the meme status update should occur after all incoming rows are processed. So in actually the new set of behaviors we need are:

  • If the incoming table is empty, then make no changes.
  • If an error occurs when inserting a row into the mentions table, log the error and continue processing.
  • If an error does not occur, record this meme as being changed.
  • After all incoming rows have been processed, updated the status on all affected memes.

This change actually simplifies greatly the kind of code we will need to write using BULK COLLECT and FORALL. The reason is that FORALL can only be used with a single DML statement. So if my loop contains two DML statements I have to find a way to "communicate" between from FORALL to the next. I will explore this challenge more fully in the next installment of this series.

So now we know what we need to do in the bulked-up version. That means it's time start writing code, right? Wrong.

Build the Regression Test

Now, let's talk about seriously deferring satisfaction.

Let's talk about the importance of building automated regression tests before you embark on a substantial revision to mission-critical code.

The reset_meme_status procedure had been in production for several years. It had been thoroughly tested and was now relied upon implicitly to do its job without errors.

There was no margin for error; the new version had to work just as well (but lots faster). And the only way to ensure that this would be the case was a to build a script that verifies the current behavior, and could then be run against the new version as well.

This is called a regression test. You want to make sure that your code has not regressed (to an earlier, buggier state) after you applied a change.

It isn't easy to build regression tests for database programs. First, you must analyze and document current behavior. Then you need to come up with a list of all the test cases needed to verify that behavior. Then you have to build a script to set up the tables with all the variability of production data. Then you have to write code to check the contents of those tables after your program runs. Finally, you need a way to run all that code so that you can easily identify failures or confirm success.
Some development groups write all of this code by hand. Others use one of several regression test tools, including:

utPLSQL - an open source project that I wrote back in 1999, is still used by many groups around the world and has recently come back to life, under the administration of Paul Walker. You must write your own test packages, but the utPLSQL framework then runs the package and verifies the results.

SQL Developer Unit Testing - Oracle's SQL Developer tool offers integrated unit testing that also allows you to describe the expected behavior of your program and then generates the code needed to verify that behavior.

Code Tester for Oracle - a commercial product from Dell that will generate test packages from your descriptions of expected behavior.

There are others besides these, including PLUTO, PL/Unit and dbFit.

I encourage you to explore these options and to make the greatest effort possible to build regression tests, so that you can avoid getting calls from users that go like this: "Thanks for adding all those new features in version 2.6, but you broke three old features that I rely on every day."

Now, having said all of that, I am also resigned to the fact that it will be the rarest of PL/SQL developers (and teams) who actually make time to build automated regression tests.

I don't do a whole lot of them myself.

So I would like to offer a simple alternative that could give you some nice benefits without taking an enormous amount of time.

Before I do I have a question:

You need to write a new program. Your manager says so. OK, so you think about what you need to do and you start typing. Here's the question: how do you know when you're done?

Seems like an obvious question. But the answer often comes down to a variation of:

"I'll know it when I see it."

Unfortunately, our vision is all too often blurred by the pressures we face to get our work done and our programs coded.

So here is my suggestion: before you start a new program, take out a piece of paper and ask yourself "How will I know when I'm done?"

Write down brief descriptions of distinct scenarios the program should handle and what should happen. Don't worry about too many details; the bulk of the value of this step is to externalize the list from your head (very bad place to store a to-do list when you are pressed for time)

Then (finally), you write your code, and you check off the items on the list as you go.

The checklist would contain reminders of key best practices, but more importantly lists out the things your program should be able to do before you declare it done.

Of course, the most important items on the checklist are taken from the previous section: what does the new version of the program need to do?

Task
Completed?
Program Requirements

If the incoming table is empty, then make no changes.

If an error occurs when inserting a row into the mentions table, log the error and continue processing.

If an error does not occur, record this meme as being changed.

After all incoming rows have been processed, updated the status on all affected memes.


Best Practices

Add instrumentation to trace program activity.

Avoid hard-coded declarations. Use %TYPE or subtypes.

Validate all assumptions you are making (and document them).


I realize that when you look at my little checklist here, it can seem kind of obvious. Actually, that is (partly) the point.

These things should be obvious, they often are obvious, but it is so easy to forget them, to push hard and fast along the path to production, and inadvertently leave unimplemented requirements in your wake. Having a piece of paper (or, better, a tool) to help you remember can make an enormous difference.

OK, so remember what I said about deferred satisfaction?

Here's another deferral: I will post this next episode of my series without even getting to bulk processing!

In my next article, though, I promise to offer a detailed look at how bulk processing involves a shift from row by row to phased processing, and what effect that can have on your code.

Until then, happy PL/SQL coding!

Steven
steven.feuerstein@oracle.com


Friday, July 11, 2014

My Oracle Open World Sessions

I will be presenting three times at Oracle Open World:

Session ID: CON7828
Session Title: The Whys and Wherefores of New Oracle Database 12c PL/SQL Features

Session ID: CON8265
Session Title: PL/SQL: The Scripting Language Liberator

Session ID: CON8450
Session Title: SQL (and PL/SQL) Tuning Experts Panel

Now, sure, that's plenty exciting.

But I have even more exciting news: we will be holding the first ever YesSQL! Celebration of SQL and PL/SQL event at OOW14 on Monday, September 29th, at 6:30 PM. Here's a description:

Co-hosted by Tom Kyte and Steven Feuerstein, YesSQL! is an Oracle Open World event celebrating SQL, PL/SQL, and the people who both make the technology and use it.

At YesSQL!, special guests Andy Mendelsohn, Maria Colgan, Andrew Holdsworth, Graham Wood and others share our stories with you, and invite you to share yours with us, because....

SQL Celebration is an open mic night. Tell us how SQL and PL/SQL - and the Oracle experts who circle the globe sharing their expertise - have affected your life!

And right after YesSQL! everyone is invited to join the big Tech Fest on Howard Street, where we can continue our conversations, and mingle with Java, MySQL and other developers in the vast Oracle ecosystem.

Bottom line: If developing applications against Oracle Database is a big a part of your life, join us for a fun and uplifting evening.

You'll be hearing more about YesSQL! in the coming months, but I thought I'd give you early notice so you can put it on your schedule.

Space is limited at YesSQL!, so be sure to sign up. The session ID is CON9027.

Thursday, July 3, 2014

The 2014 PL/SQL by Feuerstein Webinar Series


Now that I am settling in very nicely at Oracle Corporation, after a 22 year absence, it's time to get back to celebrating all the wonderful things you can do with PL/SQL, along with advice about how to best do those things.

I am happy to announce that Oracle Corporation and ODTUG are teaming up to co-host a six-month webinar series on the Oracle PL/SQL language.

The first three webinars, hosted by Oracle, will focus on PL/SQL best practices:
[My apologies about the late notice for the first webcast, scheduled for next week. Like I said: I just got settled in.]

The second set of three webinars, hosted by ODTUG, take on the challenge of mining the PL/SQL language for features and techniques to improve performance:
Whether you are new to PL/SQL or have some years under your belt, attending these six webinars will deepen your expertise and help you think in new and creative ways about how to write fast, maintainable applications.

All webcasts will be recorded and available for viewing 24 hours a day, 7 days a week, for as long as electricity powers the Internet.

I hope you can join me!

Steven