A PL/SQL team inside Oracle made excellent use of nested tables and MULTISET operators in SQL, blending data in tables with procedurally-generated datasets (nested tables).
All was going well when they hit the dreaded:
ORA-04030: out of process memory when trying to allocate 2032 bytes
They asked for my help.
The error occurred on this SELECT:
SELECT *
FROM header_tab trx
WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)
AND ((trx.column_ntab MULTISET
EXCEPT DISTINCT generated_ntab2) IS EMPTY)
The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality.
We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of work, revamping algorithms, ensuring correctness, you know the score.
Then my eyes snagged on the use of DISTINCT and IS EMPTY.
They simply needed to make sure that there was nothing in column_ntab that was not in generated_ntab2. Having two of something left was the same result as having one of something left - not empty.
So I suggested that they could drop the DISTINCT and see what happened.
What happened is that the ORA-04030 stopped occurring.
The lead developer also took a closer look at how generated_ntab2 was constructed and discovered that they were already ensuring that there were no duplicates.
The DISTINCT was definitely not needed and so the problem, at least for now, was resolved.
They are not 100% confident that the error still might not occur, especially as data volume increases. But at least they've bought themselves the time to fully analyze their algorithms and explore a 100% SQL solution.
All was going well when they hit the dreaded:
ORA-04030: out of process memory when trying to allocate 2032 bytes
They asked for my help.
The error occurred on this SELECT:
SELECT *
FROM header_tab trx
WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)
AND ((trx.column_ntab MULTISET
EXCEPT DISTINCT generated_ntab2) IS EMPTY)
The problem is clearly related to the use of those nested tables. Now, there was clearly sufficient PGA for the nested tables themselves. So the problem was in executing the MULTISET-related functionality.
We talked for a bit about dropping the use of nested tables and instead doing everything in SQL, to avoid the PGA error. That would, however require lots of work, revamping algorithms, ensuring correctness, you know the score.
Then my eyes snagged on the use of DISTINCT and IS EMPTY.
They simply needed to make sure that there was nothing in column_ntab that was not in generated_ntab2. Having two of something left was the same result as having one of something left - not empty.
So I suggested that they could drop the DISTINCT and see what happened.
What happened is that the ORA-04030 stopped occurring.
The lead developer also took a closer look at how generated_ntab2 was constructed and discovered that they were already ensuring that there were no duplicates.
The DISTINCT was definitely not needed and so the problem, at least for now, was resolved.
They are not 100% confident that the error still might not occur, especially as data volume increases. But at least they've bought themselves the time to fully analyze their algorithms and explore a 100% SQL solution.
Hello Steven,
ReplyDeleteThis problem somehow drives me back to a post on Kim Berg Hansen's blog,
http://dspsd.blogspot.co.il/2014/06/member-of-comparison-of-plsql-and-sql.html
which I understood from Kim's post that it was also discussed with you.
The conclusion of that post seemed to be that, for MEMBER OF testing, SQL "tends to behave in a PL/SQL-ish way", that is, by "materializing in memory" an entire nested table collection stored relational table, and in that case the problem was only that it consumed a lot of time to do this, but without reaching the ORA-04030 memory error, or, anyway, not reaching it yet for the sample data sets used.
In your case above, it also looks to me that, since there is a pl/sql nested table involved, in addition to the stored one ( thr trx.column_ntab ), then probably Oracle still "switches" to a "PL/SQL-ish working style", that is, working in memory.
By "dropping the use of nested tables and instead doing everything in SQL", do you mean not to use at all any nested table, even not the one stored in the database, but use only relational tables and SQL set operations
( UNION, MINUS, INTERSECT ) instead of MULTISET / SUBMULTISET ?
Or, alternatively, do you mean storing the pl/sql nested table also in the database, maybe in some temporary/working table, and let the SQL statement use only stored collections, but still keeping the MULTISET / SUBMULTISET operations in the
resulting SQL ?
As Kim has commented in his blog post, the impression is that Oracle is not able "to delegate" (or "translate") collection operations ( like MEMBER OF, and, maybe also MULTISET and SUBMULTISET ) performed in SQL statements on stored nested tables to the "pure relational" layer, and perform "pure SQL", without materializing entire collections and perform "PL/SQL work".
Anyway, instead of getting memory leak errors, maybe Oracle could employ some kind of using temporary disk space, or, at least be able to do this on demand, by setting a session level parameter, aka, allow those developers who still prefer to keep the "nice nested table syntax" in place, by consuming more running time and less memory instead of the opposite.
It is however interesting how did it happen that removing the DISTINCT alone made the error disappeared, why/how is MULTISET EXCEPT DISTINCT likely to require more memory than MULTISET EXCEPT ?
Is it performing entirely differently ?
Is there an additional sorting involved with DISTINCT, and why would that require more memory and not just more time ?
Thanks a lot & Best Regards,
Iudith
Hello Steven, All,
ReplyDeleteFor the sake of completeness, just another small technical remark:
The empty/non-empty result of MULTISET EXCEPT when using or not using DISTINCT might be different, because the DISTINCT is applied to the two operand collections, and NOT to the result of the simple MULTISET EXCEPT.
This small example shows it:
DECLARE
TYPE numbers_nt IS TABLE OF NUMBER;
l_nt1 numbers_nt := numbers_nt(1,2,2,3,3,3);
l_nt2 numbers_nt := numbers_nt(1,2,3);
PROCEDURE show_bool (p_msg IN VARCHAR2, p_cond IN BOOLEAN)
IS
BEGIN
DBMS_OUTPUT.put_line( p_msg ||' - '||
CASE
WHEN p_cond THEN 'TRUE !'
WHEN NOT p_cond THEN 'FALSE !'
ELSE 'NULL'
END ) ;
END show_bool ;
BEGIN
show_bool ( 'MULTISET EXCEPT IS EMPTY',
(l_nt1 MULTISET EXCEPT l_nt2) IS EMPTY );
show_bool ( 'MULTISET EXCEPT DISTINCT IS EMPTY',
(l_nt1 MULTISET EXCEPT DISTINCT l_nt2) IS EMPTY );
END;
/
MULTISET EXCEPT IS EMPTY - FALSE !
MULTISET EXCEPT DISTINCT IS EMPTY - TRUE !
PL/SQL procedure successfully completed.
This is a little bit different from MULTISET UNION and MULTISET INTERSECT, where, indeed, the emptiness of the result is the same whether we use DISTINCT or not.
The small little things that we usually don't think too deeply about, until reading a "statement in words", rather than just looking at code :):)
So, technically speaking, in the case of the problem in your post, not only "generated_ntab" but "trx.column_ntab" as well should have distinct elements for safely removing DISTINCT from the MULTISET EXCEPT operation.
Looking at real life, though, it is not easy to guess why would somebody need to store in the database a nested table with non-distinct elements, except if for some technical purpose.
Thanks a lot again & Best Regards,
Iudith
Thanks, Iudith. That is excellent information and soon, soon one of us will write a quiz around this, right?
ReplyDeleteHi, I've been following this post keenly since i am part of the team which ran in to this problem and were extremely lucky to get expert help from Steven to try and work this out.
ReplyDeleteAdditionally this blog and the comments/links above (by iudith) have been very useful. I have learnt a lot from this discussion alone.
Based on Steven's inputs, (removing DISTINCT etc) I have re-run some checks against the data and have more than 1 approach working now.
Couple of slight corrections:
Notational Query should be :
SELECT count(1)
FROM header_tab trx
WHERE (generated_ntab1 SUBMULTISET OF trx.column_ntab)
AND ((trx.column_ntab MULTISET
EXCEPT DISTINCT generated_ntab1) IS EMPTY)
1. we only need counts, not the row contents ... neither regular columns nor nested table columns.
2. its generated_ntab1 in both places. There is no generated_ntab2 in query.
3. For business reasons, we have generated_ntab<1/2/3> and trx.column_ntab<1/2/3> to cater for 3 levels of product hierarchy.
This allows independent level specific analysis. So for completeness, the level wise pairing of nested tables are:
level1: generated_ntab1 and trx.column_ntab1 .... column trx.column_ntab in above SQL represents trx.column_ntab1
level2: generated_ntab2 and trx.column_ntab2 and
level3: generated_ntab3 and trx.column_ntab3
4. More an explanation, clarification:
There is a possibility for duplicates in both sides of the comparison (if we did not take care during the ETL or population of nested tables).
I noted the functionality offered by MULTISET EXCEPT DISTINCT and understood that it operated on the operands, not the resultant nested table.
I probably felt compelled to use it in code although it was unnecessary and has caused us some grief. It is more robust code but seems to involve some complicated memory operations.
Market Baskets rules are generally non-unique (a,b=>c)
but here we have some special single member rules of type a=>a defined manually by business users which we need to maintain and use for caculations.
So a rule could have been defined as (94,94) for product(94)=>product(94).
The 3 levels of analysis possible are levels from the product hierarchy.
In the Trx line item table, we have columns to store the SKU (SKU), SBC (Sub-Category) and DEPT (Category) for each line item i.e. SKU lowest level.
If i buy 2 cricket bats (key=10), 1 tennis racquet (key=20) and 2 cans of tennis balls (key=30) then the SKU are all unique. However they could belong to couple of Sub-Categories like Cricket (key=501) and Tennis (key=502). In this case, they all belong to same Department Sports (key=1001).
Naturally we have potential duplicates at SBC and DEPT for each Trx.
We have ensured that the SQLs and/or pl/sql routines populating the nested tables are doing so using DISTINCT resulting in distinct elements in the nested tables.
Market Basket (94=>94) is stored as nt(94), not (94,94)
Similarly SKU nt is stored as (10,20,30) but SBC is stored as nt (501,502), not (501,502,502) and DEPT nt is stored as (1001), not (1001,1001,1001).
As a result the removal of DISTINCT from the calculation logic made (and makes) no difference.
There are couple of more comments i may make later (due to space constraints in comments) based on the results with different ways of rewriting the same set comparison logic.
A variation of the faulty scenario reversing the positions of the 2 nt columns works fine (even with DISTINCT in the picture).
There are also perhaps neater/easier ways to check the same than what i was using at first go.
We are planning to now use: (nt1 SUBMULTISET nt2) and (nt2 SUBMULTISET nt1) to do the job.
Thanks
Shankar