« Tuning Leading '%' Queries - a simple approach | Main | When enough is enough? - how to estimate I/O Performance »

Cardinality and Pipelined Functions - Ugly!

Posted on Monday, August 4, 2008 at 10:27AM by Registered CommenterKarl Reitschuster in , | CommentsPost a Comment

Dear reader,

pipelined functions are very comfortable to process incoming data and to return it as structured result set. For this reason they are frequently used in SQL; But be carefully if you join result sets from pipelined function with other tables.

The Oracle optimizer usually uses gathered table statistics to calculate the cardinality of a join operation; For a result set  originated in a pipeline function it has no statistics to evaluate and hence could try to get another path to the data. 

Following SQL retrieves internal employee GUID's (Emp_Id) from external Id's passed as comma delimited string.

 

SQL> r
  1  SELECT Exe.Column_Value,
  2         Emp.Id AS Emp_Id
  3    FROM TABLE(CAST(Delimited_To_Collection('XCD00001,XCS22221',
  4                                            ',') AS Tc_Varchar2)) Exe,
  5         Employee Emp
  6   WHERE Upper(TRIM(Exe.Column_Value)) = Upper(TRIM(Emp.Extid))
  7*


Ausf³hrungsplan
----------------------------------------------------------
Plan hash value: 2017326674

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                         |  8168 |   215K|   235   (2)| 00:00:03 |
|*  1 |  HASH JOIN                         |                         |  8168 |   215K|   235   (2)| 00:00:03 |
|   2 |   COLLECTION ITERATOR PICKLER FETCH| DELIMITED_TO_COLLECTION |       |       |            |          |
|   3 |   VIEW                             | index$_join$_002        | 51640 |  1260K|   210   (1)| 00:00:03 |
|*  4 |    HASH JOIN                       |                         |       |       |            |          |
|   5 |     INDEX FAST FULL SCAN           | EMPLOYEE_PK             | 51640 |  1260K|   125   (1)| 00:00:02 |
|   6 |     INDEX FAST FULL SCAN           | EMPLOYEE_UK1            | 51640 |  1260K|    84   (0)| 00:00:02 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access(UPPER(TRIM(VALUE(KOKBF$)))=UPPER(TRIM("EMP"."Extid")))
   4 - access(ROWID=ROWID)


Statistiken
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        417  consistent gets
          0  physical reads
          0  redo size
        463  bytes sent via SQL*Net to client
        350  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

 

Instead of using the direct way choosing a function based index on employee defined on the extid column the optimizer merges two indexes, calls the function appropriately 8000 times and hashes the result set of the two sub-operations. Hm... there are indeed many ways to Rome. The real cardinality of the pipelined function based row set is 2;

Looking in the PL/SQL profilers output shows dramatically where computation of a test run is gone

Line 245 : 33.5 seconds for 3215 calls -  that's a lot!

As you see I prepared the Statement already using a CARDINALITY hint. Now the same Statement with CARDINALITY hint on the pipelined result set.

SQL> SELECT /*+CARDINALITY (Exe, 5) */
  2   Exe.Column_Value,
  3   Emp.Id AS Emp_Id
  4    FROM TABLE(CAST(Delimited_To_Collection('XCD00001,XCS22221',
  5                                            ',') AS Tc_Varchar2)) Exe,
  6         Employee Emp
  7   WHERE Upper(TRIM(Exe.Column_Value)) = Upper(TRIM(Emp.Extid))
  8  ;


Ausf³hrungsplan
----------------------------------------------------------
Plan hash value: 3611582162

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                         |     5 |   135 |    27   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |                         |     5 |   135 |    27   (0)| 00:00:01 |
|   2 |   COLLECTION ITERATOR PICKLER FETCH| DELIMITED_TO_COLLECTION |       |       |            |       |
|   3 |   TABLE ACCESS BY INDEX ROWID      | EMPLOYEE                |     1 |    25 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN               | EMPLOYEE_FX03           |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access(UPPER(TRIM(VALUE(KOKBF$)))=UPPER(TRIM("EXTID")))


Statistiken
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        463  bytes sent via SQL*Net to client
        350  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

SQL>

The execution plan now looks more like expected; and number logical reads also reduced from 417 to 6, which is almost 1/70 of logical reads. How about timing now?

 

Instead of 33.500 seconds only 0.111 seconds were needed! The Test run completed in about 0.670 seconds instead of 34.600 seconds which is a tremendous difference which cannot only explained via less logical reads. It's about 1/300 of the time needed without hint. Probably the allocation of the hash areas are expensive too.

So be careful joining pipelined result sets with other tables - take a look on the CBO's estimated cardinality to see why the execution plan has such a strange layout.

HTH
Karl Reitschuster

 

EmailEmail Article to Friend

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

PostPost a New Comment

Enter your information below to add a new comment.

My response is on my own website »
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>