Vishal desai’s Oracle Blog

December 11, 2013

Bloom Filter and function in join condition

Filed under: Performance Tuning, SQL — vishaldesai @ 4:26 pm

One of my data warehouse user wanted to compare data between two Quarters. Original SQL is slightly complicated so I have created test case to demonstrate the issue.

DROP TABLE t1;

CREATE TABLE t1
  (
    id       NUMBER NOT NULL,
    datecol  DATE NOT NULL,
    comments VARCHAR2(500)
  )
  PARTITION BY RANGE
  (
    datecol
  )
  (
    PARTITION q3 VALUES LESS THAN (TO_DATE('10/10/2013', 'DD/MM/YYYY')) TABLESPACE users,
    PARTITION q4 VALUES LESS THAN (TO_DATE('01/01/2043', 'DD/MM/YYYY')) TABLESPACE users
  );
  
INSERT INTO t1
SELECT
  rownum,
  to_date('31/12/2013','DD/MM/YYYY'),
  dbms_random.string('X', '500')
FROM dual a
  CONNECT BY level <= 5000;
  
INSERT INTO t1
SELECT
  rownum,
  to_date('30/09/2013','DD/MM/YYYY'),
  dbms_random.string('X', '500')
FROM dual a
  CONNECT BY level <= 5000;

For sake of simplicity, I have added only two partitions here but original fact table had 10 years of data. Below is subset of SQL statement that user was running to compare data across quarters.

select /*+ leading(xa,xb) parallel(xa,4) parallel(xb,4) */ sum(xa.id),count( distinct xb.comments)
from t1 xa ,t1 xb
where xa.datecol = add_months(xb.datecol,-3)
and   xa.datecol = '30-SEP-13';
------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |          |     1 |   265 |   367  (74)| 00:00:01 |       |       |        |      |            |
|   1 |  SORT AGGREGATE                  |          |     1 |   265 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR                 |          |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10001 |     1 |   265 |            |          |       |       |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE               |          |     1 |   265 |            |          |       |       |  Q1,01 | PCWP |            |
|   5 |      VIEW                        | VW_DAG_0 |    34M|  8774M|   367  (74)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   6 |       HASH GROUP BY              |          |    34M|  9370M|   367  (74)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   7 |        PX RECEIVE                |          |    34M|  9370M|   367  (74)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   8 |         PX SEND HASH             | :TQ10000 |    34M|  9370M|   367  (74)| 00:00:01 |       |       |  Q1,00 | P->P | HASH       |
|   9 |          HASH GROUP BY           |          |    34M|  9370M|   367  (74)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|* 10 |           HASH JOIN              |          |    34M|  9370M|   122  (20)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|  11 |            PARTITION RANGE SINGLE|          |  5501 |   118K|    33   (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWC |            |
|* 12 |             TABLE ACCESS FULL    | T1       |  5501 |   118K|    33   (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWP |            |
|  13 |            PX BLOCK ITERATOR     |          |  6311 |  1608K|    65   (0)| 00:00:01 |     1 |     2 |  Q1,00 | PCWC |            |
|* 14 |             TABLE ACCESS FULL    | T1       |  6311 |  1608K|    65   (0)| 00:00:01 |     1 |     2 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  10 - access("XA"."DATECOL"=ADD_MONTHS(INTERNAL_FUNCTION("XB"."DATECOL"),-3))
  12 - filter("XA"."DATECOL"='30-SEP-13')
  14 - filter(ADD_MONTHS(INTERNAL_FUNCTION("XB"."DATECOL"),-3)='30-SEP-13')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

As you can see from line 14, table T1 (XB) was doing full table scan and was accessing both the partitions (pstart 1 and pstop 2). In original SQL it was accessing all partitions for last 10 years.

I asked user to add additional filter for T1 (XB) so that it will scan required partition as shown below. But user was not satisfied (as it would require calculating dates manually every time they run this SQL and sometimes they run this SQL for multiple Quarters). Other option was to pass filter as xb.datecol = (select add_months(‘30-SEP-13’,3) from dual) but I was curios to find out more about bloom filters.

select /*+ leading(xa,xb) parallel(xa,4) parallel(xb,4) */ sum(xa.id),count( distinct xb.comments)
from t1 xa ,t1 xb
where xa.datecol = add_months(xb.datecol,-3)
and   xa.datecol = '30-SEP-13'
and   xb.datecol = '31-DEC-13';

I was wondering whether Oracle would create bloom filter after scanning T1 (XA) and apply that to T2 (XB) and do partition scan on T2(XB) instead of scanning all partitions but the problem was add_months function in join condition.

Original Plan:

image

Recommended SQL and plan (bloom filter in action):

select /*+ leading(xa,xb) parallel(xa,4) parallel(xb,4) */ sum(xa.id),count( distinct xb.comments)
from t1 xa ,t1 xb
where add_months(xa.datecol,+3) = xb.datecol
and   xa.datecol = '30-SEP-13';
-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |     1 |   265 |   315  (69)| 00:00:01 |       |       |        |      |            |
|   1 |  SORT AGGREGATE                   |          |     1 |   265 |            |          |       |       |        |      |            |
|   2 |   PX COORDINATOR                  |          |       |       |            |          |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)            | :TQ10001 |     1 |   265 |            |          |       |       |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                |          |     1 |   265 |            |          |       |       |  Q1,01 | PCWP |            |
|   5 |      VIEW                         | VW_DAG_0 |    28M|  7145M|   315  (69)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   6 |       HASH GROUP BY               |          |    28M|  7630M|   315  (69)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   7 |        PX RECEIVE                 |          |    28M|  7630M|   315  (69)| 00:00:01 |       |       |  Q1,01 | PCWP |            |
|   8 |         PX SEND HASH              | :TQ10000 |    28M|  7630M|   315  (69)| 00:00:01 |       |       |  Q1,00 | P->P | HASH       |
|   9 |          HASH GROUP BY            |          |    28M|  7630M|   315  (69)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|* 10 |           HASH JOIN               |          |    28M|  7630M|   118  (17)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|  11 |            PART JOIN FILTER CREATE| :BF0000  |  5501 |   118K|    33   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|  12 |             PARTITION RANGE SINGLE|          |  5501 |   118K|    33   (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWC |            |
|* 13 |              TABLE ACCESS FULL    | T1       |  5501 |   118K|    33   (0)| 00:00:01 |   KEY |   KEY |  Q1,00 | PCWP |            |
|  14 |            PX BLOCK ITERATOR      |          | 10278 |  2619K|    65   (0)| 00:00:01 |:BF0000|:BF0000|  Q1,00 | PCWC |            |
|  15 |             TABLE ACCESS FULL     | T1       | 10278 |  2619K|    65   (0)| 00:00:01 |:BF0000|:BF0000|  Q1,00 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  10 - access("XB"."DATECOL"=ADD_MONTHS(INTERNAL_FUNCTION("XA"."DATECOL"),3))
  13 - filter("XA"."DATECOL"='30-SEP-13')

image

As you can see from above, just by flipping the add_months function Oracle now creates and uses bloom filter to scan required partition for T1 (XB). So from this behavior it looks like bloom filters cannot be used for join column surrounded by function. It sounds very similar to having index on column x, scanning table with function(x)=y will not use index on column x. May be in future Oracle will add function based bloom filters.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: