Sunday, February 19, 2012

Filtering Records Out in DSV. Performance Benefit?

In short, will it be advantageous for me to filter out records that are not associated with any facts within the DSV?

For example, there are records in our Products dimension that have no facts associated with them and I'm thinking of doing the following for the Products table in the cube's DSV:

select *
from products (nolock)
where product_key not in (
select distinct product_key from [sales fact] (nolock)
) or product_key not in (
select distinct product_key from [forecast fact] (nolock)
)

Doing this would filter out around 15,000 rows.This isn't a performance question - this is a functionality question. Do you want your dimensions to have members, which (currently) don't have any data associated with them ? If yes - then you need to do this filtering in DSV. If no - then not. As far as performance goes, I imagine that by doing this filtering you will make performance worse, since during processing of dimension it will have to do join with two fact tables.|||

Mosha Pasumansky wrote:

This isn't a performance question - this is a functionality question. Do you want your dimensions to have members, which (currently) don't have any data associated with them ? If yes - then you need to do this filtering in DSV. If no - then not. As far as performance goes, I imagine that by doing this filtering you will make performance worse, since during processing of dimension it will have to do join with two fact tables.

I'm not too concerned with the performance hit during the processing of the cube since we do it night.

So there's really no performance benefit to filtering out the "unnecessary" data?|||Again, if it is OK for you to have less dimension members in the dimension - you may see some performance benefits during qurying. Whether or not you will see this benefit depends greatly on the queries you are sending, on the calculations inside cube etc.|||

Hi,

I did this for AS2000 and do this for AS2005. It help me. I have only about 2 Millions dimension members instead of 5. I make a sence.

But take a look at you query

select *
from products (nolock)
where product_key not in (
select distinct product_key from [sales fact] (nolock)
) or product_key not in (
select distinct product_key from [forecast fact] (nolock)
)

You should rewrite it without "distinct" but should have an index on the product_key field.

No comments:

Post a Comment