Showing posts with label facts. Show all posts
Showing posts with label facts. Show all posts

Friday, February 24, 2012

Financial Accounting Cube

Hi,

I'm designing a cube for analyzing financial data from an accounting application.

Basically, the facts table looks like:

(idAccount, idTime, nValue)

that shows every financial operation for any account, and it's very easy to build a cube for analysing the accounting movements.

But there is another important measure, that's the balance of every account in any time.

A tipical report would be, for a whole year, the sum of movements of any account and the balance at the end of the period.

I think adding a new column to the facts table like:

(idAccount, idTime, nValue, nBalance)

wouldn't work, because any account has a balance in any time, even thought there isn't movements in that time period.

What do you think would be a good design approach?

Why not break it into two (Seems like they are unrelated facts that happen to have common dimensions):

idAccount, idTime, nValue

idAccount, idTime, nBalance

|||

Could work, but with that approach I would have a row in the fact table for every day and for every account, and that doesn't look very efficient.

Imagine a "sales" account, that has a balance value in day 10 of 10,000$, and there are more sales after that day:

(idAccount="sales", idTime=10, nBalance=10000)

(idAccount="sales", idTime=15,nValue=500)

so, (idAccount="sales", idTime=15, nBalance=10500)

That means, the balance value changes after any accounting operation.

And now, what happens if the user makes a report that has a row for every account with two more columns: nBalance and nValue, and chooses only the day 20?

For the "sales" account nValue = 0, that's clear, but, what happens with nBalance?

It should be 10500, becuase the balance didn't change after the day 15.

The problem is that I don't know how to create the nBalance measure in order to get that behaviour.

The only way I can figure out is filling the (idAccount, idTIme, nBalance) table with the balance of every account for every day, and that seems to be a waste of resources, because there're many accounts that have very few changes in their balance over the year.

Any suggestion?

P.S. Another problem is the calculation of aggregations for nBalance. Aggregations for nBalance in the account dimension must be a sum of nBalance for any account. But for time dimension, nBalance is the value of nBalance in the last time of the time dimension.

I can figure out many "real wolrd" problems similar to this. For example, imagine that you have a solution for identifying the stock in a warehouse. There are, at least, two main measures: the ins and outs of the warehouse and the actual stock in a certain value of time.

Sunday, February 19, 2012

Filtering Records Out in DSV. Performance Benefit?

In short, will it be advantageous for me to filter out records that are not associated with any facts within the DSV?

For example, there are records in our Products dimension that have no facts associated with them and I'm thinking of doing the following for the Products table in the cube's DSV:

select *
from products (nolock)
where product_key not in (
select distinct product_key from [sales fact] (nolock)
) or product_key not in (
select distinct product_key from [forecast fact] (nolock)
)

Doing this would filter out around 15,000 rows.This isn't a performance question - this is a functionality question. Do you want your dimensions to have members, which (currently) don't have any data associated with them ? If yes - then you need to do this filtering in DSV. If no - then not. As far as performance goes, I imagine that by doing this filtering you will make performance worse, since during processing of dimension it will have to do join with two fact tables.|||

Mosha Pasumansky wrote:

This isn't a performance question - this is a functionality question. Do you want your dimensions to have members, which (currently) don't have any data associated with them ? If yes - then you need to do this filtering in DSV. If no - then not. As far as performance goes, I imagine that by doing this filtering you will make performance worse, since during processing of dimension it will have to do join with two fact tables.

I'm not too concerned with the performance hit during the processing of the cube since we do it night.

So there's really no performance benefit to filtering out the "unnecessary" data?|||Again, if it is OK for you to have less dimension members in the dimension - you may see some performance benefits during qurying. Whether or not you will see this benefit depends greatly on the queries you are sending, on the calculations inside cube etc.|||

Hi,

I did this for AS2000 and do this for AS2005. It help me. I have only about 2 Millions dimension members instead of 5. I make a sence.

But take a look at you query

select *
from products (nolock)
where product_key not in (
select distinct product_key from [sales fact] (nolock)
) or product_key not in (
select distinct product_key from [forecast fact] (nolock)
)

You should rewrite it without "distinct" but should have an index on the product_key field.