Fragmented array: systems

Showing posts with label systems. Show all posts

Wednesday, March 28, 2012

Finding Correct Primary Key

Hello,
I could not reach a conclusion on this. Maybe someone helps me to understand
better.
I have a consolidated text extract from other systems and want to upload
the data to an SQL table.
Below is the fields in extract.
OrderNo
LineNumber
SellingCompany
PurcCtry
FinalizeDt
ItemNo
CustomerID
SalesRep
Quantity
Sales_USD
At first analysis, I thought OrderNo and LineNumber are the PK of the table
and designed the table.
When I receive the data I realized that since the data is coming from
different sources. Some of them has duplicate Orderno LineNumber
combination.
Now my data has two different of type of data one has unique key of Orderno
LineNumber combination. On the other hand second part of data needs OrderNo
LineNumber ,FinalizeDt,ItemNo and CustomerID for a unique key.
Question is : Should I keep these data in two different table, or should I
keep them in a single table by adding more columns on PK ?
What is the affect of defining larger PK even it is not needed for some of
the data?
My table has data has 1.5 million record and increasing 75000 each month.
70 % of data has Unique Orderno Line Number
Data is used by sql server reporting services, I have also some plans to use
analysis server in the future.
Thanks in advance.
erdal,You can use up to 16 columns for a composite primary key, however, make sure
you only use as many as you actually need, based on your data.
If you expect frequent and large increases in table size, make the primary
key nonclustered, and find a more appropriate candidate for the clustered
index. An ideal candidate for this kind of situation is a column with
non-nullable unique values, that increase for each new inserted row (e.g. an
identity column).
Using several tables is an option, yet it may complicate querying for the
reporting purposes.
ML|||> Question is : Should I keep these data in two different table, or should I
> keep them in a single table by adding more columns on PK ?
It seems to me that the data from the 2 sources represents different
entities so you should store these in different tables.

> On the other hand second part of data needs OrderNo
> LineNumber ,FinalizeDt,ItemNo and CustomerID for a unique key.
You need to do some analysis to determine the appropriate PK. What does the
data from this data source actually represent (e.g. order details history or
different order entry systems)? Simply adding columns until you come up
with a unique combination isn't the right approach. Data may be unique
based on your current data sample but not necessarily in the future.
Perhaps the data is bad and you are only masking an underlying problem.
Maybe some data scrubbing is needed before import. These are all issues
that need to be addressed in order to make intelligent schema design
decisions.
Hope this helps.
Dan Guzman
SQL Server MVP
"Erdal Akbulut" <erdalim21@.yahoo.com> wrote in message
news:ukME7Fv1FHA.2348@.TK2MSFTNGP15.phx.gbl...
> Hello,
> I could not reach a conclusion on this. Maybe someone helps me to
> understand
> better.
> I have a consolidated text extract from other systems and want to upload
> the data to an SQL table.
> Below is the fields in extract.
> OrderNo
> LineNumber
> SellingCompany
> PurcCtry
> FinalizeDt
> ItemNo
> CustomerID
> SalesRep
> Quantity
> Sales_USD
>
> At first analysis, I thought OrderNo and LineNumber are the PK of the
> table
> and designed the table.
> When I receive the data I realized that since the data is coming from
> different sources. Some of them has duplicate Orderno LineNumber
> combination.
> Now my data has two different of type of data one has unique key of
> Orderno
> LineNumber combination. On the other hand second part of data needs
> OrderNo
> LineNumber ,FinalizeDt,ItemNo and CustomerID for a unique key.
> Question is : Should I keep these data in two different table, or should I
> keep them in a single table by adding more columns on PK ?
> What is the affect of defining larger PK even it is not needed for some of
> the data?
> My table has data has 1.5 million record and increasing 75000 each month.
> 70 % of data has Unique Orderno Line Number
> Data is used by sql server reporting services, I have also some plans to
> use
> analysis server in the future.
> Thanks in advance.
>
> erdal,
>|||On Sat, 22 Oct 2005 13:15:06 +0300, Erdal Akbulut wrote:

>Hello,
>I could not reach a conclusion on this. Maybe someone helps me to understan
d
>better.
>I have a consolidated text extract from other systems and want to upload
>the data to an SQL table.
>Below is the fields in extract.
>OrderNo
>LineNumber
>SellingCompany
>PurcCtry
>FinalizeDt
>ItemNo
>CustomerID
>SalesRep
>Quantity
>Sales_USD
>
>At first analysis, I thought OrderNo and LineNumber are the PK of the table
>and designed the table.
>When I receive the data I realized that since the data is coming from
>different sources. Some of them has duplicate Orderno LineNumber
>combination.
(snip)
>Question is : Should I keep these data in two different table, or should I
>keep them in a single table by adding more columns on PK ?
Hi Erdal,
It seems to me that you're approaching this from the wrong way. You
should start with how the data is supposed to be related in the
business.
Not knowing your business and making some wild assumptions about the
meaning of some columns names, my first guess is that (OrderNo,
LineNumber) might be a good candidate key for this table - but that's
noway sure! If your business assigns one and only one salesrep toe ach
customer, than SalesRep should be in the Customers table and can be
removed from this table. Similar, if each customer makes all it's
purchases in one country, then PurcCtry should be moved to the Customers
table as well. There might be more similar dependencies between the
columns that only someone with knowledge of the business model of your
company can find.
WRT the duplicated order numbers: if these sources each have their own
system for assigning order numbers (as might be the case after the
merger of two companies, or in a company with independent branches) and
duplicates are to be expected, then you should add a column to indicate
the source (not the data source, but the source of the order - e.g. the
name or ID of the branch where the order comes from), and add that
column to the primary key. OTOH, if the order numbers really should be
unique in the complete collection of data, then you'll have to hunt down
the reason why duplicates were entered in the first place, fix the
problem, clean up the erronous data before proceeding with this
consolidation project.
(previously snipped text:)
>Now my data has two different of type of data one has unique key of Orderno
>LineNumber combination. On the other hand second part of data needs OrderNo
>LineNumber ,FinalizeDt,ItemNo and CustomerID for a unique key.
I'm not sure if I understand you right. Maybe you're just trying to find
a set of columns that would technically (but not logically) qualify as
primary key - in that case see above: either add a column or fix the
real problem.
But if you're saying that the business model logically implies that the
data from one source has (OrderNo, LineNumber, FinalizeDt, ItemNo,
CustomerID) as the "real" primary key, then
a) You have a very unusual business model (but of course, mine is not to
judge), and
b) You can't combine the data from the two sources into one table, since
they have a different logical structure - it would be like trying to
combine ledger entries and personnel records into one table.

>What is the affect of defining larger PK even it is not needed for some of
>the data?
You'll get corrupted data in your table. Would you consider defining the
primary key on your Personnel table as (EmployeeID, EmpName)? I'd hope
not - since it's not the question IF you'll end up with duplicated data,
but WHEN you'll get it. And the answer is "sooner than you think".
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||> When I receive the data I realized that since the data is coming from
> different sources. Some of them has duplicate Orderno LineNumber
> combination.
>
It sounds as if what you need is something like SourceID, OrderNo,
LineNumber. If SourceID is not included in the extract files, then populate
it during the transform/load step.
.|||Thanks for the answers,
Let me try to clarify the issue.
we have 7 companies running on the same application, and 2 more using
different systems. All applications run on a mainframe environment and the
data is also consolidated in mainframe.
For the first 7 companies. OrderNo and Orderline are the candidates. No
problem at this part.
For the other 2 companies, we do not have the data in orderline level. The
data is summed by item, customer etc by month, and they have dummy ordenos
as 999999 and line no as 001
Currently I have two tables. one for the first 7 companies having Orderno
and Linenumber as PK. and another table for the other 2 companies having PK
with multiple columns.
I was thinking to merge the data in one table and define a larger PK having
PK columns from first and second table.
Seems it is not recommended in this case.
regards,
"Erdal Akbulut" <erdalim21@.yahoo.com> wrote in message
news:ukME7Fv1FHA.2348@.TK2MSFTNGP15.phx.gbl...
> Hello,
> I could not reach a conclusion on this. Maybe someone helps me to
understand
> better.
> I have a consolidated text extract from other systems and want to upload
> the data to an SQL table.
> Below is the fields in extract.
> OrderNo
> LineNumber
> SellingCompany
> PurcCtry
> FinalizeDt
> ItemNo
> CustomerID
> SalesRep
> Quantity
> Sales_USD
>
> At first analysis, I thought OrderNo and LineNumber are the PK of the
table
> and designed the table.
> When I receive the data I realized that since the data is coming from
> different sources. Some of them has duplicate Orderno LineNumber
> combination.
> Now my data has two different of type of data one has unique key of
Orderno
> LineNumber combination. On the other hand second part of data needs
OrderNo
> LineNumber ,FinalizeDt,ItemNo and CustomerID for a unique key.
> Question is : Should I keep these data in two different table, or should I
> keep them in a single table by adding more columns on PK ?
> What is the affect of defining larger PK even it is not needed for some of
> the data?
> My table has data has 1.5 million record and increasing 75000 each month.
> 70 % of data has Unique Orderno Line Number
> Data is used by sql server reporting services, I have also some plans to
use
> analysis server in the future.
> Thanks in advance.
>
> erdal,
>|||On Mon, 24 Oct 2005 18:30:15 +0300, Erdal Akbulut wrote:

>Thanks for the answers,
>Let me try to clarify the issue.
>we have 7 companies running on the same application, and 2 more using
>different systems. All applications run on a mainframe environment and the
>data is also consolidated in mainframe.
>For the first 7 companies. OrderNo and Orderline are the candidates. No
>problem at this part.
>For the other 2 companies, we do not have the data in orderline level. The
>data is summed by item, customer etc by month, and they have dummy ordenos
>as 999999 and line no as 001
>
>Currently I have two tables. one for the first 7 companies having Orderno
>and Linenumber as PK. and another table for the other 2 companies having P
K
>with multiple columns.
>I was thinking to merge the data in one table and define a larger PK having
>PK columns from first and second table.
>Seems it is not recommended in this case.
Hi Erdal,
Seems to me that you have two options to chose from:
A) Use two tables. One keyed on (OrderNo, LineNumber) to store the data
from the first 7 companies, and the second one with a smaller amount of
columns, and keyed on (Item, Customer, etc, Month) for the last 2
companies. Don't introducce fake order and line numbers in the last
table, they'll only confuse people.
B) Use a staging table to accept the data from the first 7 companies.
Aggregate this data to the same level as the data you receive from the
other 2 companies. Then combine the data from the latter 2 companies
with the aggregated data from the first 7 companies into one table.
Which of these solutions suits your problem best is for you to decide,
since I don't know what you're planning to do with the data, once it's
in the database.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

Finding Correct Primary Key

Friday, February 24, 2012

Financial Year

Hi,
one of the important things in accounting systems is using financial
year. whenever the year changes, they change the financial year and
use the new one. some times they need to reference the previous
financial year accounts.
I would like to to know what is the best approach to store these case
in sql server?
I thought of an approach which we store a field for Financial year for
each table which needs it. and when querying we use the default
Financial Year. Well, might it be very time consuming when the size of
the database grows very much?
I mean does it take a lot of time to find the financial year we want
through lots of other financial year information.
Or are there any better methods and solution to do so in SQL Server
2005?
Thank you very much
Ali"Ali" <nikzad.a@.gmail.com> wrote in message
news:1189316098.187434.283920@.57g2000hsv.googlegroups.com...
> Hi,
> one of the important things in accounting systems is using financial
> year. whenever the year changes, they change the financial year and
> use the new one. some times they need to reference the previous
> financial year accounts.
> I would like to to know what is the best approach to store these case
> in sql server?
> I thought of an approach which we store a field for Financial year for
> each table which needs it. and when querying we use the default
> Financial Year. Well, might it be very time consuming when the size of
> the database grows very much?
> I mean does it take a lot of time to find the financial year we want
> through lots of other financial year information.
> Or are there any better methods and solution to do so in SQL Server
> 2005?
> Thank you very much
> Ali
>
Create a Calendar table:
http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-calendar-table.html
--
David Portas|||You can use Partitioning for your database for this purpose in case your SQL
Server version is 2005 and edition Enterprise.
Using partitioning, you will be able to save all different years in
different files. It will be more managable and faster. You may keep the old
and rarely used partitions (files of them) to slower disks relatively to the
recent and more used partitions (files of them).
You can find more information about partitioning in the following page:
Understand Partitioning:
http://msdn2.microsoft.com/en-us/library/ms188232.aspx
Ekrem Önsoy
"Ali" <nikzad.a@.gmail.com> wrote in message
news:1189316098.187434.283920@.57g2000hsv.googlegroups.com...
> Hi,
> one of the important things in accounting systems is using financial
> year. whenever the year changes, they change the financial year and
> use the new one. some times they need to reference the previous
> financial year accounts.
> I would like to to know what is the best approach to store these case
> in sql server?
> I thought of an approach which we store a field for Financial year for
> each table which needs it. and when querying we use the default
> Financial Year. Well, might it be very time consuming when the size of
> the database grows very much?
> I mean does it take a lot of time to find the financial year we want
> through lots of other financial year information.
> Or are there any better methods and solution to do so in SQL Server
> 2005?
> Thank you very much
> Ali
>

Wednesday, March 28, 2012

Finding Correct Primary Key

Finding Correct Primary Key

Finding Correct Primary Key

Friday, February 24, 2012

Financial Year

Fragmented array

Blog Archive

About Me