Database Design based on existing data
I really like to avoid doing database design based on existing data structures. I really prefer to start with a functional flow of the business, target the data (be it existing electronic data, memo;s, post it's, whatever, data is data) for each function, organize all the data across functions into general entities, then take all of the data elemets or attributes and associate them with the higher level Entities. From all of this, I then would form a logical data model, develop a data dictionary and present it to the the Clients for accuracy. Then worry about the physical implementation.
BUT! Sometimes that's not possible. BUT! Good Normalization rules still apply. In this post, macca asked a question. that based on an existing data file, how would you build a database. Since there are 34 unique “Things” should they build that many tables? Well, in my opinion, the answer was no. It should be 1 table (as we find out later though, it's really 2). The interesting part about this problem was the normalization of the data and the code that breaks it up into it's normalized componenets.
How data ends up so denormalized in the first place always suprises me. I mean I understand X400 directory structures and the like as an original “slick” pre XML days to tag data, but it's gonna be really hard for me to accept this, or to accept XML. I just don't see how it beats the relational model. I mean how dynamic do you need to be?
In any case, this problem caused me more time than usual to provide the poster a solution, so I figured I'd post it. Hey Thursday was a foggy day. Knocked it off this morning in relative short order, but Thursday I just couldn't “see” it. Kinda like smashing your face on to your desk over and over...anyway....enjoy
USE Northwind
GO
SET NOCOUNT ON
CREATE TABLE myStage99(RowNum int IDENTITY(1,1), col1 varchar(8000))
GO
INSERT INTO myStage99(Col1)
SELECT 'one/a/001, one/a/002, one/a/500' UNION ALL
SELECT 'one/b/001, one/b/002, one/b/500' UNION ALL
SELECT 'one/c/001, one/c/002, one/c/500' UNION ALL
SELECT 'one/d/001, one/d/002, one/d/500' UNION ALL
SELECT 'one/e/001, one/e/002, one/e/500' UNION ALL
SELECT 'one/f/001, one/f/002, one/f/500' UNION ALL
SELECT 'two/a/001, two/a/002, two/a/500' UNION ALL
SELECT 'two/b/001, two/b/002, two/b/500' UNION ALL
SELECT 'two/f/001, two/f/002, two/f/500' UNION ALL
SELECT 'seven/a/001, seven/a/002, seven/a/500' UNION ALL
SELECT 'seven/b/001, seven/b/002, seven/b/500' UNION ALL
SELECT 'seven/f/001, seven/f/002, seven/f/500'
GO
CREATE TABLE mySection_SubSection99([Section] varchar(20), SubSection char(1), SubSectionNum char(3))
GO
DECLARE @RowNum int, @MAX_RowNum int, @s int, @e int, @col1 varchar(8000), @parse varchar(24)
, @Section varchar(20), @SubSection char(1), @SubSectionNum char(3)
SELECT @RowNum = 1, @MAX_RowNum = MAX(RowNum), @s = 1, @e = 1 FROM myStage99
WHILE @RowNum <= @MAX_RowNum
BEGIN
SELECT @Col1 = Col1 FROM myStage99 WHERE RowNum = @RowNum
SELECT @e = CHARINDEX(',',Col1,@s)-1 FROM myStage99 WHERE RowNum = @RowNum
WHILE @e > -1
BEGIN
SELECT @Parse = SUBSTRING(Col1, @s, @e-@s+1) FROM myStage99 WHERE RowNum = @RowNum
INSERT INTO mySection_SubSection99([Section], SubSection, SubSectionNum)
SELECT SUBSTRING(@Parse,1,CHARINDEX('/',@Parse)-1) AS [Section]
, SUBSTRING(@Parse,CHARINDEX('/',@Parse)+1,1) AS SubSection
, RIGHT(@Parse,3) AS SubSectionNum
SELECT @s = @e + 3
SELECT @e = CHARINDEX(',',Col1,@s)-1 FROM myStage99 WHERE RowNum = @RowNum
END
SELECT @Parse = REVERSE(SUBSTRING(REVERSE(Col1), 1, CHARINDEX(',',REVERSE(Col1))-2))
FROM myStage99 WHERE RowNum = @RowNum
INSERT INTO mySection_SubSection99([Section], SubSection, SubSectionNum)
SELECT SUBSTRING(@Parse,1,CHARINDEX('/',@Parse)-1) AS [Section]
, SUBSTRING(@Parse,CHARINDEX('/',@Parse)+1,1) AS SubSection
, RIGHT(@Parse,3) AS SubSectionNum
SELECT @RowNum = @RowNum + 1, @s = 1, @e = 1
END
GO
CREATE TABLE [dbo].[mySection99] (
[Section] [varchar] (20) NOT NULL PRIMARY KEY
) ON [PRIMARY]
GO
INSERT INTO mySection99([Section])
SELECT DISTINCT [Section] FROM mySection_SubSection99
GO
ALTER TABLE dbo.mySection_SubSection99 ADD CONSTRAINT
FK_mySection_SubSection99_mySection99 FOREIGN KEY
(
[Section]
) REFERENCES dbo.mySection99
(
[Section]
)
GO
SELECT * FROM mySection99
GO
SELECT * FROM mySection_SubSection99
GO
DROP TABLE myStage99
DROP TABLE mySection_SubSection99
DROP TABLE mySection99
GO