byrmol Blog

Garbage

The PUC Index - A Database Design Measure

In following from my previous post, and in the interests of a free shirt from graz, I present another metric designed to summarise the "goodness" of a physical database schema

The Potentially Useless Column (PUC) Index analysis the number of NULL defined columns versus the total number of columns. 

The PUC Index has 3 indicators (TotalColumns, NULLColumns and Index) and applies to individual Tables and the Database

Interpreting the results is very straight foward.  Any result in either the Table or Database with a PUC Index over 0.00 must be investigated and justified. 

If I had to give a general guidline..... Any database that scores more than 0.05 (5% of the columns are PU)  needs further considerations and any database that scores more the 0.5 (50%)  is probably an EXCEL spreadsheet.

-- Table
Select TABLE_NAME, TotalColumns, NULLColumns
    , cast(NULLColumns as money)/ TotalColumns  as PUCIndex
from
(
    select T.TABLE_NAME , count(*) as TotalColumns
        , sum(case when IS_NULLABLE = 'YES' then 1 else 0 end) as NULLColumns
    from INFORMATION_SCHEMA.COLUMNS C
    inner join INFORMATION_SCHEMA.TABLES T on T.TABLE_NAME = C.TABLE_NAME
    where T.TABLE_TYPE = 'BASE TABLE' 
    group by T.TABLE_NAME
) as x

-- Database
Select sum(TotalColumns) as TotalColumns
    ,  sum(NULLColumns) as TotalPUCs
    , cast(sum(NULLColumns) as money)/ sum(TotalColumns)  as PUCIndex
    
from 
(
Select TABLE_NAME, TotalColumns, NULLColumns
    , cast(NULLColumns as money)/ TotalColumns  as PUCIndex
from
(
    select T.TABLE_NAME , count(*) as TotalColumns
        , sum(case when IS_NULLABLE = 'YES' then 1 else 0 end) as NULLColumns
    from INFORMATION_SCHEMA.COLUMNS C
    inner join INFORMATION_SCHEMA.TABLES T on T.TABLE_NAME = C.TABLE_NAME
    where T.TABLE_TYPE = 'BASE TABLE' 
    group by T.TABLE_NAME
) as x
) as y

 

Legacy Comments


Jon
2007-02-22
re: The PUC Index - A Database Design Measure
Ok so you're creating this table...about Logins and you specifiy a facilityID for each login. Only certain individuals have a facilityID....so the others don't have a facilityID its unknown...so whats the biggie...NULL definite. What would you put in instead an empty space or a zero? Come on nothing wrong with nulls.

So you have a field asking for address 2 information...most people fill out address 1...who needs address 2 you say right?
So what do you do, give it an empty space....or a bunch of ------------...BAHHHHHHHHHHHHHH I say to all of that...go NULLs.

Nothing wrong with a NULL here or there...

And how do you handle gender..especially nowadays...a bit you say...wrong not possible by definition the bit is only on or off...this is the 21st century aint gonna happen...o but an integer you say 0 for males, 1 for females, and 2 for crazy yak....bah humbug to that...NULL all the way

O wait but what about a self referencing table...work orders lets say...one work order may be the root of additional work orders with a specific parentID. What happens to the root's parentid i ask...o you want to give it a default value of 0..sure looks nice...and you avoid NULLs....BAHHHHHHHHHHHH HUMBUUUUUUUG to that...go for the NULLs


DavidM
2007-02-22
re: The PUC Index - A Database Design Measure
You know Jon, I originally had a warning that this could be a possible flame war inspiring post.

But I thought "No... I use the words 'justified' and 'potential', they'll be cool about it"

Based on your response, I take it your database had a high PUC Index?

Chris Woodruff
2007-02-23
re: The PUC Index - A Database Design Measure
Thanks for all the great metrics. I will not give my scores <HEADE DOWN IN SHAME> for any of my inherited databases. This and the Hermetic Ratio are great to see how the databases "look".

rockmoose
2007-02-23
re: The PUC Index - A Database Design Measure
Ahem, Ahem..

Another great post!

I wonder what the next installment will be...
Maybe along the lines of a NKF (Natural Key Factor) ???

If I run this on the snitz db, it gets a NKF = 0.3333
I.e 33% of the tables have a "business key" - defined by having a key which does not include a Identity Column or a UID column. Ok, the uniqueidentifier rule should be relaxed a in the case when the Key is inherited from a parent table.

But so far I have:
(.... I wonder if it will be readable ...)

Regards,
rockmoose

declare @TableKeyAnalysis table
(
table_schema sysname
,table_name sysname
,hasNaturalKey tinyint not null check(hasNaturalKey<2)
,primary key(table_schema,table_name)
)
;with IdentityAndUidColumns(table_schema,table_name,column_name)
as
(
select table_schema,table_name,column_name
from information_schema.constraint_column_usage
where 1 = columnproperty(object_id(table_schema+'.'+table_name),column_name,'IsIdentity')
union
select schema_name(o.schema_id) as table_schema, o.[name] as table_name, c.[name] as column_name
from sys.objects o
join sys.columns c on o.object_id = c.object_id
join sys.types t on c.user_type_id = t.user_type_id
where t.system_type_id = (select system_type_id
from sys.types
where [name] = 'uniqueidentifier')
and o.type = 'U'
)
,NaturalKeyConstraints
as
(
select tcc.constraint_catalog,tcc.constraint_schema,tcc.constraint_name
from information_schema.table_constraints tc
join information_schema.constraint_column_usage tcc
on tc.constraint_catalog = tcc.constraint_catalog
and tc.constraint_schema = tcc.constraint_schema
and tc.constraint_name = tcc.constraint_name
--and tc.table_name = 'sysdiagrams'
where tc.constraint_type in('PRIMARY KEY','UNIQUE')
and not exists( select *
from IdentityAndUidColumns iuid
where tcc.table_schema = iuid.table_schema
and tcc.table_name = iuid.table_name
and tcc.column_name = iuid.column_name)
)
insert @TableKeyAnalysis
select t.table_schema,t.table_name,case when nk.table_name is not null then 1 else 0 end as hasNaturalKey
from information_schema.tables t
left join
(
select distinct
table_schema,table_name
from information_schema.table_constraints tc
where constraint_type in('PRIMARY KEY','UNIQUE')
and exists( select *
from NaturalKeyConstraints nk
where tc.constraint_catalog = nk.constraint_catalog
and tc.constraint_schema = nk.constraint_schema
and tc.constraint_name = nk.constraint_name)
) nk
on t.table_schema = nk.table_schema
and t.table_name = nk.table_name
where t.table_type = 'BASE TABLE'

-- Presentation
select 1.0*sum(hasNaturalKey)/count(*) as NKF
from @TableKeyAnalysis

select *
from @TableKeyAnalysis


NKF
---------------------------------------
0.333333333333

Jon
2007-02-23
re: The PUC Index - A Database Design Measure
David I was just trying to be funny :).
Good post again :)