The Hermetic Ratio - A Database Design Measure

Over the years, several of my contracts involved analysing and critiquing third party databases.

On arrival at the client site, I would be given the business model specification, a copy of the database and a room/cubicle to work in.Obviously depending on the complexity of the model, it could take me weeks to give a full analysis.

Mostly the report would be centred around normalisation, data integrity, scalability and the quality of the implementation of the business model within the database.

On final delivery of report, the client (usually the CIO or CEO) would take me out for a coffee and discuss the results. In most cases, they would ask me to cut down the technobabble and give it to them straight. Essentially they wanted me to provide some sort of score for the database.

In response, I devised a series of metrics that would allow a layman to interpret my results quickly.

I present to you the metric that had the most impact on my clients..... The Hermetic Ratio.

Named in honor of the legendary Hermes Trismegistus the Thrice Great, the ratio was designed to capture the "wholeness" of the database.

It is a measure of the databases integrity features and is centred around the concept of domain/type implementation. Put simply it is the numberof constraints defined by the number of data elements being captured.

The ratio has 2 contexts

  1. Individual tables
    • Each column in a table should have a domain/type defined. The base types supplied by most DBMS are inadequate for every day use and should be constrained further by CHECK constraints.
    • Each table has its own Hermetic Ratio
  2. The Database
    • The Hermetic Ratio of the database is the average Hermetic Ratio of all tables

Interpreting the results

  • For each table
    • Excellent >= 1.00
    • Acceptable >= 0.5
    • Poor < 0.5
  • Database
    • Excellent >= 0.75
    • Acceptable >= 0.5
    • Poor < 0.5

I have never seen any moderately large database schema with a Hermetic Ratio over 0.75 This is primarily caused by the addition of logging and audit tables which understandably score very low.

The most frightening results always appear from those schemas generated from Object/Relational tools and the big ERP and CRM implementations.

I DARE YOU ALL TO RUN THIS OVER YOUR CURRENT DATABASE.

Anybody who scores over 0.5 for the database ratio should pat themselves on the back......

Without further ado...

-- Individual Table Hermetic Ratio
Select TABLE_NAME, cast(CHKCount as money)/ColumnCount as HermeticRatio
from
(
Select T.TABLE_NAME
    , COUNT(CC.COLUMN_NAME) as ColumnCount
    , (SELECT COUNT(*) from INFORMATION_SCHEMA.TABLE_CONSTRAINTS CK where CK.TABLE_NAME = T.TABLE_NAME) as CHKCount
from INFORMATION_SCHEMA.TABLES T
inner join INFORMATION_SCHEMA.COLUMNS CC on CC.TABLE_NAME = T.TABLE_NAME
where TABLE_TYPE = 'BASE TABLE'
group by  T.TABLE_NAME
) as X

-- Database Hermetic Ratio
Select AVG(HermeticRatio) as HermeticRatio
from
(
Select TABLE_NAME, cast(CHKCount as money)/ColumnCount as HermeticRatio
from
(
Select T.TABLE_NAME,  COUNT(CC.COLUMN_NAME) as ColumnCount
    , (SELECT COUNT(*) from INFORMATION_SCHEMA.TABLE_CONSTRAINTS CK where CK.TABLE_NAME = T.TABLE_NAME) as CHKCount
from INFORMATION_SCHEMA.TABLES T
inner join INFORMATION_SCHEMA.COLUMNS CC on CC.TABLE_NAME = T.TABLE_NAME
where TABLE_TYPE = 'BASE TABLE'
group by  T.TABLE_NAME
) as X
) as Y
Print | posted on Wednesday, February 21, 2007 11:25 AM

Feedback

# re: The Hermetic Ratio - A Database Design Measure

left by Tara at 2/21/2007 6:28 PM Gravatar
After getting rid of the extra comma in the second SELECT statement, I ran this on a few of my databases. The highest score was 0.4733. The developers are responsible for the design though with only minimal help from the DBAs. They prefer to do most of their constraining in the application of course, which I keep telling them is bad. We do at least have all of the PKs, FKs, and unique constraints in place, just not too many check constraints.

# re: The Hermetic Ratio - A Database Design Measure

left by DavidM at 2/21/2007 7:13 PM Gravatar
Thanks for the bug report Tara.. I'll blame HTML..

0.4733 is definately above average for non-DBA developed databases. Kudos to them!

# re: The Hermetic Ratio - A Database Design Measure

left by Mladen at 2/21/2007 9:23 PM Gravatar
hey this is a very nice metric.

our current db has a score of 0.2171:(

but then again this is by design since it was decided that all integrity will be done in the app (ORM)
oh well... you live with what you get :)

# re: The Hermetic Ratio - A Database Design Measure

left by Jon H at 2/22/2007 5:32 AM Gravatar
.3267

But we do have a lot of audit log tables..ya that's it.

A very nice post yet again David. Love to read your posts.

Jon

# re: The Hermetic Ratio - A Database Design Measure

left by rockmoose at 2/22/2007 10:19 AM Gravatar
Cool, this is a must try :-)
Main db (300+tables) 0.3686
Ods/Reporting (80+tables) 0.5519
Problem Db#1(90+tables) 0.2248
Latest Db(only 8 tables so far..) 0.925 (sysdiagrams is the culprit :))

So definitely room for improvement, but that, I already know !

Cool approach, I would definitely like to see the ratio of null columns as part of the analysis.
select nullAbilityFactor=100*sum(case when c.is_nullable = 'yes' then 1 else 0 end)/count(*),count(*)
from information_schema.columns c
join information_schema.tables t
on c.table_catalog = t.table_catalog
and c.table_schema = t.table_schema
and c.table_name = t.table_name
where c.column_name not in('regDate','changeDate') -- except any metadata columns...
and t.table_type = 'base table'

oops! 31, hmm too high!
rockmoose

# re: The Hermetic Ratio - A Database Design Measure

left by DavidM at 2/22/2007 10:46 AM Gravatar
Rockmoose,

Don't steal my next post!!!

That is another metric I have - The PUC Index (Potenitally Useless Column)

# re: The Hermetic Ratio - A Database Design Measure

left by Paul at 2/26/2007 12:26 PM Gravatar
I had to change cast(CHKCount as money) to cast(CHKCount as decimal) to get this to work. However nice stuff.

# re: The Hermetic Ratio - A Database Design Measure

left by CliffordHeath at 2/26/2007 5:07 PM Gravatar
186 tables, HR=0.76 - with no CHECK constraints, and some FK's missing because of a core product and two sub-products that install separate schema elements without possibility of checking between domains.

# re: The Hermetic Ratio - A Database Design Measure

left by DavidM at 2/26/2007 5:26 PM Gravatar
Nice Clifford! The is at the low end of large schema's but very good none the less.

No CHECK constraints! Hmmm.... I'd imagine it is highly normalised with thin tables and a large number of interdependancies. Possibly multple ternary relationships.

# re: The Hermetic Ratio - A Database Design Measure

left by rudy.komacsar at 2/28/2007 12:54 PM Gravatar
I ran it on the MOM 2K5 OnePoint database and got .2144 ! A couple of tables came up 1.0 and 0.75 but the vast majority of this "fine MS product" ccam in under 0.5. There are another 40-50 more databases from a variety of vendors that I am going to test it on.

# re: The Hermetic Ratio - A Database Design Measure

left by Jorge Antonio at 3/2/2007 12:09 PM Gravatar
199 tables and the dtabase scored 0.6282, it's from an ERP system I built

# re: The Hermetic Ratio - A Database Design Measure

left by rudy.komacsar at 3/6/2007 4:37 AM Gravatar
I ran this on a database and got 2 tables that scored 1.5 ??? Quite possibly there is a slight discrepency somewhere ...

# re: The Hermetic Ratio - A Database Design Measure

left by DavidM at 3/6/2007 6:58 AM Gravatar
That is normal Rudy.

create table Test (TestCol varchar(10) not null primary key, checkCK_Test (len(TestCol) > 0))

1 column, 2 constraints......This table will score 2.

# re: The Hermetic Ratio - A Database Design Measure

left by Just 4 Fun at 3/10/2007 8:51 AM Gravatar
Northwind: .5279 :-)

# re: The Hermetic Ratio - A Database Design Measure

left by BuggyFunBunny at 5/15/2007 6:29 AM Gravatar
Just out of curiosity, but how often to you analyze sequential batch processing (e.g. COBOL based) systems?

Not that I disagree with your metric, but COBOL folk (and J[2]EE java folk; en masse - javBOL) always claim that it's faster "to do integrity in the application". Have you ever had the opportunity to measure a system where no integrity maintenance is duplicated between application and database? Which was faster?
Comments have been closed on this topic.