Dave’s guide to the EAV
This guide is intended to help programmers and DBA’s understand the considerations
and consequences associated with an EAV schema and to provide a basic framework for
its implementation in SQL Server (>= 2000).
It is based on the idea that the EAV schema is being implemented "inside" a SQL DBMS.
The following are used interchangeably - column/attribute, row/tuple,
table/relation/relvar
The usual advantage noted, is that it allows end-users to modify (Add) attributes
to the business model without professional personnel (DBA's & programmers)
If we use the analogy of building a house, essential it is designed to allow the home
owner the ability to add more rooms without any architectural or building training.
Can you image the house after a couple of changes?. Something Escher would be proud of...
"Relativity" 1953 Lithograph
The example I will use is based on the following requirements:
- A Client is identified by a unique identifier
- A Client has a first name
- A Client has a last name
- A Client can have many “end user” defined data elements.
- For the purpose of this example we will choose 5 data elements
to focus on the modelling aspect- Age – Integer greater than -1
- Weight – Numeric greater than zero
- Handed – string set “Left” or “Right”
- Quit Date – NULL marker indicating that the Client hasn't quit or
a valid Date - Has Insurance – Bit type
- For the purpose of this example we will choose 5 data elements
DISCLAIMER: I hate HTML. Any coding errors, spelling mistakes, or formatting debacles
can be placed squarely on HTML :-)
Structure
Know thy Enemy
The first 3 requirements can be neatly captured in a simple table and is our
primary “Entity”
CREATE TABLE Client (ClientID INT NOT NULL PRIMARY KEY, FirstName
VARCHAR(50) NOT NULL, LastName VARCHAR(50) NOT NULL )
Below is three possible ways we can model requirement 4:
- A Simple one-to-one design
- We will refer to this as the “Compacted” schema
- This is the most conventional design.
- This design implies that all attribute values are
known simultaneously
- A projected one-to-one design
- We will refer to this as the “Exploded” schema
- This is the projection of all attributes into separate tables.
- It provides more flexibility in that the schema allows any, all or none
of the “attribute tables” to contain data
- EAV
- A basic EAV schema
- The meaning of the schema is very generic and does not divulge any context.
A relational database can be thought of as one big rule and fact engine.
Each column of every table not only "stores" data, but has implied rules in the form of
constraints, relationshipsand of course it type which all go towards maintaining correct, contextual data.
When all columns of a row contain values they state a matter of fact.
“The Client with the identifier of 10, has an Age of 35, a Weight of 78kg, has
not quit, prefers his right hand and does has insurance”
Given all the constraints of the table and its data, we find meaning and context which allows us to extract information and knowledge.
Any changes made to a table (adding new columns, constraints etc...)
destroys the original meaning and creates a new meaning.
If we add another column to the ClientNF table in the "Compacted" schema then
our original row of data about Client 10 is wrong and a new fact is constructed in its place.
To the database (not just the table itself), this is a serious change.
In theory, the entire database is destroyed and a new database created.
If you think I am being dramatic, you are right.
Because modern DBMS perform this operation so effortlessly we tend to forget about the theoretical consequences.
The EAV appears to not only disregard this concept, but spits in its face! Not a good sign…
The minimum requirement of an EAV schema is that it simulates the equivalent
constraints of a conventional design.
Given the EAV schema above, the goal is to recreate the Compacted or Exploded schema’s constraints.
By simply comparing the schemas you can see the major things missing from the EAV
- No types/domain
- No defined column/attribute sets.
- No “real” context
In other words the EAV is missing the core database component!
An EAV schema requires the DBA to build a type system.
You need to realise this if you are to successfully implement an EAV system.
This is not a trivial task. Microsoft, Oracle, IBM and countless others have been
trying for decades and yet custom data type support is still not implemented very well in modern DBMS. What chance do you think you have?
Into the Darkness
Let’s look at the EAV in more detail
CREATE TABLE ClientAttribute
(
ClientID INT NOT NULL ,
Attribute VARCHAR (50) NOT NULL,
AttributeValue sql_variant NULL ,
CONSTRAINT PK_ClientAttribute PRIMARY KEY (ClientID, Attribute),
CONSTRAINT FK_ClientAttribute_Client FOREIGN KEY (ClientID) REFERENCES Client (ClientID)
)
Examining the table we find
- The ClientID represents the Client - “Entity”
- The Attribute represents the name of the Attribute – “Attribute”
- Attribute Value Column represents the value of the data Attribute – “Value”
The Entity Column/s and Attribute Columns must form a composite candidate key
Without this key there is absolutely NO chance of simulating the Compacted or Exploded schema
The Entity Column/s must be a foreign key to the “Entity”
This sounds rather obvious but must be stated none the less.
Physically, the only “weird” looking thing is the use of the sql_variant data type for the AttributeValue attribute. Traditional, a large string is used.There is nothing in relational theory that prohibits type inheritance. So if we use our imagination, we can think of the sql_variant data type as a base type and speak no more it!
We'll load some data into the EAV with the same values as the “Compacted” schema to produce the following results
ClientID | Attribute | AttributeValue |
10 |
Age |
35 |
10 |
Weight |
78 |
10 |
Handed |
Right |
10 |
HasInsurance |
1 |
10 |
QuitDate |
NULL |
When we turn this set of data into its "story" it might look like this….
“The Client with the identifier 10 has an Attribute named Age and an AttributeValue of 35”
“The Client with the identifier 10 has an Attribute named Weight and an AttributeValue of 78”
“The Client with the identifier 10 has an Attribute named Handed and an AttributeValue of Right”
“The Client with the identifier 10 has an Attribute named HasInsurance and an AttributeValue of 1”
“The Client with the identifier 10 has an Attribute named QuitDate and an AttributeValue of NULL”
When compared with the "Compacted" fact story, you notice that it is extremely verbose and that still more interpretation is needed to recreate the “Compacted” story. Another issue is cardinality. It takes exactly 5 rows in the EAV to simulate the “Compacted” schema. This means that all 5 EAV rows must be treated as an atomic operation. If any row is missing, then the EAV does not match the equivalent “Compacted” schema. A cardinality constraint MUST be added to the EAV schema
I don’t need cardinality…
If you do not need the cardinality constraint (e.g.. you don’t care if the Client has an Age but no Weight), then you are modelling the “Exploded” schema.
Most EAV schemas do not have cardinality constraints and so are based on the “Exploded” schema.
This is the major difference between the Compacted and Exploded schemas from the EAV point of view. One has a cardinality constraint, and the other does not.
The depths of Hell…
What you should notice about the EAV schema is that it looks like a mixture of meta-data and data. And that is because it is. We are in effect building a customisable type and storage engine. Or to put it more simply… a database! A database within a database... sounds fun all ready…
Every AttributeValue must be of type Attribute.
This is exactly the same concept that relational theory demands
As an example, when the “Age” Attribute is added, the AttributeValue must conform to the type as defined in the “Compacted” or “Exploded” schema. An Integer greater than -1 in this case.
Every single EAV implementation that ignores this vital concept contains crap data… by definition!
Type checking MUST be done at the database level
This CANNOT be left to the UI or middle tier.
This CANNOT be left to the UI or middle tier.
This CANNOT be left to the UI or middle tier.
This CANNOT be left to the UI or middle tier.
Shut up Dave!
Hack away...
Before I show you the schema for this little type system, go and have a look at SQL Servers…
select * from systypes
select * from syscolumns
Our demonstration system will require 2 additional tables.
- DataTypes
- A table to describe and implement our custom data types.
- This table will attempt to enforce the type checking
CREATE TABLE DataTypes
(
DataType VARCHAR (50) PRIMARY KEY,
IsNumericType bit NOT NULL ,
IsDateType bit NOT NULL ,
HasLike bit NOT NULL ,
LikeExpression VARCHAR (2048) DEFAULT (''),
HasDomain bit NOT NULL ,
DomainExpression VARCHAR (2048) DEFAULT (''),
AllowNull bit NOT NULL DEFAULT (0),
CONSTRAINT CK_DataTypes_DateOrTime CHECK (IsNumericType & IsDateType = 0),
CONSTRAINT CK_DataTypes_Domains CHECK (1 = CASE WHEN (HasDomain = 1 AND (LEN(DomainExpression) > 0)) THEN 1 WHEN (HasDomain = 0 AND (LEN(DomainExpression) = 0)) THEN 1 ELSE 2 END),
CONSTRAINT CK_DataTypes_LikeDomain CHECK (1 = CASE WHEN (HasLike = 1 AND (LEN(LikeExpression) > 0)) THEN 1 WHEN (HasLike = 0 AND (LEN(LikeExpression) = 0)) THEN 1 ELSE 2 END)
)- I hope the column names are descriptive enough to not require further explanation.
- As you can see this table is not fully normalised. To compensate I have had to create CHECK constraints to maintain integrity.
- If HasDomain is TRUE then DomainExpression must contain data or HasDomain is FALSE and DomainExpression is empty
- If HasLike is TRUE then LikeExpression must contain data or HasLike is FALSE and LikeExpression is empty
- Attributes
- A table named Attributes that contains a list of Attribute Names that can be used in the ClientAttribute table
- This table will provide a set of attributes. Effectively, it is a simulation of a table header (the column list).
CREATE TABLE Attributes
(
Attribute VARCHAR (50) NOT NULL PRIMARY KEY ,
DataType VARCHAR (50) NOT NULL DEFAULT ('Text'),
DefaultValue sql_variant NULL ,
CONSTRAINT FK_Attributes_DataTypes FOREIGN KEY (DataType) REFERENCES DataTypes (DataType)
)- A self explanatory table I hope
We add an FK to the ClientAttribute table and this is the ERD we are left with…
Battle with the Devil…
The schema itself is easy to visualise and build, but as it currently stands it is useless or more accurately dangerous.
So far we have only managed to constrain the names of the Attribute in the ClientAttribute table.
We still need type checking to happen.
The best way to explain how this DataTypes table is going to work is to load some data…
DataType |
IsNumericType |
IsDateType |
HasLike |
LikeExpression |
HasDomain |
Domain Expression |
AllowNulls |
Bit |
0 |
0 |
0 |
|
1 |
0|1| |
0 |
Date |
0 |
1 |
0 |
|
0 |
|
1 |
Handed |
0 |
0 |
0 |
|
1 |
Left|Right| |
0 |
Integer |
1 |
0 |
1 |
%[-.]% |
0 |
|
0 |
Number |
1 |
0 |
0 |
|
0 |
|
0 |
Text |
0 |
0 |
0 |
|
0 |
|
0 |
Text Only |
0 |
0 |
1 |
%[0-9]% |
0 |
|
0 |
Positive Number |
1 |
0 |
1 |
%[-]% |
0 |
|
0 |
The HasDomain and DomainExpression column provide us with simple “Look Up” functionality. In our case the Handed DataType can only be Left or Right. It uses the “pipe” delimiter…
Hands up who noticed the bad column name? The LikeExpression should really be NOTLikeExpression
I will assume you understand the rest and move straight away with the validating query for a particular data type and value….
DECLARE @TargetValue sql_Variant
DECLARE @DataType VARCHAR(50)
SELECT @DataType = 'Integer', @TargetValue = 34
SELECT *
FROM dbo.DataTypes D
WHERE D.DataType = @DataType
AND ((IsNumericType = 1 AND ISNUMERIC(CAST(@TargetValue AS NVARCHAR(2048))) = 1) OR IsNumericType = 0)
AND ((IsDateType = 1 AND ISDATE(CAST(@TargetValue AS NVARCHAR(2048))) = 1) OR IsDateType = 0)
AND ((HasLike = 1 AND PATINDEX(LikeExpression, CAST(@TargetValue AS NVARCHAR(2048))) = 0) OR HasLike = 0 )
AND ((HasDomain = 1 AND PATINDEX('%' + CAST(@TargetValue AS NVARCHAR(2048)) + '|%' , DomainExpression)> 0) OR HasDomain = 0 )
OR (AllowNull = 1 AND @TargetValue IS NULL)
Basically it asks the following…
- IsNumericType is true, validate against the ISNUMERIC functions otherwise ignore
- IsDateType is true, validate against the ISDATE functions otherwise ignore
- HasLike is true, validate using PATINDEX and the LikeExpression otherwise ignore
- HasDomain is true, validate using PATINDEX and the DomainExpression otherwise ignore
- AllowNull is true then validate NULL using IS NULL
Hopefully, I don’t need to point out to you all the limitations. But the most note worthy is the casting of the sql_variant. Because the casting operation “loses” the original type, we are only evaluating a representation of the value. This should not be considered a theoretical nuance and must be taken seriously.
Round peg, square hole...hammer
We now need to integrate this validation routine within the database.
There are 3 ways to implement this constraint in SQL Server.
- Views (my personal favourite)
CREATE VIEW ClientAttribute_CONSTRAINT
AS
SELECT ClientID, Attribute, AttributeValue
FROM dbo.ClientAttribute CA
WHERE EXISTS
(
SELECT 1
FROM dbo.Attributes A
INNER JOIN dbo.DataTypes DS ON DS.DataType = A.DataType
WHERE A.Attribute = CA.Attribute
AND ((IsNumericType = 1 AND ISNUMERIC(CAST(CA.AttributeValue AS NVARCHAR(2048))) = 1) OR IsNumericType = 0)
AND ((IsDateType = 1 AND ISDATE(CAST(CA. AttributeValue AS NVARCHAR(2048))) = 1) OR IsDateType = 0)
AND ((HasLike = 1 AND PATINDEX(LikeExpression, CAST(CA.AttributeValue AS NVARCHAR(2048))) = 0) OR HasLike = 0 )
AND ((HasDomain = 1 AND PATINDEX('%' + CAST(CA.AttributeValue AS NVARCHAR(2048)) + '|%' , DomainExpression)> 0)OR HasDomain = 0 )
OR (AllowNull = 1 AND CA.AttributeValue IS NULL)
)
WITH CHECK OPTIONAll INSERT and UPDATE operations against the ClientAttribute table should now be directed to this view.
- Trigger on the ClientAttribute table for INSERT/UPDATE
It’s essentially the same as the view only with branching and transaction management.
- CHECK constraint and a User Defined Function (UDF)
Again this is just a variation on the same query.
Testing the bastard
Some TSQL to load the Client, DataTypes and Attributes table
INSERT INTO dbo.DataTypes(DataType, IsNumericType, IsDateType, HasLike, LikeExpression, HasDomain, DomainExpression, AllowNull)
SELECT 'Positive Number', 1, 0, 1,'%[-]%',0,'',0
UNION ALL
SELECT 'Number', 1, 0, 0,'',0,'',0
UNION ALL
SELECT 'Bit', 0, 0, 0,'',1,'0|1|',0
UNION ALL
SELECT 'Date', 0, 1, 0,'',0,'',1
UNION ALL
SELECT 'Text', 0, 0, 0,'',0,'',0
UNION ALL
SELECT 'Text Only', 0, 0,1,'%[0-9]%', 0,'',0
UNION ALL
SELECT 'Handed', 0, 0, 0,'',1,'Left|Right|',0
UNION ALL
SELECT 'Integer', 1, 0, 1,'%[-.]%',0,'',0
GO
INSERT dbo.Attributes(Attribute, DataType, DefaultValue)
SELECT 'Age','Integer',0
UNION ALL
SELECT 'Weight','Positive Number',0
UNION ALL
SELECT 'QuitDate','Date',NULL
UNION ALL
SELECT 'Handed','Handed','Right'
UNION ALL
SELECT 'Has Insurance','Bit',0
GO
INSERT dbo.Client (ClientID, FirstName, LastName)
SELECT 10, 'Dave', 'Hill'
UNION ALL
SELECT 20, 'Simon', 'Rain'
UNION ALL
SELECT 30, 'Gary', 'Water'
UNION ALL
SELECT 40, 'Vince', 'Fire'
UNION ALL
SELECT 50, 'Jackie', 'Flame'
GO
Now the test...
--Good data example
INSERT dbo.ClientAttribute_CONSTRAINT (ClientID, Attribute, AttributeValue)
SELECT ClientID, 'Age', CAST (ClientID + 3 AS sql_variant)
FROM dbo.Client
UNION ALL
SELECT ClientID, 'Weight', ClientID *5 - 5
FROM dbo.Client
UNION ALL
SELECT ClientID, 'QuitDate', GETDATE() + ClientID
FROM dbo.Client
UNION ALL
SELECT ClientID, 'Has Insurance', ClientID/10%2
FROM dbo.Client
UNION ALL
SELECT ClientID, 'Handed', CASE ClientID/10%2 WHEN 1 THEN 'Right' ELSE 'Left' END
FROM dbo.Client
GO
--Clear out data
DELETE ClientAttribute
GO
--Bad data examples (integer)
INSERT dbo.ClientAttribute_CONSTRAINT (ClientID, Attribute, AttributeValue)
SELECT ClientID, 'Age', -2
FROM dbo.Client
GO
--Bad data examples (Date)
INSERT dbo.ClientAttribute_CONSTRAINT (ClientID, Attribute, AttributeValue)
SELECT ClientID, 'QuitDate', '20069999'
FROM dbo.Client
GO
--Bad data examples (Domain)
INSERT dbo.ClientAttribute_CONSTRAINT (ClientID, Attribute, AttributeValue)
SELECT ClientID, 'Handed', 'No Handed'
FROM dbo.Client
GO
You can do the rest of the testing your self...
State of play
Given the original fixed requirements, the complexity of an EAV implementation is ludicrous compared to the "Compacted" schema.
The amount of testing to validate the schema is very large and boring.
I don't know about you, but the ROI doesn't seem so good.
Querying
The madness
I'll be concentrating on simple restrictions (WHERE).
In the real world, the "Compacted" schema's attributes are known at design time, so the DBA would have known the querying context and constructed, tested and optimised pre-compiled code for the business requirements. But in the EAV world, there is none of that.
Performing queries on our EAV schema is very different from the "Compacted" schema.
For single attribute searches, the difference is not really noticeable..
Select all those with "Ages" greater than 20
--Compacted
SELECT *
FROM dbo.ClientNF
WHERE Age > 20
--EAV
SELECT *
FROM dbo.ClientAttribute
WHERE Attribute = 'Age' AND AttributeValue > 20
The difference becomes painfully obvious when more than 1 attribute is involved in a search request.
Select all those with "Ages" greater than 20 and are Right "Handed"--Compacted
SELECT *
FROM dbo.ClientNF
WHERE Age > 20 AND Handed = 'Right'
--EAV
SELECT CA.ClientID, CA.Attribute AS Age ,CX.AttributeValue AS Handed
FROM dbo.ClientAttribute CA
INNER JOIN ClientAttribute CX ON CX.ClientID = CA.ClientID
WHERE CA.Attribute = 'Age' AND CA.AttributeValue > 20
AND CX.Attribute = 'Handed'
Nasty isn't it? Although this only affects ANDed operations
Bad wisdom
To enable end users to construct queries like this raise serious practical issues that usually result in either application generated code or "dynamic SQL".
According to conventional wisdom, every attribute searched (using AND) requires its own self join.
This "wisdom" has come about mainly from a lack of imagination and understanding
Does anybody see anything strange about the "two attribute" search in the EAV?
What's always disturbed me about the "self-join" approach, is that the EAV query actually returns a "Compacted" type result set. Have at look at the result set of the EAV query. Notice how I have to name the columns.
You would assume that when you query a EAV structure, you get a EAV like result. Right?
EAV IN, EAV OUT
Given a "3 attribute" query...
Select all those with "Ages" greater than 20 are Right "Handed" and Have "Insurance"
So "When in Rome", let's turn our search request into a EAV representation
Attribute | AttributeValue | Operator |
Age | 20 | > |
Handed | Right | = |
HasInsurance | 1 | = |
As we all know and love, the relational model manages sets of data and has a variety of operators that can be used to manipulate them.
And guess what? The one required to do the job is the most poorly implemented and the hardest to construct.
A little division
It is relational division (DIVIDE).
If you don't know what it is, go look it up. Here is a good SQL example Division
It is basically the opposite of PRODUCT (CROSS JOIN).
Unfortunately most (all?) SQL DBMS have no equivalent operation, which can make it a difficult query to implement.
But it is more than worth it from several perspectives.
- Defined result set. The column list is known at design time.
- Minimal performance degradation. The EAV table is only ever referenced a maximum of twice. Regardless of the number of attributes searched.
- Database defined. No "dynamic SQL" or dodgy application code to worry about.
SQL kungfu
So given this weapon of EAV destruction, let's kick its arse...
Below is a stored procedure that is designed as an API for end user interaction and to provide a generic procedure for all RESTRICT based searches.
I needed a few "helper" routines, specifically, any routine that parses a delimited string.
I have used my RowParser function for handling large strings.
There are hundreds of functions for this, so change it to suit your style.
No matter what technique you use, the idea is to construct a search table like the one above. In the following code this is called @SearchTable.
CREATE PROC dbo.upClientAttribute_EAVSearch
(
@AttributeCSV TEXT,
@ValueCSV TEXT,
@OperatorCSV TEXT = NULL,
@Logic VARCHAR(3) = 'AND'
)
AS
/*
Input:
@AttributeCSV = CSV of attributes to search
@ValueCSV = CSV of attribute values to search
@OperatorCSV = CSV of operators to use. The default (NULL) is "Equals" (=)
@Logic = Perform "AND" or "OR" on multiple attribute searches
Flow:
Parse parameters and load into temporary tables
Load search table
Evaluate using Division: ClientAttrbiute DIVIDE @SearchTable
Returns:
Result set: ClientID, Attribute, AttributeValue
*/
BEGIN
SET nocount ON
--How many attributes to search
DECLARE @Counter INT
--Temp tables for parsing of parameters
DECLARE @AttributeTable TABLE (Position INT NOT NULL IDENTITY(1,1) PRIMARY KEY, Data VARCHAR(50) collate database_default)
DECLARE @ValueTable TABLE (Position INT NOT NULL IDENTITY(1,1) PRIMARY KEY, Data sql_variant )
DECLARE @OperatorTable TABLE (Position INT NOT NULL IDENTITY(1,1) PRIMARY KEY, Data VARCHAR(6) collate database_default)
--The divisor
DECLARE @SearchTable TABLE (Position INT NOT NULL UNIQUE , Attribute VARCHAR(50) collate database_default NOT NULL PRIMARY KEY
, AttributeValue sql_variant, Operator VARCHAR(6) NOT NULL DEFAULT ('='))
--Insert Attribute Parameters into temp table
INSERT @AttributeTable (Data)
SELECT LTRIM(Data)
FROM dbo.RowParser(@AttributeCSV,DEFAULT)
SET @Counter = @@rowcount
--Why bother?
IF @Counter = 0
BEGIN
--Exit early with empty set
SELECT ClientID, Attribute, AttributeValue FROM dbo.ClientAttribute WHERE 1=0
RETURN
END
--Insert Values Parameters into temp table
INSERT @ValueTable (Data)
SELECT LTRIM(Data)
FROM dbo.RowParser(@ValueCSV,DEFAULT)
--Insert Operators Parameters into temp table
INSERT @OperatorTable (Data)
SELECT LTRIM(Data)
FROM dbo.RowParser(@OperatorCSV,DEFAULT)
--Load Divisor SearchTable
INSERT @SearchTable (Position, Attribute,AttributeValue, Operator)
SELECT A.Position, A.Data, V.Data, ISNULL(O.Data, '=')
FROM @AttributeTable A
INNER JOIN @ValueTable V ON V.Position = A.Position
LEFT JOIN @OperatorTable O ON O.Position = A.Position
--ClientAttribute DIVIDE @SearchTable
--Branch to optimise single attribute and "OR" Logic searches
IF @Counter > 1 AND @Logic = 'AND'
SELECT x.ClientID, t.Attribute, cx.AttributeValue
FROM
(
SELECT ClientID
FROM @SearchTable T
INNER JOIN dbo.ClientAttribute CA ON T.Attribute = CA.Attribute
WHERE 1 = CASE Operator
WHEN '=' THEN CASE WHEN CAST(CA.AttributeValue AS NVARCHAR(4000)) = CAST(T.AttributeValue AS NVARCHAR(4000)) THEN 1 ELSE 0 END
WHEN '<=' THEN CASE WHEN CAST(CA.AttributeValue AS money) <= CAST(T.Attributevalue AS money) THEN 1 ELSE 0 END
WHEN '<' THEN CASE WHEN CAST(CA.AttributeValue AS money) < CAST(T.Attributevalue AS money) THEN 1 ELSE 0 END
WHEN '>' THEN CASE WHEN CAST(CA.AttributeValue AS money) > CAST(T.Attributevalue AS money) THEN 1 ELSE 0 END
WHEN '>=' THEN CASE WHEN CAST(CA.AttributeValue AS money) >= CAST(T.Attributevalue AS money) THEN 1 ELSE 0 END
WHEN 'like' THEN CASE WHEN CAST(CA.AttributeValue AS NVARCHAR(4000)) LIKE CAST(T.Attributevalue AS NVARCHAR(4000)) THEN 1 ELSE 0 END
WHEN 'ISNULL' THEN CASE WHEN CA.AttributeValue IS NULL THEN 1 ELSE 0 END
ELSE 0 END
GROUP BY ClientID
HAVING COUNT(*) = @Counter
) AS X
INNER JOIN dbo.ClientAttribute CX ON CX.ClientID = X.ClientID
INNER JOIN @SearchTable T ON T.Attribute = cx.Attribute
ELSE
SELECT CA.ClientID, CA.Attribute, CA.AttributeValue
FROM dbo.ClientAttribute CA
INNER JOIN @SearchTable T ON T.Attribute = CA.Attribute
WHERE 1 = CASE Operator
WHEN '=' THEN CASE WHEN CAST(CA.AttributeValue AS NVARCHAR(4000)) = CAST(T.AttributeValue AS NVARCHAR(4000)) THEN 1 ELSE 0 END
WHEN '<=' THEN CASE WHEN CAST(CA.AttributeValue AS money) <= CAST(T.Attributevalue AS money) THEN 1 ELSE 0 END
WHEN '<' THEN CASE WHEN CAST(CA.AttributeValue AS money) < CAST(T.Attributevalue AS money) THEN 1 ELSE 0 END
WHEN '>' THEN CASE WHEN CAST(CA.AttributeValue AS money) > CAST(T.Attributevalue AS money) THEN 1 ELSE 0 END
WHEN '>=' THEN CASE WHEN CAST(CA.AttributeValue AS money) >= CAST(T.Attributevalue AS money) THEN 1 ELSE 0 END
WHEN 'like' THEN CASE WHEN CAST(CA.AttributeValue AS NVARCHAR(4000)) LIKE CAST(T.Attributevalue AS NVARCHAR(4000)) THEN 1 ELSE 0 END
WHEN 'ISNULL' THEN CASE WHEN CA.AttributeValue IS NULL THEN 1 ELSE 0 END
ELSE 0 END
END
GO
Some basic examples...
--Single Attribute with equality
EXEC upClientAttribute_EAVSearch @AttributeCSV = 'Age', @ValueCSV = '20,', @OperatorCSV = DEFAULT, @Logic = DEFAULT
--Two Attribute with mixed operators
EXEC upClientAttribute_EAVSearch @AttributeCSV = 'Age, Handed,', @ValueCSV = '20,Right', @OperatorCSV = '>,=', @Logic = DEFAULT
--Three Attribute with mixed operators
EXEC upClientAttribute_EAVSearch @AttributeCSV = 'Age, Handed, Has Insurance', @ValueCSV = '20,Right, 1', @OperatorCSV = '>,=,=,', @Logic = DEFAULT
Conclusion
Lots and lots of code, large testing cycle, not very trust worthy and poorly performing.
I hope you found this guide useful and that it discourages you from using an EAV design.
Legacy Comments
Mladen
2006-09-15 |
re: Dave’s guide to the EAV nice article. however i've seen a bit more interesting way of implementing EAV model in SS2k5 with the use of XML column. in your case you'd put your ClientAttribute table into an XML type column in the Client table. you add a schema to the column so you can validate data input and check constraints. with XML indexing the XQuery searches are quite fast. |
Jon
2006-09-16 |
re: Dave’s guide to the EAV Very good article, your examples were very helpful. |
Danny Dubroc
2006-10-24 |
Order By Hi I was wandering if you could give an example of Order Varible Data Types in a Table. Thanks Danny D. |
matt
2006-10-26 |
re: Dave’s guide to the EAV dave, is there a way to email you? i wanted to ask you something about one of your (now closed) posts. thanks, matt. |
sitka
2006-10-26 |
re: Dave’s guide to the EAV If one wants to create the ultimate in flexibility why not just have a datatpe called "universe" which is the XML serialized object conatining all the known attributes of "everything" and put it in a single row containing a single xml datatype column. Isn't that the logical progression? Hope it isn't age that prevents me from seing the benefit of this. |
Alistair Bayley
2006-10-26 |
re: Dave’s guide to the EAV How do I contact you about improvements to the DataTable relational operators? |
Alistair Bayley
2006-10-27 |
re: Dave’s guide to the EAV just a test... |
jezemine
2006-10-29 |
re: Dave’s guide to the EAV A company I worked at several years ago used an EAV model (this was before I knew anything about databases). But they took it one step further and put EVERY attribute for EVERY relation in the SAME table!!! Not only that, for the attribute value the used not one sql_variant column, but N different columns, where N-1 were always NULL! so if they attribute was a bit, the VALUE_BIT column had a value, and all others were null. truly astonishing. Needless to say, this product did not go far. It fell down with the first customer that tried to use it. EAV models, like many "generic" solutions, aren't good for much. One-size-fits-all really means one-size-fits-none. |
EVA Rocks
2006-12-20 |
re: Dave’s guide to the EAV We use an EAV metadata constrained model. We have well over 6TB of data, and regularly query time series data with a 10yr horizon and can say, when done right, key word being “done right" EAV makes 3N look like a fossil. Of course EAV does not and cannot for everything nor is it novice or middling DBA friendly. But when applied properly, it stands head and shoulders above many typical or better understood models. Google what model medical, financial and very large GIS applications are built around... that’s right, EAV! |
Mike
2007-11-13 |
re: Dave’s guide to the EAV Dave I have to deal with EAV in a project I am working with. I just wanted to say that this query worked better than anything else I have tried. I have done a lot of research and for some reason nobody has ever mentioned this but you. Thank you for sharing this! |
findev
2008-03-09 |
re: Dave’s guide to the EAV Hi, We use a EAV in a financial services application. Its been working like a charm. The flexibility has allowed us to add a lot of features. I think one of the reasons people tend to shy away is because of querying complexity. But its not a big deal if you make views to flatten your table and then ask people to query against the view. If you have an index on the client attribute table as (attribute_value,client_id) then you actually get an index on all the columns of the view. The view might look something like this - create view client_v as select client_id ,max(case when attribute = 'Age' then attribue_value else null) as Age ,max(case when attribute = 'Handed' then attribute_value else null) as Handed .... from client c join client_attribute ca on c.client_id = ca.client_id group by c.clent_id Thx |
Gurel
2008-04-22 |
re: Dave’s guide to the EAV Hi Please read below article as well http://www.jmir.org/2003/4/e27 |
bf
2008-05-13 |
re: Dave’s guide to the EAV How does an EAV schema help or hinder good reporting practices? The company I'm working with is in the process of developing a huge EAV system. And we'll be converting all the reports to Crystal Reports. Not sure if we'll be using a BO Universe, stored procs, views or ?? for the reports to link to. Are there any pros/cons for the reporting team with this type of system? |
Qingsong Yao
2008-10-17 |
re: Dave’s guide to the EAV Hello, Guys Have you looked at SQL Server's sparse column + filtered index feature. I think it can perfect to make a EAV system. Please let me know if you want to get more infromation. Qingsong |
Bryan Stauffer
2008-12-04 |
re: Dave’s guide to the EAV This is an interesting article, as many have stated, but apparently none so far have actually tested. The inserts into Attributes fail in the example code due to the unions and SQL Server assuming the type of the third column of the unioned set to be an integer. There are two ways around this and I am opting for the shorter one and just breaking the insert into to steps; one with the DefaultValue(s) of 0 and NULL, the ther with 'Handed'. For anyone new to SQL or otherwise unfamiliar with working around this kind of issue, another way to get around this is as follows: declare @place_holder sql_variant INSERT dbo.Attributes(Attribute, DataType, DefaultValue) SELECT * FROM ( SELECT 'ExcludeMe' as PH1, 'ExcludeMe' as PH2, @place_holder as TH UNION ALL SELECT 'Age','Integer',0 UNION ALL SELECT 'Weight','Positive Number',0 UNION ALL SELECT 'QuitDate','Date',NULL UNION ALL SELECT 'Has Insurance','Bit',0 UNION ALL SELECT 'Handed','Handed','Right' ) AS temp WHERE PH1 <> 'ExcludeMe' GO That's it for now. Still working through the example. :) |
bobsmith
2009-05-13 |
re: Dave’s guide to the EAV The company I used to work for used a mix of an EAV and normal relational and the application as quite successfully. It was used as an all in one billing solution for the energy industry and last I heard for one of the larger retailers it was billing several hundred thousand customers with a total annual invoice amount in excess of 200 million. there were some performance issues occurring last I heard, but all it meant was a developer was needed to tune some tables/code. It took them over 3 years to develop the initial core appl though. |
Mark
2009-06-06 |
re: Dave’s guide to the EAV Thank you for writing down such a comprehensive guide. Like some of the commenters have stated above, when it comes down to querying the EAV data there is very little information to be found. I guess EAV isn't exactly made to report on ;) On one of my recent projects we solved this problem by making use of PIVOT statements, that we generated with help from the metadata. While this is quite a different approach than yours, it has proved flexible and easy-to-use so far. In case someone is interested, I have written an article about this technique; perhaps it may prove useful. http://blogs.infosupport.com/blogs/marks/archive/2009/06/01/handling-eav-data.aspx |
Brian
2009-12-31 |
re: Dave’s guide to the EAV EAV is a Bad Idea(tm) and there is no justification for it. The irony is that the uneducated out there who believe they have uncovered new wisdom by the use of "entity" and "attribute" tables have discovered nothing more than the genius of the information principle underlying the relational model itself. It is the bone-headed domain equivalent of reducing some useful proposition such as "E=MC^2" to "1=1" and praising the delightful elegant genercism of this new form. SQL may be a dodgy implementation, but it's a lot more performant and flexible than the semantic and constraint free muddled meta data driven dynamic SQL pivoting catastrophes that EAV unavoidably devolves into. Having roughly agreed with you (I think) however, none of your "possible three designs" are sensible and I think this invalidates any arguments ensuing from the comparison. The realtional model assumes a finite list of known variables. I think any attempt to design outside of the constraints of the underlying model are foolhardy. We're building a skyscraper, from jellow. In my experience (which is around 15 years of repeatedly seeing this kind of nonsense and imploring those around me to stop the insanity) the reasons behind EAV are nothing to do with the profileration or volatility of attributes. They are more similar to the reasons behind "agile" methodology. Which seems to be driven by an absurd, ubiquitous and suicidal desire to defer any understanding of the problem until after the solution is delivered. Stick to the principles of normalisation and orthogonality, understand your problem space, the context of the solution and importantly the nature and content of your data and you will never regret it. |
Hans Z
2010-03-13 |
re: Dave’s guide to the EAV Dave, Dave, Dave... A lot (most? all?) of the problems you point out have really little to do with EAV as such, but more to do with your insistence on building your EAV database on top of a RDBMS and using SQL as your query language. If you ditch that constraint and use another underlying storage/query mechanism you'll see life gets a lot easier. Querying large sets of related tuples effectively is not really a hard problem as such, it just becomes hard when you try to do it in SQL. In fact, most of the problems you outline are problems related to SQL's built in limitations. Seems that many real live applications have outgrown the relational paradigm and clunky old SQL. Ask Google, Facebook, Amazon, Microsoft and others... ;-) |
Joe Celko
2010-09-09 |
re: Dave’s guide to the EAV I wish I had this article to put into in one of my books. Right now i am working at a company that is suffering from EAV design. They are now hitting the data integrity and performance problems. If you don't have to have it fast, or right, then you can use EAV |
DJO
2011-05-19 |
re: Dave’s guide to the EAV You know ALTER TABLE ADD (MY_NEW_COLUMN <datatype>); just ain't that hard. |