Jeff's SQL Server Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server
posts - 157, comments - 2686, trackbacks - 64

My Links

Advertisement

News

Welcome to my weblog. My name is Jeff Smith, I am software developer in Boston, MA and I was recently named a 2009 SQL Server MVP. Check in frequently for tips, tricks, commentary and ideas on SQL Server and .NET programming.


Subscribe





Archives

Post Categories

Programming

Sports

SQL

SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

It's been a while, so let's open up the old mailbag!

Jeremy writes:

Can you spare a few minutes to show me a possible way to restrict a column in a table using a check constraint to only allow characters in a given range of the ASCII table?

I'm interested in filtering ASCII (numbers 0–31 decimal). Thanks!

Well, I don't completely understand what you mean by that second part (feel free to clarify in the comments), but perhaps I can help you out anyway.   When writing constraints on a column, we often need to enforce things like "only allow letters a-z" and/or "only allow digits 0-9".  It is usually easier, shorter and more correct to specify what we will allow, rather than what we will not allow, since it can be hard to anticipate (and type!) all of the characters that we wish to filter out.

Writing a CHECK constraint for that can be a little tricky, however. If we want to allow only letters A-Z, then this:

CHECK Column LIKE '%[A-Z]%'

doesn't quite work, because that check will succeed if the column contains at least one valid character, not only valid characters.  However, let's try to approach this from a different angle.  The opposite of the constraint that we want to ultimately have would be one that only allows strings with at least one invalid character, right?   Well, we can write that fairly easily by altering our like pattern like this:

CHECK Column LIKE '%[^A-Z]%'

The ^ symbol before A-Z means "not within the range of A-Z", so the condition above is only true if there are any characters outside of the range A-Z in the string.  Since it is now the opposite of what we want, we can simply switch negate our expression by changing the LIKE to NOT LIKE and we have the constraint we need:

CHECK Column NOT LIKE '%[^A-Z]%'

The double-negative there can be confusing, but what we are saying is "allow data if there are NOT any characters that are NOT in the range of A-Z in the column."  Which, is exactly what we want when we only want to allow the letters A-Z, just expressed rather backwards.

So, when you need to write a constraint similar to this, sometimes you need to be like George Costanza and do the opposite of what you were thinking.  It just might be exactly what you need.

Update  (This is in the comments, but it may be useful to others) :

We know that LIKE '[A-Z]' allows all letters, and LIKE '[0-9]' allows numbers.  It also turns out that

LIKE '[ -~]'   (note the space before the - symbol)

allows all visible symbols in the ASCII character set between the space and the tilde, excluding all alphanumerics.  So, to write a CHECK constraint that allows only the printable characters of the ASCII set, we can write:

CHECK column NOT LIKE '%[^ -~A-Z0-9]%'

Using the "opposite" logic explained above.

Print | posted on Thursday, April 05, 2007 2:26 PM | Filed Under [ T-SQL Database Design ]

Feedback

Gravatar

# re: Check Constraints: Like George Costanza, Sometimes you need to do the Opposite!

There are 95 printable ASCII characters, numbered 32 to 126. (http://en.wikipedia.org/wiki/Ascii)

Given that I want to include the range Char(32) - the space up to Char(126), the tilde.
Shouldn't one be able to write a constraint like this ?

Check Column Not Like '%[ -~]%'

I want to include characters in the set [ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~], and nothing else.
4/5/2007 5:12 PM | Jeremy
Gravatar

# re: Check Constraints: Like George Costanza, Sometimes you need to do the Opposite!

Jeremy -- you've got to read the article. you want the OPPOSITE of what you are trying to do.

note that LIKE ' -~' allows only symbols, not alphanumerics.

so that means

LIKE ' -~a-z0-9' allows the characters in the range you are looking for. How can we be sure? Test it!

select number, char(number)
from spt_values
where type='p' and number between 1 and 127
and char(number) like '[ -~a-z0-9]'

Notice that returns numbers from 32-126, which is what we want. So, as the article says, reverse that and see what it returns:

select number, char(number)
from spt_values
where type='p' and number between 1 and 127
and char(number) like '[^ -~a-z0-9]'

that returns the non-printable chars we want to filter. So, therefore, our check constraint becomes:

CHECK column not like '%[^ -~a-z0-9]%'

and that's it. Also note that you should check characters 128-255 to see what happens with them as well, since there are other characters that are printable in that range as well.

Two key points to come away with from this:

1) how to set up and test things so that you know for sure.

and

2) Read carefully when someone gives you instructions
4/6/2007 8:29 AM | Jeff
Gravatar

# re: Check Constraints: Like George Costanza, Sometimes you need to do the Opposite!

Jeff,

You're amazing. I have much to learn in my sql-fu from a master such as yourself. The wisdom of experience is amazing.

I was confused with the way SQL handles range inputs to the the LIKE comparison. I was referencing the topic titled "Pattern Matching in Search Conditions". I don't think the SBOL topic on ranges explained that when you express a range, it evaluates what type of character it is to build that set, as well as that you can union those sets together.

It's somewhat counter-intuitive to me to express a range as (symbol characters in the range 32-26) union all (alphabetical characters, regardless of case in the range 32-126) union all (numerical characters in the range 32-126) instead of (all characters in the range of 32-126)


Knowing how to test using the master..spt_values is great. Thanks for being gentle in your explination. Can you recommend any books to help me on my journey towards enlightenment?


4/6/2007 10:38 AM | Jeremy
Gravatar

# re: Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

I also could not find much documentation on how LIKE handles symbols, that's why I put together the test. If you cannot find it is in the docs, sometimes you have to write a "proof of concept" yourself. That's how I discovered that " -~" only returns all symbols.

as for being "counter intuitive" to break it into 3 separate parts -- maybe, but it makes sense to me,and that's the way the LIKE operator works, so that's what we do. As long as it works. The whole point of my article was that sometimes you need to look at a problem from the opposite of the way you were originally thinking about it (thus the "reverse" constraint), which by definition is "counter intuitive" I suppose.

As for books, I recommend ANY book in beginning SQL server if you are just starting out; it is so important to understand the basic architecture and concepts. If you are familiar with all that, then just practice! And do things like what I did to test things when you are not sure.
4/6/2007 2:04 PM | Jeff
Gravatar

# re: Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

I think what's counterintuitive isn't that you have to group things together, it's the fact that the LIKE operator doesn't just deal with ASCII ranges. "[ -~]" looks to me like "everything from ASCII 32 to ASCII 126" -- the fact that it gives you all the symbols in that range is... "creative."

This ( creative | extra intelligent | rather bizarre ) behavior is probably linked to collation, code page, sort order... all that stuff. Some of it is definitely useful. When you ask for [A-Z] and get all the accented A's to boot, that's probably a good thing. However, in this case, where you're dealing in ASCII ranges, the results may be undesirable. Sometimes its real quirky. When I tweaked the query above:

select number, char(number)
from spt_values
where type='p' and number between 1 and 127
and char(number) like '[^ -~a-z0-9]'

to check the high ASCII characters:

select number, char(number)
from master.dbo.spt_values
where type = 'P ' and number between 0 and 255
and char(number) like '[ -~a-z0-9]'

I got the high ASCII "letters" and symbols, as well. Then, extra-quirky, I inserted an ASCII 127 (using the old Alt-NumPad trick) after the "0-9", to see if I could get ASCII 128 to show up. No, but ASCII 166 does!

I perceive no obvious relationship between those two that would put them in the same set.

Meanwhile, this query:

select number, char(number)
from master.dbo.spt_values
where type = 'P ' and number between 0 and 255
and char(number) like '[^' + char(0) + '-' + char(31) + ']'

stuffing the low ASCII characters in the LIKE string, returns everything from ASCII 32 to ASCII 255 (which was maybe what was originally sought). But stuffing the character ASCII values in there doesn't get around collation tricks. If I put in 65 and 90 (the A to Z range), it still takes out the lower case version and the high ASCII foreign-type versions. If you wanted to restrict some column of codes to, for instance, only allow upper-case H through R, you couldn't do it with LIKE logic.

Thanks for bringing this up; it was interesting.

4/10/2007 3:49 PM | Chris Hofland
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

I was tinkering with System.Char in the .Net Framework to find out how the "brains" at Microsoft think about this kind of thing.
I'm not sure if the SQL team had any input into this, or uses some derrivitive internally in the "LIKE" comparison. I'm fairly sure it's an agreed on standard somewhere, and it does in fact play a part in what we're discussing.

static void Main(string[] args)
{
const char sepchar = ',';
for (int i = 0; i < 256; i++)
{
StringBuilder b = new StringBuilder();
b.Append(i); b.Append(sepchar);
b.Append(Char.IsControl(c)); b.Append(sepchar);
b.Append(Char.IsDigit(c)); b.Append(sepchar);
b.Append(Char.IsHighSurrogate(c)); b.Append(sepchar);
b.Append(Char.IsLetter(c)); b.Append(sepchar);
b.Append(Char.IsLetterOrDigit(c)); b.Append(sepchar);
b.Append(Char.IsLower(c)); b.Append(sepchar);
b.Append(Char.IsLowSurrogate(c)); b.Append(sepchar);
b.Append(Char.IsNumber(c)); b.Append(sepchar);
b.Append(Char.IsPunctuation(c)); b.Append(sepchar);
b.Append(Char.IsSeparator(c)); b.Append(sepchar);
b.Append(Char.IsSurrogate(c)); b.Append(sepchar);
b.Append(Char.IsSymbol(c)); b.Append(sepchar);
b.Append(Char.IsUpper(c)); b.Append(sepchar);
b.Append(Char.IsWhiteSpace(c));
Console.WriteLine(b.ToString());
}
}

Also for kicks, I did it with Reflection, because writing more code to do a simple test is always cool, right?
http://www.jeremysimmons.net/blog/2007/04/09/ItsNoLongerHipToBeSquare.aspx

Thanks again Jeff. My database no longer can accept non-printing characters.
4/18/2007 11:15 AM | Jeremy
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

I'm wondering if similar tactics can be used to keep invalid numbers from being entered into a table. Currently I have a program/query that transfers data from one table to another, but I've found that in the first table it resides in it can occasionally have the value -1.#IND which I have since learned is the Microsoft code for an undefined value. Any queries with restraints on this field (ie val>=0) throw an error/exception.

Do you know if a CHECK constraint where I only accept positive numbers would have the same effect (all of the values in this field should be positive anyways)? If it will make the query throw the same error then it won't do me any good, but if it will only create entries for the actual number cases, then it will be exactly what I need.

Thanks!
6/29/2007 12:01 PM | Marc
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

Are you using SQL Server? I've never heard of -1.#IND -- are you sure it is not just a NULL value being presented that way somewhere?

Either way, what is the datatype on this column? If you use true numeric datatype, you won't have any problems at all. If you are using VARCHAR or something like that, then you have a really bad database design and should fix this immediately. I suspect that this is the case if you see values like "-1.#IND" in your data. The #1 rule of data modeling is to *always* use correct datatypes on your tables -- this solves 99% of all issues and doesn't require a single line of code anywhere!

Finally, once your data type is correctly configured, everything will run fine without errors, and you can simply add a CHECK constraint if you like to ensure that only positive values are in this column.
6/29/2007 12:09 PM | Jeff
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

There is a bug post for Microsoft in regards to the accented characters appearing in the A-Z match.

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=289689
7/31/2007 8:22 AM | Tito J.
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

Disregard my last post, it was not related to this blog.
7/31/2007 8:44 AM | Tito J.
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

I have a column which has non printable ascii. I want to see what ascii characters are in there. Is there any function in sql which i can use in a sql query to get the ascii characters?. Something like select function(field name) from table name.
8/29/2007 11:30 AM | kumar
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

if i want to check wheather it check constraints allow only upper case then can you explain
10/8/2007 10:55 PM | srikanth
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

I have new issue related to this.I am having A table with so many Columns and in that several verchar columns are not in readable format.I want to replace that with "XXXXX" or something while selecting.I don't know which column will contains these junks.

Please Help me .....



11/2/2007 8:15 AM | Timson
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

I am quite confused with this check constraints for string? how do I have a check constraints for a word in my data base..the issue is the user is only allowed to have "Record", "Cassette", or "CD" in his/her album type column...Anybody knows how to do this???!.....ANYBODY!!

3/9/2008 8:55 PM | Mary
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

GOOD DAY SIR I WOULD LIKE TO SET CONSTARAINTS FOR AN IDNUM LOOKING LIKE THIS A0001 - Z9999 COULD YOU PLEASE SHED SOME LIGHT WHATS THE BEST APPROACH
7/18/2009 9:53 PM | S EDWARDS
Gravatar

# re: SQL Check Constraints: Like George Costanza, sometimes you need to do the Opposite!

Hi,

very often people want to check not only for special chars but also for a special format.
For instance: "ModuleNumber" should have the format XX-Y-ZZ and lets say chars from a-z.

I would express it that way:

ALTER TABLE Module ADD CONSTRAINT ModuleNumber CHECK(ModuleNumber = '[A-Z]{2}-[A-Z]-[A-Z]{2}');

Please Jeff, correct me if I did something wrong.
Greetz from Germany...
9/27/2009 6:47 AM | Fabian
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET