Criminals and Other Illegal Characters

Wed Jun 19, 2013 by Most Valuable Yak (Rob Volk) in coolness, hotness

SQLTeam's favorite Slovenian blogger Mladen (b | t) had an interesting question on Twitter: http://www.twitter.com/MladenPrajdic/status/347057950470307841

I liked Kendal Van Dyke's (b | t) reply: http://twitter.com/SQLDBA/status/347058908801667072

And he was right! This is one of those pretty-useless-but-sounds-interesting propositions that I've based all my presentations on, and most of my blog posts.

If you read all the replies you'll see a lot of good suggestions. I particularly like Aaron Bertrand's (b | t) idea of going into the Unicode character set, since there are over 65,000 characters available. But how to find an illegal character? Detective work?

I'm working on the premise that if SQL Server rejects a character as a name, it will throw an error. So all we have to do is generate every character that NCHAR() can produce (code points 1 through 65535), rename a database to each one in turn, and catch any errors.

It turns out that dynamic SQL can lend a hand here:

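-- Create the scratch database if it does not already exist.
IF DB_ID(N'a') IS NULL CREATE DATABASE [a];
DECLARE @c INT=1, @sql NVARCHAR(MAX)=N'', @err NVARCHAR(MAX)=N'';
WHILE @c<65536 BEGIN
    BEGIN TRY
        -- Rename from the previous character (or from [a] on the first pass) to NCHAR(@c).
        SET @sql=N'alter database ' +
            QUOTENAME(CASE WHEN @c=1 THEN N'a' ELSE NCHAR(@c-1) END) +
            N' modify name=' + QUOTENAME(NCHAR(@c));
        RAISERROR(N'*** Trying %d',10,1,@c) WITH NOWAIT;
        EXEC(@sql);
        SET @c+=1;
    END TRY
    BEGIN CATCH
        -- Report the first code point that fails, then stop.
        SET @err=ERROR_MESSAGE();
        RAISERROR(N'Ooops - %d - %s',10,1,@c,@err) WITH NOWAIT;
        BREAK;
    END CATCH
END
-- Rename back to [a]; skipped if the very first rename failed and the name is still [a].
IF @c>1 BEGIN
    SET @sql=N'alter database ' + QUOTENAME(NCHAR(@c-1)) + N' modify name=[a]';
    EXEC(@sql);
END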

The script creates a dummy database "a" if it doesn't already exist, and only tests single characters as database names. If you have databases with single-character names then you shouldn't run this on that server.
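
Given that warning, a quick pre-flight check can list any single-character database names before you run the script. This is only a minimal sketch: it assumes "single character" means a name stored as one UTF-16 code unit (two bytes in sys.databases), and the code_point alias is mine.

```sql
-- List databases whose name is exactly one UTF-16 code unit long.
-- DATALENGTH is used instead of LEN because LEN ignores trailing spaces,
-- so a name consisting of a single space would be reported as length 0.
SELECT name,
       UNICODE(name) AS code_point   -- code point of the first (and only) character
FROM sys.databases
WHERE DATALENGTH(name) = 2           -- sysname is NVARCHAR: 2 bytes per code unit
ORDER BY code_point;
```

If this returns anything other than the scratch database [a], pick a different server.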

It takes a few minutes to run, but if you do you'll see that no errors are thrown for any of the characters. It seems that SQL Server will accept any character, no matter where it's from. (Well, there's one it won't, but I won't tell you which. Actually there are two, but one of them requires some deep existential thinking.)

The output is also interesting, as quite a few codes do some weird things there. I'm pretty sure it's due to the font used in SSMS for the messages output window, since not all characters are available in it. If you run the script using the SQLCMD utility, with the -o switch to send output to a file and -u for Unicode output, you can open the file in Notepad or another text editor and see the whole thing.

I'm not sure what character I'd recommend to answer Mladen's question. I think the standard tab (ASCII 9) is fine. There are also several dedicated separator characters in the original ASCII character set (decimal 28-31).

But of all the choices available in Unicode whitespace, I think my favorite would be the Mongolian Vowel Separator. Or maybe the zero-width space. (That'll be fun to print!) And since this is Mladen we're talking about, here's a good selection of "intriguing" characters he could use.
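
For reference, here is a small sketch that lists the candidates mentioned above alongside their code points. The decimal values are the standard ASCII and Unicode assignments (U+180E is 6158, U+200B is 8203); the column names and the derived-table alias are just my own labels.

```sql
-- Candidate characters from the discussion above, with their code points.
-- QUOTENAME shows how each one would look as a bracketed identifier.
SELECT v.code_point,
       v.char_name,
       NCHAR(v.code_point)            AS ch,
       QUOTENAME(NCHAR(v.code_point)) AS quoted_name
FROM (VALUES
        (9,    N'Horizontal tab'),
        (28,   N'File separator (FS)'),
        (29,   N'Group separator (GS)'),
        (30,   N'Record separator (RS)'),
        (31,   N'Unit separator (US)'),
        (6158, N'Mongolian vowel separator (U+180E)'),
        (8203, N'Zero-width space (U+200B)')
     ) AS v(code_point, char_name)
ORDER BY v.code_point;
```

Don't expect the ch column to look like much in the SSMS grid; most of these characters are invisible, which is rather the point.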