Jeff Smith Blog

Random Thoughts & Cartesian Products with Microsoft SQL Server

The MailBag — Super-Sized Edition! String Parsing, Crosstabs, SQL Injection, and more.

OK, boys and girls, it's time for the mailbag!  There's lots of stuff to cover, so let's get to it!



Greg E
writes:

Hello Jeff,

I just found your blog and wanted to know if you could point me in the right direction or possibly toss me a solution.

I am looking at a badly formed telelphone number column in a MS SQL Server db. Entries contain '(555) 555-1212' or '555.555.1212, etc. Do you know how I would go about stripping out unwanted characters from the telephone number?

Thanks for the brain cycles.

Greg -- A simple UDF should do the trick for you.  For example, something like this:

create function NumbersOnly(@txt varchar(1000))
returns varchar(100)
as
begin
    declare @i int
    declare @ret varchar(100)
  
    select @i = 1, @ret = ''

    while (@i <= len(@txt))
        select @ret = @ret + case when substring(@txt,@i,1) like '[0-9]'
                                  then substring(@txt,@i,1) else ''
                             end,
               @i = @i + 1

    return @ret
end

With that, you can write something like this:

select ID, dbo.NumbersOnly(PhoneColumn) as PhoneNumbersOnly
from YourTable

Over at SQLTeam's script library forum, there is a thread with a bunch of parsing functions that you may find useful if your needs are more complex.

And, in case you missed it, be sure to read this post.

---

In response to my blog post on passing arrays to stored procedures, Juan writes:

I know is not the right solution, but I have to say it for the sake of completeness of the discussion: if the amount of items in your "array parameter" is limited (say, for example 5 or 10 items), you can always use optional parameters (i.e. assign them to null when declaring them in the SP), then insert them in a temp table or do whatever you want with them, without using dynamic,nor xml, nor string manipulation.

Great point, something I missed in my article entirely.  Sometimes, it may make sense to declare @Val1, @Val2, ... @ValN parameters if there aren't too many and there's a clearly defined limit.  Thanks for bringing that up, Juan.  The simplest solution is usually the best, and in some cases that's probably all you need.  You still have clearly defined parameters with strong typing and no parsing, and those are the main issues with CSV parameters that I wanted to avoid.

---

Marc writes:

We have three tables.  They all share the same "type" of primary key: let's say ActivityCode.  I need to pull data using an ActivityCode, but there is a catch.  If table 1 has the data, I want to use it.  If table 2 has the data and Table 1 does not I want to use Table 2.  If table 3 has the data and Table 1 and Table 2 does not, I want to use that.  The ActivityCode can be found in both Table 1 and Table 2.  Once I determine which table i am using I will need to do several other inner and/or outer joins with other tables.  I am using JDBC.  I want to be able to do this using a single SQL statement, but I am willing to use multiple statements if it makes more sense.  I just need to keep it to a single transaction under JDBC.

Marc -- I think what you are looking for is described here.  The key is to OUTER JOIN to all of your tables, and then use a CASE expression to determine which of those joined tables has the data you need.

--

Mary writes:

I struggled with a thorny SQL problem all day yesterday and found your post on set based thinking very helpful.  I needed to write an update query that updated a table with many records with the same key from a table with the key and the corresponding new value.  The table with new values didn't exist - I had to derive it from a different table showing the key, new value and date (the new value changed over time.)

Your observations that one needs to break the problem down into its simplest components helped me realize something else.

I made the classic rookie error of grabbing some code that did a similar type of update and try to hack it into my solution.  When I finally realized I was going in the wrong direction (because my solution was getting messier and messier), I went back to the beginning.

I defined the problem in its simplest terms and learned I could do a simple "update  A set A.value = B.value from A join B on B.key = A.key" .

I didn't realize I could update from but once the problem was simply defined a quick question to one of our senior engineers resulted in a quick answer leading to an elegant solution.  The whole thing was completed in less than half an hour.

The moral of the story:  Define the problem first!  Don't even think about syntax until you have written a clear, concise spec from the problem just defined.  Then if you find yourself spending an inordinate amount of time and/or the solution seems too messy or seems to run too long - google or talk to your colleagues.

Thanks for a great blog; your post made me realize it's more about how we think than throwing code at the problem - the code should be the last thing!

Thanks, Mary!  I'm glad I would be of assistance.  The "moral" that you wrote says it all. 90% of programming isn't writing code at all, it is simply defining what your code will do -- and that's always the hardest part! 

---

In response to my post on grouping by month, Mark writes:

I'm so close! I've tried all the things in this article, but can't seem to do what I want to do. I've been tearing my hair out for days! Here's what I'm trying to do.

Basically I need a sql procedure that looks at an invoicing table that totals amounts by month/year and quarter at the same time. Here's how my table looks:

Project ID  Date      Amount
1                3/11/08    10.00
1                4/18/08    10.00
1                6/22/08    10.00
2                3/01/08    10.00
2                9/15/08    10.00

I would like the output to have dynamic columns, so an output may look like:
Project ID  Jan'08  Feb'08  Mar'08  Q1'08  Apr'08  May'08  Jun'08  Q2'08  Jul'08  Aug'08  Sep'08  Q3'08
1              0.00    0.00      10.00    10.00  10.00    0.00      10.00  20.00  0.00    0.00      0.00      0.00
2              0.00    0.00      10.00    10.00  0.00      0.00      0.00    0.00  0.00    0.00      10.00    10.00

I would like the query to know if there was no value in Jan &amp; Feb'08, but still list all the months in Q1.

I'm not opposed to using a calendar table, but would like to try to avoid it if possible.

Any help would be greatly appreciated!

Hi Mark -- First off, never be afraid to use a calendar table!  There is nothing hacky or unusual or tricky about them, they can make your life much easier, your code much shorter, and everything much more efficient.  If grouping by month or some other time period is important to your reporting,  then defining those months in a permanent, nicely indexed table makes perfect sense.

In this case, though, since you are outputting one column per month for a single year, I recommend to simply use CASE expressions to "cross tab" your data.  You can alias your columns as M1,M2,M3...M12 and Q1-Q4 so that no matter what year you are running the report for, your columns will be consistently named, and you can let your presentation layer handle outputting nice column headers with the current year/month for each one.

So, all you really need is something like this:

select projectID, Y as [Year],
  sum(case when m=1 then amount else 0 end) as M1,
  sum(case when m=2 then amount else 0 end) as M2,
  sum(case when m=3 then amount else 0 end) as M3,
  sum(case when m in (1,2,3) then amount else 0 end) as Q1,
  ...
  sum(case when m=12 then amount else 0 end) as M12,
  sum(amount) as Total
from
(
  select projectID, Amount, DatePart(Month, [Date]) as M, DatePart(Year, [Date]) as Y
  from YourTable
  where [Date] >= @StartDate and Date < @EndDate
) x

Of course, you'd define @StartDate and @endDate as '01-01-2008' and '01-01-2009', respectively.

---

In this comment, Stewy writes:

I have an issue with both DISTINCT and GROUP BY.

The issue is that using either one, the results comes back ordered as if using order by.

I need the unique results in the order they are in the database. How can I do this? Thanks

Stewy -- Relational databases have no obligation to store data in any specific order, or to keep track of the order that things were entered, or to return things "as they are in the database."  There is no such thing as getting data out "the way it is stored" because a relational database may move or re-order data temporarily to efficiently execute a query depending on indexes available.   You must always explicitly specify how you want your results using an ORDER BY clause.  If you want to keep track of the order that you added data to a table, you should have a "timestamp" column that records the exact moment each row was added via a DEFAULT value or a trigger.  Or, at the very least, you can use an IDENTITY.  Then, you can simply order by that column.  This is a very important concept to understand when working with relational databases.  Things are returned based on the data itself, not based on physical storage characteristics.  I hope this helps.

---

In response to Always Use Parameters, Karuna writes:

Hi Jeff,
Just wondering if I build the Sql in Stored Procedure (Dynamic Sql) based on the parameters passed to stored proc, will it still be a possible candidate for Sql Injection? Basically I want to build the Sql in the stored procedure instead of doing it in .Net code as displayed in the article.

Dim cm As New SqlCommand("", YourConnection)
cm.CommandText = "DELETE FROM YourTable WHERE ID=@ID "
cm.Parameters.Add("@ID", SqlDbType.Int).Value = ID

If Name <> "" Then
cm.CommandText &= " And Name=@name"
cm.Parameters.Add("@Name", SqlDbType.VarChar).Value = Name
End If
If TranDate <> DateTime.MinValue Then
cm.CommandText &= " And TranDate = @TranDate"
cm.Parameters.Add("@TranDate", SqlDbType.DateTime).Value = TranDate

Hi Karuna -- you are absolutely 100% safe from SQL Injection by doing this.  Remember, SQL Injection is not about genereal SQL concatenation or about building a SQL statement dynamically.  It only can happen when you concatenate user input into a SQL string and execute it.  If you put together a big SQL statement via concatenation but you only incorporate user input via parameters, there's no need for scrubbing data or worrying in any way about SQL Injection -- it will never happen, under any circumstance. 

Avoiding SQL Injection is the easiest thing in the world -- simply do things the easy and correct way and you'll never need to worry about it.   It's like if there was a big controversy in the news about thousands of people crashing their cars because they are driving them with decorative tin foil covering their windshields, and asking the experts "how can we solve this crisis?"   Should we cut holes in the tin foil, or add mirrors, or incorporate a camera and a tv monitor?  Uh .. no.  You should just take the tin foil off of your windshield and do things the easy, simple and correct way and don't make things over complicated.   That's basically what this whole SQL Injection thing is about -- bad programmers doing stupid things when all they need to do is write decent code the easy way -- simply by using parameters.

---

Gocs writes:

I have tried to compute the number of hours based on the datetime in MS SQL 2005.  However, I am not sure the hours is correct.  Do you have any idea on how to do it correctly?


Gocs -- I think you really need to read this very carefully.  I'll be waiting!