Most Valuable Yak (Rob Volk) Blog

…and other neat SQL Server tricks

T-SQL Tuesday #21 - Crap!

Adam Machanic's (blog | twitter) ever-popular T-SQL Tuesday series is being held on Wednesday this time, and the topic is…

SHIT CRAP.

No, not fecal material.  But crap code.  Crap SQL.  Crap ideas that you thought were good at the time, or were forced to do due (doo-doo?) to lack of time.

The challenge for me is to look back on my SQL Server career and find something that WASN'T crap.  Well, there's a lot that wasn't, but for some reason I don't remember those that well.  So the additional challenge is to pick one particular turd that I really wish I hadn't squeezed out.  Let's see if this outline fits the bill:

  • An ETL process on text files;
  • That had to interface between SQL Server and an AS/400 system;
  • That didn't use SSIS (should have) or BizTalk (ummm, no) but command-line scripting, using Unix utilities(!) via:
  • xp_cmdshell;
  • That had to email reports and financial data, some of it sensitive

Yep, the stench is coming back to me now, as if it were yesterday…

As to why SSIS and BizTalk were not options: basically, I didn't know either of them well enough to get the job done (and I still don't).  I also had a strict deadline of 3 days, on top of all my other responsibilities, so there was no time to learn them.  And then there was how screwed up the rest of the process already was:

  • Payment files from multiple vendors in multiple formats;
  • Sent via FTP, PGP encrypted email, or some other wizardry;
  • Manually opened/downloaded and saved to a particular set of folders (couldn't change this);
  • Once processed, had to be placed BACK in the same folders with the original archived;
  • Two divisions that had to run separately;
  • Plus an additional vendor file in another format on a completely different schedule;
  • So that they could be MANUALLY uploaded into the AS/400 system (couldn't change this either, even if it was technically possible)

I didn't feel so bad about the solution I came up with, which was naturally:

  • Copy the payment files to the local SQL Server drives, using xp_cmdshell
  • Run batch files (via xp_cmdshell) to parse the different formats using sed, a Unix utility (this was before PowerShell; a sketch of these first few steps follows the list)
  • Use other Unix utilities (join, split, grep, wc) to process parsed files and generate metadata (size, date, checksum, line count)
  • Run sqlcmd to execute a stored procedure, passing it the parsed file names so it could bulk load the data for comparison (the load-and-compare sketch after the list shows the idea)
  • bcp the compared data out to ANOTHER text file so that I could grep that data out of the original file
  • Run another stored procedure to import the matched data into SQL Server so it could process the payments, including file metadata
  • Process payment batches and log which division and vendor they belong to
  • Email the payment details to the finance group (since it was too hard for them to run a web report with the same data…which they ran anyway to compare the emailed file against…which always matched, surprisingly)
  • Email another report showing unmatched payments so they could manually void them…about 3 months afterward
  • All in "Excel" format, using xp_sendmail (SQL 2000 system; the last sketch after the list shows how loose that "Excel" really was)
  • Copy the unmatched data back to the original folder locations, making sure to match the file format exactly (if you've ever worked with ACH files, you'll understand why this sucked)
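For the curious, here's roughly what the staging and parsing steps looked like, reconstructed from memory.  Every path, file name, and sed expression below is hypothetical, and the Unix utilities were Windows ports (something like the GnuWin32 builds) sitting on the server's PATH:

    -- Hypothetical sketch of the copy/parse/metadata steps
    -- 1. Copy the payment files from the drop folder to a local work folder
    EXEC master..xp_cmdshell 'copy /Y D:\Drops\VendorA\*.txt C:\ETL\work\';

    -- 2. Parse the vendor format into pipe-delimited fields with sed
    EXEC master..xp_cmdshell
        'sed -e "s/ *, */|/g" C:\ETL\work\vendorA_payments.txt > C:\ETL\work\vendorA_parsed.txt';

    -- 3. Capture one piece of the metadata (line count) with wc
    CREATE TABLE #meta (wc_output varchar(500));
    INSERT #meta EXEC master..xp_cmdshell 'wc -l C:\ETL\work\vendorA_parsed.txt';
    SELECT wc_output FROM #meta WHERE wc_output IS NOT NULL;
    DROP TABLE #meta;

Multiply that by every vendor format, wrap it all in batch files, and you have the general idea.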
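The bulk-load/compare/bcp-and-grep roundtrip went something like the sketch below.  The table, database, and server names are invented; note the permanent staging table, since bcp runs on its own connection and can't see a temp table:

    -- Hypothetical sketch of the load-and-compare roundtrip
    CREATE TABLE #parsed (
        payment_id varchar(20),
        amount     money,
        acct       varchar(17)
    );

    -- Bulk load the sed-parsed file (pipe-delimited, one payment per line)
    BULK INSERT #parsed
    FROM 'C:\ETL\work\vendorA_parsed.txt'
    WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');

    -- Keep only payments that match what's already on file
    TRUNCATE TABLE dbo.MatchedStage;
    INSERT dbo.MatchedStage (payment_id)
    SELECT p.payment_id
    FROM   #parsed p
    JOIN   dbo.Payments pay
           ON pay.payment_id = p.payment_id AND pay.amount = p.amount;

    -- bcp the matched keys out, then grep the matching lines out of the raw file
    EXEC master..xp_cmdshell
        'bcp "SELECT payment_id FROM PaymentsDB.dbo.MatchedStage" queryout C:\ETL\work\matched_ids.txt -c -T -S MYSERVER';
    EXEC master..xp_cmdshell
        'grep -f C:\ETL\work\matched_ids.txt C:\ETL\work\vendorA_payments.txt > C:\ETL\work\matched_orig.txt';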
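And the infamous "Excel" report: xp_sendmail just attaches query results as a delimited text file, so a comma separator and a hopeful subject line did the heavy lifting.  Recipient, table, and query are all made up here:

    -- Hypothetical sketch of the "Excel" email (xp_sendmail, SQL 2000)
    EXEC master..xp_sendmail
        @recipients     = 'finance@example.com',
        @subject        = 'Daily payment batch (open the attachment in Excel)',
        @query          = 'SELECT batch_id, division, vendor, amount
                           FROM dbo.PaymentBatches
                           WHERE batch_date = CONVERT(char(8), GETDATE(), 112)',
        @attach_results = 'TRUE',
        @separator      = ',',
        @width          = 8000,
        @dbuse          = 'PaymentsDB';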

If you're one of the 10 people who have read my blog before, you know that I love the DOS "for" command.  Like passionately.  Like fairy-tale love.  So my batch files were riddled with for loops, nested within other for loops, that called other batch files containing still more for loops.  I think there was one section that had 4 or 5 nested for commands.  It was wrong, disturbed, and completely unmaintainable by anyone, even myself.  Months, even a year, after I left the company, the poor soul who had to make a minor change to it would call me, and I'd talk them out of spraying the office with an AK-47 after looking at this code (for you Star Trek TOS fans).
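To give you a taste (with everything below invented for illustration), here's the flavor of it squeezed into a single xp_cmdshell call; the real thing lived in batch files, with doubled %% variables and far deeper nesting:

    -- Outer for walks the parsed files; inner for /f splits each line on pipes
    EXEC master..xp_cmdshell
        'for %f in (C:\ETL\work\*_parsed.txt) do for /f "tokens=1,2 delims=|" %a in (%f) do @echo %~nf %a %b >> C:\ETL\work\all_payments.txt';

Now imagine four or five of those nested inside each other, spread across several batch files, and you'll understand the phone calls.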

The funniest part of this (well, one of the funniest) is that I made the deadline…sort of, I was only a day late…and the DAMN THING WORKED practically unchanged for 3 years.  Most of the problems came from the manual parts of the overall process: forgetting to decrypt the files, missing or late files, or files saved to the wrong folders.  I'm definitely not trying to toot my own horn here, because this was truly one of the dumbest, crappiest solutions I ever came up with.  Fortunately, as far as I know, it's no longer in use and someone has written a proper replacement.  Today I would knuckle down and do it in SSIS or PowerShell, even if it took me weeks to get it right.

The real lesson from this crap code is to make things MAINTAINABLE and UNDERSTANDABLE.  sed scripts stuffed with regular expressions don't fit those criteria in any way.  If you ever find yourself under pressure to do something fast at all costs, DON'T DO IT.  Stop and consider long-term maintainability, not just for yourself but for others on your team.  If you can't explain the basic approach in under 5 minutes, it ultimately won't succeed.

And while you may love to leave all that crap behind, it may follow you anyway, and you'll step in it again.

 

P.S. - If you're wondering about all the manual steps that couldn't be changed, it's because the entire process had gone through Six Sigma and was deemed the best possible way.  Phew!  Talk about stink!