The SQL Server 2005 Paradigm Shift
- By Stephen Swoyer
- May 18, 2005
Skeptics who question Microsoft’s decision to integrate its .NET CLR with SQL Server 2005 tend to use the language of doomsayers such as Jeremiah or Cassandra to make their points. They believe in-process CLR is an insanely bad idea that could jeopardize the scalability and reliability of SQL Server.
It’s easy to dismiss such talk, but it wouldn’t hurt to get a sense for what these folks are up in arms about.
At bottom, skeptics say, the issue is: .NET languages such as C# and Visual Basic .NET are based on a very different paradigm than SQL Server’s Transact-SQL. The CLR languages are procedural in nature, while T-SQL is more of a set-based language. What works for one doesn’t necessarily work for the other, and while it’s possible to use set-based languages to accomplish tasks in the non-relational world, it often isn’t advisable.
The same goes for using procedural languages in relational space.
“In procedural coding, you break open your piggy bank and count the pennies
one at a time,” says SQL consultant Joe Celko, author of Joe Celko’s SQL Programming Style. “In set-oriented coding, you break open your piggy bank and
weigh the pennies as a whole.”
Graeme Malcolm, a SQL Server professional with content management specialist Content Master, offers an example. Let’s assume a user needs a quick way to solve a common business problem—identifying which customers have placed orders in the last two months. Malcolm suggests we think of this problem in terms of a Venn diagram, where we’ve got one circle containing customers and the other containing orders.
“The area where the circles overlap represents the customers who have placed orders in the past two months. Essentially, the Venn diagram is just a visual representation of a set-based operation; specifically the intersection of a set of customers and a set of orders,” he says.
This is the kind of thing T-SQL can do in its sleep, says Malcolm. “To do the same task ‘purely’ in a procedural language, you'd have to: get the first customer; find the first order made by the customer; check the date; if the date is in the past two months, add the customer to the results; otherwise find the next order and repeat until an order in the last two months is found or there are no more orders for that customer, and then move onto the next customer.”
So why are some SQL programmers apoplectic? Because, they say, programmers unfamiliar with T-SQL will usually opt to use procedural languages, even when this clearly isn’t the best idea.
As a result, code bloat and other procedural byproducts will seriously impact the performance and reliability of SQL databases. “These guys are already writing cursors because they do not understand declarative, set-oriented programming. Why would they stop when they have languages that do not support set operations?” Celko asks. “Even worse, they will assume that their native language is universal and start modifying data with their rules. MOD() functions are not the same in various languages; C# and VB disagree about (0,+1) or (0 -1) for BOOLEAN values…. It just keeps getting worse.”