In-Depth

Looking to LINQ

Julie Lerman remembers well the September day she first learned about Microsoft's Project LINQ, or Language Integrated Query.

The independent consultant from Huntington, Vt., was in Los Angeles for the 2005 Microsoft Professional Developers Conference (PDC) when Group VP of Platforms Jim Allchin and Technical Fellow Anders Hejlsberg introduced the technology, which promised to re-invent the way developers work with data. In short, LINQ would meld queries of multiple data stores into a common development environment, transforming the way queries are programmed into code.

"I was jumping up and down," recalls Lerman, who describes herself as an old FoxPro hand.

Hejlsberg's pet project for more than three years, LINQ promises to bring about the most profound change in the way database queries are built into programs since the arrival of SQL and XQuery, some proponents believe. That's because queries will be built directly into programming languages such as C# and Visual Basic.

"We're taking the concepts of query, set operations and transformation, and we're making them first class in the .NET platform and in the .NET programming languages," Allchin told PDC attendees at the time. "Rich queries that you could previously only write in SQL or in XQuery, you can now write in C# or VB, going against any kind of data source, be it objects, relational or XML."

That's what drew Lerman and other developers to their feet. "It's very powerful and it's a really important addition to the languages and to .NET," Lerman says.

During the past year and change since that PDC keynote, LINQ for the most part has remained under the radar, despite a lot of work in the background. But with the new .NET Framework 3.0 and the pending release of the next version of Visual Studio (code-named "Orcas"), LINQ is ready to come out of its shell. And those following it say that if LINQ performs as advertised, it may very well give the .NET environment a major leg up as a platform for developing query-centric programs.

"It's a very big deal and can be fundamental in terms of programmer productivity for data-driven applications," says Peter O'Kelly, an analyst at the Midvale, Utah-based Burton Group.

The Hejlsberg Factor
The fact that Hejlsberg has championed the project certainly bodes well for LINQ. Hejlsberg is credited with building Borland's Turbo Pascal, one of the first integrated development environments (IDEs), and its successor, Delphi. Since joining Microsoft in 1996, Hejlsberg has led the development of J++, Windows Foundation Classes (WFC) and ultimately the C# programming language that's central to the .NET Framework.

"If we're going to attack the dreaded so-called impedance mismatch between program-type systems and back-end database systems and now add XML to the mix, I don't think there are many people walking the planet that have the expertise he has to pull this off," O'Kelly says.

In an interview with Redmond Developer News, Hejlsberg made the case that LINQ will ultimately bind data querying into programming much as it was done decades ago with programs such as dBase and FoxPro. "It's my hope that in five to 10 years, programming languages simply will have queries as a concept built in, because that's just a must," Hejlsberg says.

The effort to reach that goal, Hejlsberg points out, was significant. "It's not like we just sort of slapped the SQL Server query processor into C# and inextricably married it to a particular database engine," he says. "Rather, when you look at how it enters into the languages, it's just seven or eight new language features, each of which has merit on its own, and they come together to form this whole that's bigger than the sum of the parts."

In a nutshell, today's programming languages work well when the native data is made up of objects. With the proliferation of XML and relational data, however, writing code that integrates and accesses non-object data has become an exercise in complexity, one that developers have come to accept as the norm.

Hejlsberg and his team decided to take the approach of building general-purpose query facilities into the .NET Framework rather than building relational or XML-specific features directly into its programming languages and runtime environments. In other words, the query is integrated directly into the programming languages, specifically C# and Visual Basic. There's speculation that it will be offered in other languages such as Delphi, though CodeGear isn't saying one way or the other.

Taking advantage of the latest general-purpose language features of C# 3.0 and Visual Basic 9.0, query expressions written with LINQ in those environments can make use of rich metadata, compile-time syntax checking and IntelliSense, benefits that until now have been available only to ordinary program code and not to queries embedded as strings. LINQ also provides a single, general-purpose query facility for both in-memory data and external repositories. It bears noting, too, that LINQ is extensible: it provides a means of defining new APIs that can be queried in the same way.
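To make this concrete, here is a minimal sketch of a LINQ to Objects query, written in present-day C# rather than the C# 3.0 previews of 2007. The customer data is hypothetical. It shows the query-expression syntax next to the ordinary method calls (`Where`, `OrderByDescending`, `Select`) that the compiler translates it into; both forms are type-checked at compile time.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data: an array of anonymously typed objects.
var customers = new[]
{
    new { Name = "Ada", City = "London", Orders = 5 },
    new { Name = "Linus", City = "Helsinki", Orders = 2 },
    new { Name = "Grace", City = "London", Orders = 9 },
};

// Query expression: the compiler checks member names and types, so a typo
// such as c.Cty is a compile-time error rather than a runtime surprise.
var londoners =
    from c in customers
    where c.City == "London"
    orderby c.Orders descending
    select c.Name;

// The compiler translates the query above into ordinary method calls like these.
var sameQuery = customers
    .Where(c => c.City == "London")
    .OrderByDescending(c => c.Orders)
    .Select(c => c.Name);

Console.WriteLine(string.Join(", ", londoners)); // Grace, Ada
```

Because the query is just C#, the same operators work over any in-memory collection, and editor tooling can offer completion on `c.` inside the query.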

LINQ also includes facilities tailored to specific kinds of data. Among them is DLinq, which can be used to query relational data stores using the syntax and compilers of the existing programming languages. When querying against a relational store, DLinq translates the query from an expression tree into a SQL expression. DLinq will also be integrated into ADO.NET, the .NET Framework's data-access technology for working with relational data.
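DLinq later shipped as LINQ to SQL. The sketch below does not use it; instead it illustrates the underlying idea with a hypothetical toy translator (`ToSql` and `Op` are illustrative names, not part of any Microsoft API). Assigning a lambda to `Expression<Func<...>>` makes the C# compiler produce an expression tree, a data structure that code can walk, and the toy walks it to emit a SQL-like `WHERE` clause. Only parameters, constants and a handful of binary operators are handled.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Hypothetical toy translator: recursively walk the tree; reject anything unrecognized.
static string ToSql(Expression node) => node switch
{
    ParameterExpression p => p.Name ?? "?",
    ConstantExpression c => Convert.ToString(c.Value, CultureInfo.InvariantCulture) ?? "NULL",
    BinaryExpression b => $"({ToSql(b.Left)} {Op(b.NodeType)} {ToSql(b.Right)})",
    _ => throw new NotSupportedException($"Unsupported node: {node.NodeType}")
};

// Map expression node types to SQL operators (a hypothetical, partial mapping).
static string Op(ExpressionType type) => type switch
{
    ExpressionType.GreaterThan => ">",
    ExpressionType.GreaterThanOrEqual => ">=",
    ExpressionType.LessThan => "<",
    ExpressionType.LessThanOrEqual => "<=",
    ExpressionType.Equal => "=",
    ExpressionType.NotEqual => "<>",
    ExpressionType.AndAlso => "AND",
    ExpressionType.OrElse => "OR",
    _ => throw new NotSupportedException($"Unsupported operator: {type}")
};

// The compiler turns this lambda into an expression tree (data), not directly into IL.
Expression<Func<int, bool>> filter = age => age >= 18 && age < 65;

string whereClause = "WHERE " + ToSql(filter.Body);
Console.WriteLine(whereClause); // WHERE ((age >= 18) AND (age < 65))

// The same tree can also be compiled into a delegate and run in memory.
Func<int, bool> compiled = filter.Compile();
Console.WriteLine(compiled(30)); // True
```

Real providers cover far more node types and typically send values to the database as parameters rather than inlining them as this toy does.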

Another key LINQ facility, XLinq, will play a similar role for querying all types of XML data. XML has historically been difficult for developers to work with. XLinq allows developers to combine XML queries and transformations with queries from other data sources.
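XLinq shipped as LINQ to XML, in the `System.Xml.Linq` namespace. A minimal sketch, assuming a hypothetical product catalog in XML and a hypothetical array of in-memory orders: a single query joins XML elements with plain objects, and the result is turned back into XML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical XML catalog, parsed into an in-memory tree.
var catalog = XElement.Parse(@"
<products>
  <product id='1' name='Widget' price='2.50' />
  <product id='2' name='Gadget' price='12.00' />
  <product id='3' name='Gizmo' price='7.25' />
</products>");

// Hypothetical in-memory orders (plain tuples, not XML).
var orders = new[] { (ProductId: 1, Qty: 10), (ProductId: 3, Qty: 2), (ProductId: 3, Qty: 4) };

// One query joins XML elements with objects; attribute values are read
// through XAttribute's explicit conversions to int, decimal and string.
var totals =
    from p in catalog.Elements("product")
    join o in orders on (int)p.Attribute("id") equals o.ProductId
    group o.Qty * (decimal)p.Attribute("price") by (string)p.Attribute("name") into g
    select new { Product = g.Key, Total = g.Sum() };

// The results can be turned straight back into XML.
var report = new XElement("report",
    totals.Select(t => new XElement("line",
        new XAttribute("product", t.Product),
        new XAttribute("total", t.Total))));

Console.WriteLine(report);
```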


FAQ: Anders Hejlsberg on Project LINQ

Anders Hejlsberg

What is the bottom-line problem that LINQ addresses?
There's an impedance mismatch today between general-purpose programming languages and databases. LINQ adds native query capability into programming languages such as C# and Visual Basic.

How does that mismatch affect developers today?
They have to learn database languages such as SQL and what comes with it, such as stored procedures and data types, while also mastering programming languages such as C#, Visual Basic or even Java. Furthermore, they have to master the APIs that bind these two worlds together.

So LINQ brings the two together?
Developers will be able to take the expressive power of SQL or XQuery and put it right in the middle of C# or Visual Basic. Then the developer can apply this to any kind of data in a program, not just relational data that sits in a database, but also in-memory arrays of objects or XML documents that can be manipulated within the program.

Why now?
Some 20 years ago in the days of dBase and FoxPro, these worlds were integrated. Client/server computing separated them. The intent now is to bring them together without sacrificing any capabilities in the process.

Should developers be planning for LINQ now or is it premature?
LINQ will start to appear in the next versions of C#, Visual Studio and the .NET Framework. With the beta of Visual Studio due out in a few weeks, developers will get their first exposure to LINQ.

How does this rate with other innovations including Delphi and C#?
Hopefully within five to 10 years, programming languages simply will have queries as a concept built-in because that's a must.

LINQ can improve the productivity of programming teams by reducing the amount of coding performed. Are there other business reasons to try it?
Performance. If you're querying in-memory data on a multiprocessor machine, the work can be parallelized and, hence, the sort can finish more quickly.

-- Jeffrey Schwartz

The View Beyond Redmond
Microsoft is working with other database vendors to build interoperability into their repositories as well. Oracle Corp. officials would only say they're monitoring LINQ's progress, but IBM Corp. says it plans to support LINQ in both its DB2 database and the Informix IDS platform.

Curt Cotner, an IBM Fellow and CTO for database servers, says LINQ could very well be critical in eliminating the fragmentation that exists today among programmers and database developers. Still, Cotner expects LINQ to have its share of both ardent supporters and critical detractors. "It's going to be popular for a significant segment of customers, but there will be another segment that will see it as not down the path they're trying to follow," he says.

Those in the Java world, for example, now have the Java Persistence API, or JPA, part of the Java Platform, Enterprise Edition 5 (Java EE 5). According to Sun Microsystems Inc., JPA simplifies the development of applications that use data persistence by providing a single API. It standardizes such technologies as Hibernate, TopLink and JDO. Its object-relational mapping specification supports the use of Java metadata annotations and/or XML descriptors to define the mapping between Java objects and a relational database. It supports SQL-like queries -- both static and dynamic.

But Cotner doesn't see JPA and LINQ as competing specifications per se. "It addresses a different segment of the population," he says. And frankly, he admits, from a developer standpoint JPA can't touch LINQ in terms of its ability to build native queries into code from the language and development environment.

"I don't think there's anything along the same lines in the Java world," he says. "I think this is an area Microsoft has innovated and gone in a direction that is different than a lot of the other programming languages have gone."

That praise comes from someone who would know. Regarded as one of the early innovators of relational database technology, Cotner is the chief architect of IBM's mainframe-based DB2 and the architect of database connectivity for IBM's WebSphere application server line.

Despite its potential to ease programming, Cotner warns that uptake of LINQ may be slow as old programming habits die hard. "There will be some shops that take the view that they don't want the developers formulating SQL queries in their application, they'd rather move that stuff out of the application so they can have people who specialize in that kind of activity write the queries," Cotner says.

Burton Group's O'Kelly says that the drive for better productivity will ultimately outweigh that reluctance. "Right now there's a big problem in programming," he says. "People see it as a sausage factory, you just don't want to know what goes on in there. Most developers would tell you it's just not a lot of fun to tie in database systems with programming-type systems."

Cotner says IBM has no qualms about supporting LINQ. In fact, it's in IBM's interest to support it. "The DB2 and the Informix IDS customers we have do use .NET and Windows for their application development," he says. "It's a strategic platform for us and something we want to support and provide a first-class solution in that space."

Oracle's lack of enthusiasm for LINQ doesn't surprise O'Kelly. The company's closest counterpart to LINQ, PL/SQL, addresses only a subset of the functionality offered by LINQ. O'Kelly points out that both technologies target the "impedance mismatch" between traditional programming language models and database models, but PL/SQL is designed specifically for database-resident programming while LINQ is an extension to a complete programming framework.

O'Kelly says that although PL/SQL is still widely used, Oracle de-emphasized it a decade ago in favor of Java. But Java has no competing solution. "The bigger question is when and how the Java community will standardize on something analogous to LINQ, and, as far as I know, that's an open question at this point."


The different LINQ modules
The different LINQ modules enable C# or Visual Basic code to call directly into a variety of data sources, including relational databases, XML repositories and object properties.

Breaking New Ground
In the end, the proof of LINQ's potential to reshape data-centric programming will lie in the implementation, and a broader ecosystem of supporting vendors can only help. But the business case to try it out is there -- certainly for anyone managing a developer team looking to improve productivity and reduce the amount of tedious work performed by programmers.

Hejlsberg points to another benefit of LINQ: performance. "If we're querying in-memory data and discover we're on a multiprocessing machine, well fine, we can actually parallelize the sort and get it done quicker," Hejlsberg says.
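Microsoft later shipped this idea as Parallel LINQ (PLINQ) in .NET Framework 4. A minimal sketch: adding `AsParallel()` lets the same in-memory query, including the sort, be partitioned across the available cores. The data is arbitrary, and whether the parallel version actually runs faster depends on the machine and the size of the workload.

```csharp
using System;
using System.Linq;

// Arbitrary sample data: the numbers 1..1,000,000 in descending order.
int[] numbers = Enumerable.Range(1, 1_000_000).Reverse().ToArray();

// Sequential LINQ to Objects: filter, then sort.
int[] sequential = numbers.Where(n => n % 3 == 0).OrderBy(n => n).ToArray();

// The same query through PLINQ: AsParallel() allows the filter and the sort
// to be partitioned across the available cores.
int[] parallel = numbers.AsParallel().Where(n => n % 3 == 0).OrderBy(n => n).ToArray();

// OrderBy still defines the output order, so both results match.
Console.WriteLine(sequential.SequenceEqual(parallel)); // True
```

The query text barely changes; the decision about how to execute it moves into the runtime, which is the point Hejlsberg is making.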

Behind this flexibility is a key advancement in the state of the art. Hejlsberg says LINQ moves beyond the imperative style of languages like C#, which are built on statements, loops and variables, by introducing elements of functional programming to the .NET environment. Previously the realm of academic languages like LISP, functional programming offers a lot of promise.

"Your program in many ways isn't just a specification of what you want done, but also in a lot of detail, how you want it done," explains Hejlsberg, who says LINQ frees developers from detailing how things get done. "There's room in the execution infrastructure to be smart about how the queries are executed."

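The difference Hejlsberg describes can be sketched with a small, hypothetical example that counts words by their first letter. The imperative version spells out every step with loops, a mutable dictionary and a manual sort; the LINQ version states the desired result and leaves the mechanics to the query operators.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
var fruits = new[] { "apple", "avocado", "banana", "blueberry", "cherry", "apricot" };

// Imperative: spell out HOW, with loops, a mutable dictionary and a manual sort.
var counts = new Dictionary<char, int>();
foreach (var name in fruits)
{
    char first = name[0];
    counts[first] = counts.TryGetValue(first, out var n) ? n + 1 : 1;
}
var keys = new List<char>(counts.Keys);
keys.Sort();
var imperative = new List<string>();
foreach (var key in keys)
    imperative.Add($"{key}:{counts[key]}");

// Declarative: say WHAT you want and let the query operators work out how.
var declarative =
    (from f in fruits
     group f by f[0] into g
     orderby g.Key
     select $"{g.Key}:{g.Count()}").ToList();

Console.WriteLine(string.Join(" ", declarative)); // a:3 b:2 c:1
```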
Bill McCarthy, a Microsoft MVP who runs Total Environment, a small consulting firm in Australia, offers a clear analogy -- making a sandwich.

"You grab the bread, put some fillings in and that defines the sandwich, at least it should," McCarthy says. "Unfortunately, today programmers go into the kitchen, then start writing on the fridge their definition of a sandwich before they even start. With LINQ they'll just focus on making the sandwich."

Article provided courtesy of Redmond Developer News (April 1, 2007 edition).