User Tools

Site Tools


linq

Table of Contents

LINQ

Snippet from Wikipedia: Language Integrated Query

Language Integrated Query (LINQ, pronounced "link") is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.

LINQ extends the language by the addition of query expressions, which are akin to SQL statements, and can be used to conveniently extract and process data from arrays, enumerable classes, XML documents, relational databases, and third-party data sources. Other uses, which utilize query expressions as a general framework for readably composing arbitrary computations, include the construction of event handlers or monadic parsers. It also defines a set of method names (called standard query operators, or standard sequence operators), along with translation rules used by the compiler to translate query syntax expressions into expressions using fluent-style (called method syntax by Microsoft) with these method names, lambda expressions and anonymous types.

Ports of LINQ exist for PHP (PHPLinq), JavaScript (linq.js), TypeScript (linq.ts), and ActionScript (ActionLinq), although none are strictly equivalent to LINQ in the .NET inspired languages C#, F# and VB.NET (where it is a part of the language, not an external library, and where it often addresses a wider range of needs).


Why LINQ beats SQL

“If you're not a LINQ addict, you might wonder what the fuss is about. SQL isn't broken, so why fix it? Why do we need another querying language?

The popular answer is that LINQ is INtegrated with C# (or VB), thereby eliminating the impedance mismatch between programming languages and databases, as well as providing a single querying interface for a multitude of data sources. While that's true, it's only part of the story. More importantly: when it comes to querying databases, LINQ is in most cases a significantly more productive querying language than SQL.

Compared to SQL, LINQ is simpler, tidier, and higher-level. It's rather like comparing C# to C++. Sure, there are times when it's still best to use C++ (as is the case with SQL), but in most situations, working in a modern tidy language and not having to worry about lower-level details is a big win.

SQL is a very old language—invented in 1974. Since then it's been extended endlessly, but never redesigned. This has made the language messy—rather like VB6 or Visual FoxPro. You might have become so accustomed to this that you can't see anything wrong!

Let's take an example. You want to write a simple query that retrieves customers as follows:

SELECT UPPER(Name) FROM Customer WHERE Name LIKE 'A%' ORDER BY Name That doesn't look too bad, right? But now suppose these results are feeding a web page, and we want to retrieve just rows 21-30. Suddenly, you need a subquery:

SELECT UPPER(Name) FROM (

  SELECT *, RN = row_number()
  OVER (ORDER BY Name)
  FROM Customer
  WHERE Name LIKE 'A%'
) A WHERE RN BETWEEN 21 AND 30 ORDER BY Name And if you need to support older databases (prior to SQL Server 2005), it gets worse:

SELECT TOP 10 UPPER (c1.Name) FROM Customer c1 WHERE

  c1.Name LIKE 'A%'
  AND c1.ID NOT IN
  (
     SELECT TOP 20 c2.ID
     FROM Customer c2
     WHERE c2.Name LIKE 'A%'
     ORDER BY c2.Name
  ) 
ORDER BY c1.Name Not only is this complicated and messy, but it violates the DRY principle (Don't Repeat Yourself). Here's same query in LINQ. The gain in simplicity is clear:

var query =

  from c in db.Customers
  where c.Name.StartsWith ("A")
  orderby c.Name
  select c.Name.ToUpper();

var thirdPage = query.Skip(20).Take(10); Only when we enumerate thirdPage will the query actually execute. In the case of LINQ to SQL or Entity Framework, the translation engine will convert the query (that we composed in two steps) into a single SQL statement optimized for the database server to which it's connected.

Composability You might have noticed another more subtle (but important) benefit of the LINQ approach. We chose to compose the query in two steps—and this allows us to generalize the second step into a reusable method as follows:

IQueryable<T> Paginate<T> (this IQueryable<T> query, int skip, int take) {

  return query.Skip(skip).Take(take);
} We can then do this:

var query = … var thirdPage = query.Paginate (20, 10); The important thing, here, is that we can apply our Paginate method to any query. In other words, with LINQ you can break down a query into parts, and then re-use some of those parts across your application.

Associations Another benefit of LINQ is that you can query across relationships without having to join. For instance, suppose we want to list all purchases of $1000 or greater made by customers who live in Washington. To make it interesting, we'll assume purchases are itemized (the classic Purchase / PurchaseItem scenario) and that we also want to include cash sales (with no customer). This requires querying across four tables (Purchase, Customer, Address and PurchaseItem). In LINQ, the query is effortless:

from p in db.Purchases where p.Customer.Address.State == “WA” || p.Customer == null where p.PurchaseItems.Sum (pi ⇒ pi.SaleAmount) > 1000 select p Compare this to the SQL equivalent:

SELECT p.* FROM Purchase p

   LEFT OUTER JOIN 
       Customer c INNER JOIN Address a ON c.AddressID = a.ID
   ON p.CustomerID = c.ID	
WHERE
  (a.State = 'WA' || p.CustomerID IS NULL)
   AND p.ID in
   (
       SELECT PurchaseID FROM PurchaseItem
       GROUP BY PurchaseID HAVING SUM (SaleAmount) > 1000
   )
Extending this example, suppose we want to sort the results in reverse order of price, and also want the salesperson's name and number of purchased items in the final projection. Notice how naturally we can express these additional criteria without repetition:

from p in db.Purchases where p.Customer.Address.State == “WA” || p.Customer == null let purchaseValue = p.PurchaseItems.Sum (pi ⇒ pi.SaleAmount) where purchaseValue > 1000 orderby purchaseValue descending select new {

  p.Description,
  p.Customer.SalesPerson.Name,
  PurchaseItemCount = p.PurchaseItems.Count()
} Here's the same query in SQL:

SELECT

   p.Description,
   s.Name,
   (SELECT COUNT(*) FROM PurchaseItem pi WHERE p.ID = pi.PurchaseID) PurchaseItemCount	
FROM Purchase p
   LEFT OUTER JOIN 
       Customer c 
           INNER JOIN Address a ON c.AddressID = a.ID
           LEFT OUTER JOIN SalesPerson s ON c.SalesPersonID = s.ID
   ON p.CustomerID = c.ID	
WHERE
   (a.State = 'WA' OR p.CustomerID IS NULL)
   AND p.ID in
   (
       SELECT PurchaseID FROM PurchaseItem
       GROUP BY PurchaseID HAVING SUM (SaleAmount) > 1000
   )
ORDER BY
   (SELECT SUM (SaleAmount) FROM PurchaseItem pi WHERE p.ID = pi.PurchaseID) DESC
An interesting point is that it's possible to transliterate the above SQL query back into LINQ, yielding a query that's every bit as repetitive - and at least as clumsy. You'll often see (typically non-working versions of) such queries posted on forums - this happens as a result of thinking in SQL, rather than thinking in LINQ. It's rather like transliterating a Fortran program into C# 7, and then complaining about the clumsy syntax for GOTO.

Shaping Data Selecting from more than one table in SQL requires joining - the end result being rows of flat tuples. If you've used SQL for many years, you may have become so accepting of this that it may not occur to you that this forced denormalization is often undesirable: it leads to data duplication and makes result sets awkward to work with on the client. In contrast, LINQ lets you retrieve shaped or hierarchical data. This avoids duplication, makes results easier to work with, and in most cases it even obviates the need for joining. For example, suppose we want to retrieve a selection of customers, each with their high-value purchases. In LINQ, you can do this:

from c in db.Customers where c.Address.State == “WA” select new {

  c.Name,
  c.CustomerNumber,
  HighValuePurchases = c.Purchases.Where (p => p.Price > 1000)
} HighValuePurchases, here, is a collection. And because we were querying an association property, we didn't need to join. Which means the detail of whether this was a inner or outer join is nicely abstracted away. In this case, the query, when translated to SQL, would be an outer join: LINQ doesn't exclude rows just because a subcollection returns zero elements. If we wanted something that translated to an inner join, we could do this:

from c in db.Customers where c.Address.State == “WA” let HighValuePurchases = c.Purchases.Where (p ⇒ p.Price > 1000) where HighValuePurchases.Any() select new {

  c.Name,
  c.CustomerNumber,
  HighValuePurchases
} LINQ also supports flat outer joins, adhoc joins, subqueries, and numerous other kinds of queries through a rich set of operators.

Parameterization What if we wanted to parameterize our previous example, so that the state “WA” came from a variable? This is all we do:

string state = “WA”;

var query =

  from c in db.Customers
  where c.Address.State == state
  ...
No messing with parameters on DbCommand objects or worrying about SQL injection attacks. LINQ's parameterization is inline, typesafe, and highly readable. It doesn't just solve the problem—it solves it really well.

As because LINQ queries are composable, we can add predicates conditionally. For example, we could write a method as follows:

IQueryable<Customer> GetCustomers (string state, decimal? minPurchase) {

   var query = Customers.AsQueryable();
   
   if (state != null)
       query = query.Where (c => c.Address.State == state);
   
   if (minPurchase != null)
       query = query.Where (c => c.Purchases.Any (p => p.Price > minPurchase.Value));
   
   return query;
} If we call this method with null state and minPurchase values, the following SQL is generated when we enumerate the result:

SELECT [t0].[ID], [t0].[Name], [t0].[AddressID] FROM [Customer] AS [t0] However, if we specify values for state and minPurchase, LINQ to SQL will not only add the predicates to the query, but the necessary joins as well:

SELECT [t0].[ID], [t0].[Name], [t0].[AddressID] FROM [Customer] AS [t0] LEFT OUTER JOIN [Address] AS [t1] ON [t1].[ID] = [t0].[AddressID] WHERE (EXISTS(

   SELECT NULL AS [EMPTY]
   FROM [Purchase] AS [t2]
   WHERE ([t2].[Price] > @p0) AND ([t2].[CustomerID] = [t0].[ID])
   )) AND ([t1].[State] = @p1)
Because our method returns an IQueryable, the query is not actually translated to SQL and run until enumerated. This give the caller a chance to add further predicates, pagination, custom projections, and so on.

Static Type Safety In the preceding queries, if we had declared the state variable as an integer rather than a string, the query would fail at compile-time rather than run-time. The same applies if you get any of the table or column names wrong. This is of real benefit when refactoring: the compiler will tell you if you haven't completely done your job.

Client Processing LINQ lets you effortlessly shift part of the query onto the client for processing. Why would you want to do this? With a heavily burdened database server, it can actually improve performance. As long as you don't take more data than you need (in other words, you still do all the filtering on the server) you can often help performance by shifting some of the burden of reordering, transforming and regrouping the results onto a less loaded application server. With LINQ, all you need to do is to slip AsEnumerable() into the query, and everything from that point on executes locally.

When not to use LINQ for querying databases Despite its power, LINQ doesn't deprecate SQL. It takes more than 95% of the querying brunt, but you still sometimes need SQL for hand-tweaked queries - especially with optimization or locking hints. Most providers don't support the use of LINQ for predicated updates, and of course you still need SQL for triggers. (SQL's also needed for stored procedures and functions, although the need for these crops up less often when you're using LINQ). You can combine LINQ with SQL by writing table-valued functions in SQL, and then calling those functions within more elaborate LINQ queries.

Having to know two querying languages is not really an issue because you'll want to learn LINQ anyway—LINQ is so useful for querying local collections and XML DOMs. If you're still using the old XmlDocument-based DOM, you'll find LINQ to XML's DOM a dramatic step up.

And because LINQ is easier to master than SQL, the goal of writing really good queries is more achievable with LINQ than with SQL.

LINQ in the Field I use LINQ almost exclusively for querying databases because it's more productive.

For writing applications, my personal experience is a LINQ-enabled data access layer (using an API such as LINQ to SQL or Entity Framework) cuts the data access development time by more than a half, as well as making maintenance far easier.”

Fair Use Source: https://www.linqpad.net/WhyLINQBeatsSQL.aspx

linq.txt · Last modified: 2021/02/13 14:03 by 127.0.0.1