Understanding LINQ


The purpose of this tutorial is to present some LINQ basics for readers who have not yet worked with it. LINQ stands for "Language-Integrated Query". It unifies data access, whatever the source of the data, and allows mixing data from different kinds of sources. It allows for query and set operations, similar to what SQL statements offer for databases. LINQ, though, integrates queries directly within .NET languages like C# and Visual Basic through a set of extensions to these languages.

Before LINQ, developers had to juggle different languages like SQL, XML or XPath and various technologies and APIs like ADO.NET or System.Xml in every application written using general-purpose languages like C# or VB.NET. This had several drawbacks. LINQ welds several worlds together. It helps us avoid the bumps we would usually find on the road from one world to another: using XML with objects and mixing relational data with XML are some of the tasks that LINQ simplifies.

One of the key aspects of LINQ is that it was designed to be used against any type of object or data source, and to provide a consistent programming model for doing so. The syntax and concepts are the same across all of its uses: once you learn how to use LINQ against an array or a collection, you also know most of the concepts needed to take advantage of LINQ with a database or an XML file. Another important aspect of LINQ is that when you use it, you work in a strongly-typed world. Examine this basic code and see if it shows any link to a data source:

using System;
using System.Linq;
public sealed class Program
{
    static double Square(double n)
    {
        Console.WriteLine("Computing Square(" + n + ")...");
        return Math.Pow(n, 2);
    }
    public static void Main()
    {
        int[] numbers = { 1, 2, 3 };
        var query =
            from n in numbers
            select Square(n);
        foreach (var n in query)
            Console.WriteLine(n);
    }
}

OUTPUT:

Computing Square(1)...
1
Computing Square(2)...
4
Computing Square(3)...
9
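
Notice that each "Computing Square" line appears immediately before its result rather than all three appearing first. This suggests that Square is not called when the query is defined, only when the query is enumerated, a behavior usually called deferred execution. The following is a minimal sketch of that idea; the class name DeferredExecutionDemo and the Log list are names chosen for this illustration and are not part of the original program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical helper: records each call so we can see when Square runs.
    public static readonly List<string> Log = new List<string>();

    static double Square(double n)
    {
        Log.Add("Computing Square(" + n + ")");
        return Math.Pow(n, 2);
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3 };

        // Defining the query does not call Square yet.
        var query = from n in numbers select Square(n);
        Console.WriteLine("Calls after defining the query: " + Log.Count);   // 0

        // Enumerating the query calls Square once per element.
        foreach (var n in query)
            Console.WriteLine(n);
        Console.WriteLine("Calls after enumerating: " + Log.Count);          // 3
    }
}
```

Enumerating the same query a second time would call Square again for every element, because the query stores the recipe and not the results.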

The code defines a method Square and then declares an implicitly-typed local variable, query, that applies Square to each element of an array (a sequence) of three integers. The compiler translates the select clause into a call to the Select method, which emits a sequence in which each input element has been transformed by the given expression. Iterating over the query with foreach is what causes the operation to be performed on each element.

The general idea behind an enumerator is that of a type whose sole purpose is to advance through and read another collection's contents. Enumerators do not provide write capabilities. An enumerator can be viewed as a cursor that advances over the elements of a collection, one at a time. IEnumerable represents a type whose contents can be enumerated, while IEnumerator is the type responsible for performing the actual enumeration.

The basic units of data in LINQ are sequences and elements. A sequence is any object that implements the generic IEnumerable<T> interface, and an element is each item in the sequence. Here is a basic code example:

using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
    public static void Main()
    {
        string[] names = { "Tom", "Mitch", "Steve" };
        IEnumerable<string> filteredNames =
            System.Linq.Enumerable.Where(names, n => n.Length >= 4);
        foreach (string n in filteredNames)
            Console.Write(n + "|");
    }
}

And here is the output:

Mitch|Steve|
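
The cursor description of an enumerator given earlier can be made concrete by walking a sequence by hand, roughly the way a foreach loop does. This is a minimal sketch; the class name EnumeratorDemo and the method Join are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Walks a sequence manually and joins its elements with "|".
    public static string Join(IEnumerable<string> sequence)
    {
        string result = "";

        // IEnumerable<T> hands out the enumerator (the "cursor").
        using (IEnumerator<string> cursor = sequence.GetEnumerator())
        {
            // MoveNext advances to the next element and returns false at the end.
            while (cursor.MoveNext())
            {
                // Current only reads the element; the cursor cannot change the collection.
                result += cursor.Current + "|";
            }
        }
        return result;
    }

    public static void Main()
    {
        string[] names = { "Tom", "Mitch", "Steve" };
        Console.WriteLine(Join(names));   // Tom|Mitch|Steve|
    }
}
```

An array can be passed directly because arrays implement IEnumerable<T>, which is also why LINQ operators work on them.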

Lambda Expressions: Chaining Query Operators

The previous examples were not very realistic because each query comprised a single query operator. To build more complex queries, you chain the operators:

using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
public class Program
{
    public static void Main()
    {
        string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };
        IEnumerable<string> query = names
                         .Where   (n => n.Contains ("a"))
                         .OrderBy (n => n.Length)
                         .Select  (n => n.ToUpper());
        foreach (string name in query)
            Console.Write(name + "|");
    }
}

// end of program
// The same query constructed progressively:

IEnumerable<string> filtered   = names.Where      (n => n.Contains ("a"));
IEnumerable<string> sorted     = filtered.OrderBy (n => n.Length);
IEnumerable<string> finalQuery = sorted.Select    (n => n.ToUpper());
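
The chained query above can also be written in query-expression syntax, like the first example in this tutorial. The following sketch places the two forms side by side so their results can be compared; the class name ChainingDemo and the methods Fluent and QueryExpression are names chosen for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainingDemo
{
    // Chained extension methods with lambda expressions.
    public static IEnumerable<string> Fluent(string[] names)
    {
        return names
            .Where(n => n.Contains("a"))
            .OrderBy(n => n.Length)
            .Select(n => n.ToUpper());
    }

    // The same query in query-expression syntax.
    public static IEnumerable<string> QueryExpression(string[] names)
    {
        return from n in names
               where n.Contains("a")
               orderby n.Length
               select n.ToUpper();
    }

    public static void Main()
    {
        string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };
        Console.WriteLine(string.Join("|", Fluent(names)));            // JAY|MARY|HARRY
        Console.WriteLine(string.Join("|", QueryExpression(names)));   // JAY|MARY|HARRY
    }
}
```

Only Harry, Mary, and Jay contain the letter "a"; sorting by length puts Jay first and Harry last before each name is converted to upper case.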

Here is a more complex example. It uses implicitly-typed local variables (the keyword "var") together with object initializers, extension methods, anonymous types, and lambda expressions:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
static class LanguageFeatures
{
    class ProcessData
    {
        public Int32 Id { get; set; }
        public Int64 Memory { get; set; }
        public String Name { get; set; }
    }
    static void DisplayProcesses(Func<Process, Boolean> match)
    {
        // implicitly-typed local variables
        var processes = new List<ProcessData>();
        foreach (var process in Process.GetProcesses())
        {
            if (match(process))
            {
                // object initializers
                processes.Add(new ProcessData
                {
                    Id = process.Id,
                    Name = process.ProcessName,
                    Memory = process.WorkingSet64
                });
            }
        }
        // extension methods
        Console.WriteLine("Total memory: {0} MB",
          processes.TotalMemory() / 1024 / 1024);
        var top2Memory =
          processes
            .OrderByDescending(process => process.Memory)
            .Take(2)
            .Sum(process => process.Memory) / 1024 / 1024;
        Console.WriteLine(
          "Memory consumed by the two most hungry processes: {0} MB",
          top2Memory);
        // anonymous types
        var results = new
        {
            TotalMemory = processes.TotalMemory() / 1024 / 1024,
            Top2Memory = top2Memory,
            Processes = processes
        };
        ObjectDumper.Write(results, 1);
        ObjectDumper.Write(processes);
    }
    static Int64 TotalMemory(this IEnumerable<ProcessData> processes)
    {
        Int64 result = 0;
        foreach (var process in processes)
            result += process.Memory;
        return result;
    }
    static void Main()
    {
        // lambda expressions
        DisplayProcesses(process => process.WorkingSet64 >= 20 * 1024 * 1024);
    }
}

If you examine this code, you will see that "ObjectDumper" is referred to but not defined. It is a helper class that lives in a separate source file, which we compile into a DLL and reference from our program:

using System;
using System.IO;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
public class ObjectDumper
{
    public static void Write(object element)
    {
        Write(element, 0);
    }
    public static void Write(object element, int depth)
    {
        Write(element, depth, Console.Out);
    }
    public static void Write(object element, int depth, TextWriter log)
    {
        ObjectDumper dumper = new ObjectDumper(depth);
        dumper.writer = log;
        dumper.WriteObject(null, element);
    }
    TextWriter writer;
    int pos;
    int level;
    int depth;
    private ObjectDumper(int depth)
    {
        this.depth = depth;
    }
    private void Write(string s)
    {
        if (s != null)
        {
            writer.Write(s);
            pos += s.Length;
        }
    }
    private void WriteIndent()
    {
        for (int i = 0; i < level; i++) writer.Write("  ");
    }
    private void WriteLine()
    {
        writer.WriteLine();
        pos = 0;
    }
    private void WriteTab()
    {
        Write("  ");
        while (pos % 8 != 0) Write(" ");
    }
    private void WriteObject(string prefix, object element)
    {
        if (element == null || element is ValueType || element is string)
        {
            WriteIndent();
            Write(prefix);
            WriteValue(element);
            WriteLine();
        }
        else
        {
            IEnumerable enumerableElement = element as IEnumerable;
            if (enumerableElement != null)
            {
                foreach (object item in enumerableElement)
                {
                    if (item is IEnumerable && !(item is string))
                    {
                        WriteIndent();
                        Write(prefix);
                        Write("...");
                        WriteLine();
                        if (level < depth)
                        {
                            level++;
                            WriteObject(prefix, item);
                            level--;
                        }
                    }
                    else
                    {
                        WriteObject(prefix, item);
                    }
                }
            }
            else
            {
                MemberInfo[] members = element.GetType().GetMembers(BindingFlags.Public | BindingFlags.Instance);
                WriteIndent();
                Write(prefix);
                bool propWritten = false;
                foreach (MemberInfo m in members)
                {
                    FieldInfo f = m as FieldInfo;
                    PropertyInfo p = m as PropertyInfo;
                    if (f != null || p != null)
                    {
                        if (propWritten)
                        {
                            WriteTab();
                        }
                        else
                        {
                            propWritten = true;
                        }
                        Write(m.Name);
                        Write("=");
                        Type t = f != null ? f.FieldType : p.PropertyType;
                        if (t.IsValueType || t == typeof(string))
                        {
                            WriteValue(f != null ? f.GetValue(element) : p.GetValue(element, null));
                        }
                        else
                        {
                            if (typeof(IEnumerable).IsAssignableFrom(t))
                            {
                                Write("...");
                            }
                            else
                            {
                                Write("{ }");
                            }
                        }
                    }
                }
                if (propWritten) WriteLine();
                if (level < depth)
                {
                    foreach (MemberInfo m in members)
                    {
                        FieldInfo f = m as FieldInfo;
                        PropertyInfo p = m as PropertyInfo;
                        if (f != null || p != null)
                        {
                            Type t = f != null ? f.FieldType : p.PropertyType;
                            if (!(t.IsValueType || t == typeof(string)))
                            {
                                object value = f != null ? f.GetValue(element) : p.GetValue(element, null);
                                if (value != null)
                                {
                                    level++;
                                    WriteObject(m.Name + ": ", value);
                                    level--;
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    private void WriteValue(object o)
    {
        if (o == null)
        {
            Write("null");
        }
        else if (o is DateTime)
        {
            Write(((DateTime)o).ToShortDateString());
        }
        else if (o is ValueType || o is string)
        {
            Write(o.ToString());
        }
        else if (o is IEnumerable)
        {
            Write("...");
        }
        else
        {
            Write("{ }");
        }
    }
}

Now we compile our ObjectDumper.cs file into a DLL by using the '/target:library' switch on the command line (csc.exe /target:library ObjectDumper.cs), or we build it as a class library project in Visual Studio 2010. If you are using Visual Studio 2010, go to the project's properties and make sure that the target framework is .NET Framework 4.0. Then we compile the program above, saved as MyProgram.cs, with a reference to ObjectDumper.dll: csc.exe /r:ObjectDumper.dll MyProgram.cs. Here is the output:

C:\Windows\MICROS~1.NET\FRAMEW~1\V40~1.303>myprogram
Total memory: 968 MB
Memory consumed by the two most hungry processes: 314 MB
TotalMemory=968         Top2Memory=314  Processes=...
  Processes: Id=3244      Memory=65527808         Name=sqlservr
  Processes: Id=5320      Memory=23556096         Name=sqlservr
  Processes: Id=3320      Memory=37498880         Name=DkService
  Processes: Id=952       Memory=47443968         Name=svchost
  Processes: Id=5272      Memory=167903232        Name=WINWORD
  Processes: Id=1108      Memory=68866048         Name=svchost
  Processes: Id=1096      Memory=90230784         Name=svchost
  Processes: Id=500       Memory=120848384        Name=AcroRd32
  Processes: Id=2856      Memory=75415552         Name=explorer
  Processes: Id=1672      Memory=71299072         Name=digitaleditions
  Processes: Id=4348      Memory=162045952        Name=LINQPad
  Processes: Id=2576      Memory=35442688         Name=Babylon
  Processes: Id=2172      Memory=49131520         Name=SearchIndexer
  Id=3244                 Memory=65527808         Name=sqlservr
  Id=5320                 Memory=23556096         Name=sqlservr
  Id=3320                 Memory=37498880         Name=DkService
  Id=952                  Memory=47443968         Name=svchost
  Id=5272                 Memory=167903232        Name=WINWORD
  Id=1108                 Memory=68866048         Name=svchost
  Id=1096                 Memory=90230784         Name=svchost
  Id=500                  Memory=120848384        Name=AcroRd32
  Id=2856                 Memory=75415552         Name=explorer
  Id=1672                 Memory=71299072         Name=digitaleditions
  Id=4348                 Memory=162045952        Name=LINQPad
  Id=2576                 Memory=35442688         Name=Babylon
  Id=2172                 Memory=49131520         Name=SearchIndexer


Stated loosely, the significant C# 3.0 language additions that support LINQ are:
  • Implicitly typed local variables
  • Object initializers
  • Lambda expressions
  • Extension methods
  • Anonymous types
Now reconsider this code snippet:

var processes =
    Process.GetProcesses()
    .Where(process => process.WorkingSet64 > 20 * 1024 * 1024)
    .OrderByDescending(process => process.WorkingSet64)
    .Select(process => new
    {
        process.Id,
        Name = process.ProcessName
    });

We declare the variable processes using the C# 3.0 var keyword, which makes it an implicitly typed local variable. The expression process => process.WorkingSet64 > 20 * 1024 * 1024 is a lambda expression; most query operators take a lambda expression as an argument. Where, OrderByDescending, and Select are extension methods. The new { ... } expression creates an instance of an anonymous type, and the braces that follow new use object-initializer syntax to set its Id and Name properties. Note how these features dovetail to form a complete solution.
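
To see the five features together without depending on the processes running on a particular machine, here is a minimal sketch over in-memory data. The AppInfo class, the FeaturesDemo class, the NamesAbove method, and the sample values are all hypothetical and exist only for this illustration. It also shows that calling an extension method is shorthand for an ordinary static method call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for System.Diagnostics.Process.
public class AppInfo
{
    public int Id { get; set; }
    public string Name { get; set; }
    public long Memory { get; set; }
}

public static class FeaturesDemo
{
    // Extension method: the "this" modifier allows the call apps.NamesAbove(limit).
    public static IEnumerable<string> NamesAbove(this IEnumerable<AppInfo> apps, long limit)
    {
        return apps
            .Where(app => app.Memory > limit)                 // lambda expression
            .OrderByDescending(app => app.Memory)
            .Select(app => new { app.Id, Name = app.Name })   // anonymous type
            .Select(item => item.Id + ":" + item.Name);
    }

    public static void Main()
    {
        // Implicitly typed local variable and object initializers.
        var apps = new List<AppInfo>
        {
            new AppInfo { Id = 1, Name = "editor", Memory = 50 },
            new AppInfo { Id = 2, Name = "shell", Memory = 10 },
            new AppInfo { Id = 3, Name = "browser", Memory = 90 }
        };

        // Extension-method call syntax...
        Console.WriteLine(string.Join(", ", apps.NamesAbove(20)));                 // 3:browser, 1:editor
        // ...is shorthand for a plain static call with the sequence as first argument.
        Console.WriteLine(string.Join(", ", FeaturesDemo.NamesAbove(apps, 20)));   // 3:browser, 1:editor
    }
}
```

This is the same relationship as between names.Where(...) and the explicit System.Linq.Enumerable.Where(names, ...) call used earlier in this tutorial.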
