Introduction
The code in this article
demonstrates an alternative way to do multithreaded programming using .NET 2.0
and a freely available library called CSP.NET. The application presented is
meant to introduce the programmer to the most basic elements of CSP.NET. It is a
console application that lets the user input a file search pattern and a word.
The application then searches the files matching the pattern for the word
entered by the user. Any matches are recorded and written to a text file. All
the tasks run concurrently, meaning that user input, word search and storing of
the results can proceed simultaneously. On a single-core machine this
simultaneous execution is only simulated, but it is real when more processors
or cores are present.
Traditional concurrent programming with multiple threads
If no external libraries are used, the only way to write concurrent programs in
.NET is through the use of the Thread class, either directly or via the
ThreadPool. The Thread class offers a lot of methods used to synchronize one
thread with other threads, put the current thread to sleep, abort a thread and
so on.
In the real world, different parts of a program need to communicate with each
other, and in the world of threads that is typically done through shared data.
Access to shared data has to be controlled carefully so that no more than one
thread modifies the data at the same time. To that end most modern languages
offer a set of features to control access to shared data and to signal between
threads. In C# these features include, among others, locks, mutexes,
semaphores, wait handles and monitors.
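To make the shared-data style concrete, here is a minimal sketch using only the standard Thread class and the lock statement. The class and method names (SharedCounterDemo, RunTwoWorkers) are made up for illustration. Two threads increment the same counter, and the lock is what keeps updates from being lost.

```csharp
using System;
using System.Threading;

public static class SharedCounterDemo
{
    private static readonly object Gate = new object();
    private static int counter;

    // Each worker increments the shared counter. The lock makes the
    // read-modify-write step atomic with respect to the other thread.
    private static void Work()
    {
        for (int i = 0; i < 100000; i++)
        {
            lock (Gate)
            {
                counter++;
            }
        }
    }

    public static int RunTwoWorkers()
    {
        counter = 0;
        Thread a = new Thread(Work);
        Thread b = new Thread(Work);
        a.Start();
        b.Start();
        a.Join();   // wait for both workers before reading the result
        b.Join();
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(RunTwoWorkers()); // prints 200000
    }
}
```

Without the lock the result would typically be lower than 200000, because increments from the two threads can overwrite each other.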
Just looking at the number
of methods and classes designed to support thread programming it is evident that
multithreaded programs can get very complex very easily. With complexity comes
the risk of introducing errors, and as a matter of fact thread programming is
often regarded as one of the most challenging areas in programming. That may
have been all right when all computers only had one core and one processor and
multithreaded programming was the realm of a select few. But with the advent of
multicore processors, programs need to be multithreaded to exploit the power of
modern computers.
Concurrent programming the CSP way
What if there was a way to write multithreaded programs where you did not have
to work with threads or any of the supporting features mentioned above? What if
all you had to do was to write the different parts of the concurrent program as
normal single-threaded programs communicating through a series of predefined
channels?
That way of thinking about
concurrency is one of the key elements in a model known as CSP (Communicating
Sequential Processes). If you want to use CSP in .NET you have two choices.
The first is to read the
specification and implement your own library containing the CSP constructs
needed in your programs.
If that seems a bit too much, the second is to use CSP.NET, a library for
.NET 2.0 that can be downloaded free of charge and implements all the common
CSP constructs. As this article is meant as a very basic introduction to
multithreaded programming with CSP.NET, only the essential elements of CSP.NET
are discussed. The more advanced features are the subject of another article.
In CSP.NET a multithreaded
application is made of a number of sequential programs communicating through
channels all executing concurrently. These sequential programs are called
processes and are of course not separate programs but rather classes
implementing the ICSProcess interface. This interface has one method called Run
that needs to be implemented. Think of the Run method as a normal thread and put
the code you want that thread to contain here.
In most programs, communication between threads, and thus also between
different classes implementing the ICSProcess interface, is required. In
CSP.NET communication between ICSProcesses is done through a set of predefined
channels. The channels are called One2OneChannel, Any2OneChannel,
One2AnyChannel and Any2AnyChannel. They are used to send data from one
ICSProcess to another. They are all generic, meaning that data of any type can
be sent through a channel. It is important to note that CSP.NET channels are
unidirectional, meaning that data can only travel in one direction. That means
that an ICSProcess can either read from or write to a given channel but not
both.
A One2OneChannel is used to communicate between two ICSProcesses - one process reads and the other one writes.
An Any2OneChannel is used when one of multiple ICSProcesses wants to write to the same ICSProcess.
A One2AnyChannel is used when one ICSProcess wants to write to one of multiple ICSProcesses.
An Any2AnyChannel is used when one of multiple ICSProcesses wants to write to one of multiple receiving processes.
The channels are not
broadcast channels, meaning that even though you use a One2AnyChannel only the
first process reading from the channel receives the data.
By default all the channels are blocking, which means that an ICSProcess that
tries to read from a channel will block until another process has written some
data to the channel. Likewise an ICSProcess trying to write to a channel will
block until another process wants to read the data from the channel.
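The blocking behavior can be illustrated with a small hand-written channel. This is only a sketch built on the standard Monitor class, assuming exactly one writer and one reader. RendezvousChannel is a hypothetical name and this is not how CSP.NET itself is implemented. Write returns only after a reader has taken the value, and Read waits until a value has been offered.

```csharp
using System;
using System.Threading;

// Hypothetical one-writer/one-reader channel, for illustration only.
public sealed class RendezvousChannel<T>
{
    private readonly object gate = new object();
    private T item;
    private bool hasItem;

    // Blocks until a reader has taken the value.
    public void Write(T value)
    {
        lock (gate)
        {
            item = value;
            hasItem = true;
            Monitor.PulseAll(gate);              // wake a waiting reader
            while (hasItem) Monitor.Wait(gate);  // wait for the hand-over
        }
    }

    // Blocks until the writer has offered a value.
    public T Read()
    {
        lock (gate)
        {
            while (!hasItem) Monitor.Wait(gate);
            T value = item;
            hasItem = false;
            Monitor.PulseAll(gate);              // release the blocked writer
            return value;
        }
    }
}

public static class RendezvousDemo
{
    public static int SumOverChannel()
    {
        RendezvousChannel<int> channel = new RendezvousChannel<int>();
        Thread writer = new Thread(delegate()
        {
            for (int i = 1; i <= 5; i++) channel.Write(i);
        });
        writer.Start();

        int sum = 0;
        for (int i = 0; i < 5; i++) sum += channel.Read();
        writer.Join();
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumOverChannel()); // prints 15
    }
}
```

The point of a library such as CSP.NET is that this kind of Monitor code is written once inside the channel, so the processes themselves only call Read and Write.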
This behavior is
appropriate in many cases but there are situations in which it would be better
if the ICSProcess writing to the channel could continue execution even though no
process is waiting to read from the channel. In cases like that CSP.NET offers
buffered channels that can hold a certain amount of data. When the buffer is
full, they behave like the normal channels. If the buffer is not full, they allow
the process writing to the channel to continue execution even though no
processes are waiting to read the data. In the example application both buffered
and unbuffered channels are demonstrated.
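A buffered channel can be sketched in the same spirit with a bounded queue. BufferedChannel is again a hypothetical name and the code only assumes the standard Queue and Monitor classes. It is meant to show the semantics described above, not CSP.NET's own buffer classes. A write blocks only when the buffer is full, and a read blocks only when it is empty.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical bounded FIFO channel, for illustration only.
public sealed class BufferedChannel<T>
{
    private readonly object gate = new object();
    private readonly Queue<T> buffer = new Queue<T>();
    private readonly int capacity;

    public BufferedChannel(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    // Blocks only while the buffer is full.
    public void Write(T value)
    {
        lock (gate)
        {
            while (buffer.Count == capacity) Monitor.Wait(gate);
            buffer.Enqueue(value);
            Monitor.PulseAll(gate);
        }
    }

    // Blocks only while the buffer is empty.
    public T Read()
    {
        lock (gate)
        {
            while (buffer.Count == 0) Monitor.Wait(gate);
            T value = buffer.Dequeue();
            Monitor.PulseAll(gate);
            return value;
        }
    }
}

public static class BufferedDemo
{
    public static string WriteThenRead()
    {
        BufferedChannel<string> channel = new BufferedChannel<string>(3);
        // No reader is waiting, yet these three writes return at once.
        channel.Write("a");
        channel.Write("b");
        channel.Write("c");
        // Values come back in the order they were written (FIFO).
        return channel.Read() + channel.Read() + channel.Read();
    }

    public static void Main()
    {
        Console.WriteLine(WriteThenRead()); // prints abc
    }
}
```

A fourth Write in WriteThenRead would block forever, since the buffer holds three items and nothing reads from it on another thread.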
When a set of ICSProcesses
have been defined and appropriate channels set up and connected all that remains
in order to execute the program is to start the processes to execute in
parallel. In CSP.NET that is done through the Parallel class. The Parallel class
takes an array of ICSProcesses and when the Run method on the Parallel class is
invoked all the ICSProcesses are executed concurrently. Only when all the
processes have terminated does the Run method return.
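Conceptually, running processes in parallel and returning when all of them have finished amounts to starting one thread per process and joining them. The following sketch shows that idea with plain threads. IProcess and ParallelRunner are hypothetical stand-ins, not the CSP.NET types, and the library may well schedule its processes differently.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a process interface with a single Run method.
public interface IProcess
{
    void Run();
}

public sealed class ParallelRunner
{
    private readonly IProcess[] processes;

    public ParallelRunner(IProcess[] processes)
    {
        this.processes = processes;
    }

    // Starts every process on its own thread and returns only when
    // all of them have terminated.
    public void Run()
    {
        Thread[] threads = new Thread[processes.Length];
        for (int i = 0; i < processes.Length; i++)
        {
            threads[i] = new Thread(processes[i].Run);
            threads[i].Start();
        }
        foreach (Thread t in threads) t.Join();
    }
}

// Trivial process that records that it has run.
public sealed class CountingProcess : IProcess
{
    public static int Finished;
    public void Run() { Interlocked.Increment(ref Finished); }
}

public static class ParallelDemo
{
    public static int RunThree()
    {
        CountingProcess.Finished = 0;
        IProcess[] processes =
        {
            new CountingProcess(), new CountingProcess(), new CountingProcess()
        };
        new ParallelRunner(processes).Run();
        return CountingProcess.Finished;
    }

    public static void Main()
    {
        Console.WriteLine(RunThree()); // prints 3
    }
}
```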
In this section I have only scratched the surface of CSP and CSP.NET. Remember
that underneath the nice surface CSP.NET still uses the threading facilities in
.NET 2.0. Using CSP.NET, a programmer can write high-performance applications
that exploit the power of multiple processors and cores without explicitly
having to deal with monitors, semaphores, locks and synchronization of threads.
Installing CSP.NET
Installation of CSP.NET is
quite easy. Go to the website: www.cspdotnet.com. Click on "Downloads" and download
the CSP.NET Library. Run the installer and that's it.
To use the library in your
own projects, just add a reference to the Csp.dll just installed and remember to
import the namespace into your source files.
Example program
To illustrate the elements
of CSP.NET discussed above, a small example application is shown below. The
application is so simple that all the code is included in the article. Despite
its simplicity, the program is multithreaded and designed to scale with the
number of CPU-cores in the system.
The program is a console
based app that takes two inputs from the user. The first is a file pattern,
possibly including wildcards and the second is a word to search for. When the
user has entered a file pattern and a word the program searches all files
matching the file pattern for the word entered by the user. All matches are
written to a text file named "results.txt".
If the program was written
as a normal single threaded app the user interface would block until all
matching files had been searched, meaning that the user interface would not
respond to further input. Also the program would not take into account the
number of CPU-cores in the machine and perform the same even though you had just
bought the latest monster machine with two processors each containing two cores.
The program is divided into three logical parts, each performing a specific
task. These parts are naturally defined as classes implementing the ICSProcess
interface and communicating through CSP.NET channels. The first part of the
program implements the user interface. The second part does the actual work and
searches through files for a specific word. The third part writes the results
to a text file.
public struct SearchData
{
    public string filePattern;
    public string searchWord;

    public SearchData(string pattern, string word)
    {
        filePattern = pattern;
        searchWord = word;
    }
}
The program uses a
SearchData struct to hold the file pattern and the word entered by the user.
SearchData is shown above and needs no explanations. The code for the user
interface class looks like this:
using System;

public class UI : ICSProcess
{
    // Write-only end of the channel carrying search requests.
    IChannelOut<SearchData> searchDataChannel;

    public UI(IChannelOut<SearchData> searchDataChannel)
    {
        this.searchDataChannel = searchDataChannel;
    }

    public void Run()
    {
        while (true)
        {
            string filePattern, searchWord;
            Console.Write("Enter file search pattern: ");
            filePattern = Console.ReadLine();
            Console.Write("Enter word to search for: ");
            searchWord = Console.ReadLine();
            // Hand the request to a WordFinder process.
            searchDataChannel.Write(new SearchData(filePattern, searchWord));
        }
    }
}
As explained in the
text above the class needs to implement the ICSProcess interface, which means
that the Run method has to be implemented. The Constructor of the UI class takes
one parameter of the type: IChannelOut.
IChannelOut is an interface implemented by all the channels in CSP.NET. A
reference of this type restricts the channel to the write functionality of a
CSP.NET channel. There is a corresponding IChannelIn interface that only
provides the reading functionality. By using these interfaces you avoid
accidentally using a channel as both a writer and a reader in the same process.
The channel used in the UI process only supports writing objects of the type
SearchData.
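The idea of handing a process only one end of a channel can be sketched independently of CSP.NET. In the following example IReadEnd, IWriteEnd and ToyChannel are hypothetical names, and the channel is a plain single-threaded queue with no blocking. The only point is that a class holding the write-only interface cannot call Read at all.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical read-only and write-only views of a channel.
public interface IReadEnd<T> { T Read(); }
public interface IWriteEnd<T> { void Write(T value); }

// Single-threaded toy channel; a real channel would also synchronize.
public sealed class ToyChannel<T> : IReadEnd<T>, IWriteEnd<T>
{
    private readonly Queue<T> items = new Queue<T>();
    public void Write(T value) { items.Enqueue(value); }
    public T Read() { return items.Dequeue(); }
}

public sealed class Producer
{
    private readonly IWriteEnd<string> output;

    public Producer(IWriteEnd<string> output)
    {
        this.output = output;
    }

    public void Produce()
    {
        output.Write("hello");
        // output.Read() would not compile: IWriteEnd<T> has no Read method.
    }
}

public static class ChannelEndsDemo
{
    public static string RoundTrip()
    {
        ToyChannel<string> channel = new ToyChannel<string>();
        new Producer(channel).Produce();   // producer sees only the write end
        IReadEnd<string> input = channel;  // consumer sees only the read end
        return input.Read();
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip()); // prints hello
    }
}
```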
The Run method is very simple. It reads two strings from the console. The first
one is the file pattern and the second one the word to search for. When the two
strings have been entered by the user they are written to the searchDataChannel
as a SearchData object. As all the code in the Run method is enclosed in an
infinite while loop, it starts over by asking the user for a new file pattern
and word.
The ICSProcess that
receives the SearchData from the UI process is shown next.
using System.IO;
using System.Text;

public class WordFinder : ICSProcess
{
    IChannelIn<SearchData> searchDataChannel;
    IChannelOut<string> fileWriterChannel;

    // Search only files in this directory...
    const string path = @"c:\testdir\";

    public WordFinder(IChannelIn<SearchData> searchDataChannel,
                      IChannelOut<string> fileWriterChannel)
    {
        this.searchDataChannel = searchDataChannel;
        this.fileWriterChannel = fileWriterChannel;
    }

    public void Run()
    {
        while (true)
        {
            // Block until the UI process sends a new search request.
            SearchData sd = searchDataChannel.Read();
            string[] files = Directory.GetFiles(path, sd.filePattern);

            // One report per request, covering every matching file.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < files.Length; i++)
            {
                sb.AppendLine("Searching " + files[i] +
                    " for searchword: " + sd.searchWord);
                using (StreamReader sr = new StreamReader(files[i]))
                {
                    string line;
                    int linecount = 0; // zero-based line index
                    while ((line = sr.ReadLine()) != null)
                    {
                        if (line.Contains(sd.searchWord))
                            sb.AppendLine(sd.searchWord
                                + " found at line: " + linecount);
                        linecount++;
                    }
                }
            }
            // Hand the finished report to the FileWriter process.
            fileWriterChannel.Write(sb.ToString());
        }
    }
}
WordFinder takes the user input and searches through all the files matching the
file pattern for the specified word. The constructor takes two CSP.NET
channels. The first one, called searchDataChannel, is defined as an
input-channel, meaning that it can only be used to read data from a channel -
in this case it can read data of the type SearchData. The second channel,
called fileWriterChannel, is an output-channel that can write the data type
string.
As with the UI process all
the actual code in the Run method is implemented inside an infinite while loop.
That means that the program will continue running until it is terminated
explicitly by closing the program. From a CSP.NET point of view only two lines
of code are interesting.
The first line in the while loop reads SearchData from the searchDataChannel.
When that is done, all the files matching the file pattern given by the user
are searched for the word specified by the user. All matches are recorded using
the StringBuilder. When all the files have been searched, the string built
using the StringBuilder is written to the fileWriterChannel.
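The search step itself involves no concurrency and can be tried in isolation. The following sketch pulls the line-matching loop into a standalone helper. FindMatches is a hypothetical name that is not part of the article's program, and it reads from any TextReader so that it can be fed an in-memory string instead of a file. Line numbers are zero-based, as in the WordFinder class.

```csharp
using System;
using System.IO;
using System.Text;

public static class WordSearch
{
    // Returns one report line for every input line that contains the word.
    public static string FindMatches(TextReader reader, string word)
    {
        StringBuilder sb = new StringBuilder();
        string line;
        int lineNumber = 0; // zero-based, as in WordFinder
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains(word))
                sb.Append(word).Append(" found at line: ")
                  .Append(lineNumber).Append('\n');
            lineNumber++;
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string text = "alpha\nbeta gamma\ngamma";
        // Prints:
        // gamma found at line: 1
        // gamma found at line: 2
        Console.Write(FindMatches(new StringReader(text), "gamma"));
    }
}
```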
using System.IO;

public class FileWriter : ICSProcess
{
    IChannelIn<string> fileWriterChannel;

    const string file = @"c:\testdir\results.txt";

    public FileWriter(IChannelIn<string> fileWriterChannel)
    {
        this.fileWriterChannel = fileWriterChannel;
    }

    public void Run()
    {
        while (true)
        {
            // Block until a WordFinder process sends a report.
            string fileData = fileWriterChannel.Read();
            // Open in append mode so earlier results are kept.
            using (StreamWriter sw = new StreamWriter(file, true))
            {
                sw.Write(fileData);
                sw.WriteLine();
            }
        }
    }
}
The FileWriter class takes
an input-channel called fileWriterChannel and reads a string from the channel.
This string is the one constructed in the WordFinder class listing all the
matches of a word search. The string is appended to a text file and that's it.
The code for the FileWriter class is listed above.
What remains in order to
have a working CSP.NET program is to create objects of the three classes UI,
WordFinder and FileWriter and connect them with CSP.NET channels. All that is
done in the Main method shown below.
static void Main(string[] args)
{
    CspManager.InitStandAlone();

    // Many WordFinder processes write results, one FileWriter reads them.
    Any2OneChannel<string> fileWriterChannel = Factory.GetAny2One<string>();
    // One UI process writes requests, any WordFinder may read them.
    One2AnyChannel<SearchData> searchDataChannel =
        Factory.GetOne2Any<SearchData>(new FifoBuffer<SearchData>(10));

    ICSProcess[] processes = new ICSProcess[Environment.ProcessorCount + 2];
    processes[0] = new UI(searchDataChannel);
    processes[1] = new FileWriter(fileWriterChannel);
    // One WordFinder per CPU core.
    for (int i = 0; i < Environment.ProcessorCount; i++)
        processes[i + 2] = new WordFinder(searchDataChannel, fileWriterChannel);

    Parallel par = new Parallel(processes);
    par.Run();
}
The first line
CspManager.InitStandAlone() initializes the CSP.NET library and tells it that we
are working with a standard CSP.NET program with no distributed processes.
The next line creates a CSP.NET channel, fileWriterChannel, of the type
Any2OneChannel. As described in the section about CSP.NET, this channel can be
connected to multiple processes wishing to write, but only one process can read
from the channel. The Factory methods are part of CSP.NET and are used to
create all the various types of channels. The fileWriterChannel is constructed
as a channel to transport strings.
The next line creates another channel called searchDataChannel. This channel is
created as a One2AnyChannel able to transport data of type SearchData. Remember
from above that a One2AnyChannel can be connected to only one writing process
but to multiple reading processes. Note that the searchDataChannel is created
as a buffered channel, meaning that it doesn't block when a writing process
tries to write without a reading process being ready, as long as the buffer is
not full. The buffer is created as a FifoBuffer, which means that the elements
are retrieved in the same order as they are written. The capacity of the buffer
is set to 10, which means that at most 10 elements can be written to the buffer
before it blocks - provided that a reading process has not read any elements
before that.
Now it is time to create
the processes making up our CSP.NET program. Remember that I said that the
program scaled with the number of available CPU-cores. As only the WordFinder
process does some intensive work it only makes sense to increase the number of
WordFinder processes in order to take advantage of multiple CPU-cores. First an
ICSProcess array is created which can hold as many processes as there are cores
plus 2. The first two processes are the UI and the FileWriter.
The same number of WordFinder processes as there are CPU-cores are created
next. Regardless of the number of WordFinder processes created, the same two
channels are used. That is possible because the channels are created as an
Any2OneChannel and a One2AnyChannel respectively. Had we known that only one
WordFinder process would ever be created, we could have used two
One2OneChannels instead.
When the processes to execute are created, we only need to create an instance
of the Parallel class defined in CSP.NET. The processes to run in parallel are
passed to the constructor. The last line starts our concurrent program by
calling the Run method on the Parallel class.
Conclusion
This article demonstrates
an alternative, and much easier way, to write multithreaded programs using a
freely available library called CSP.NET. I have given a brief introduction to
the most basic constructs and left out some very powerful features such as the
possibility to write concurrent distributed programs with ease. The more
advanced features will be the topic of another article. If you want to discuss
CSP.NET, there is a forum on the CSP.NET website dedicated to discussing the
library.