There is a saying going around that we love to quote over and over: "Oh, so you're using regular expressions, huh? Well, now you've got two problems." That is just as absurd as saying, "Oh, you're including a Visual Basic project in a WPF application, where you need to maintain XAML and C#. Now you've got three problems." The point is that if learning something will make our lives easier, we should not hide behind such arguments; instead, we should strive to be better than our current selves, so that our code shines.
This series of articles will help us do just that. I will take us through this journey step by step, so that we can all be convinced that regular expressions are not scary; they are a good challenge and a good motivator too!
So, what is a regular expression anyway? A regular expression is a string of characters that has meaning to a special program known as the regular expression engine. This program allows us to talk to it using a special language for text parsing. We use regular expressions when we want to study the parts of sentences or any other form of user input. We don't need to be experts in human or programming languages to use these engines effectively; however, we do need to understand the syntax so that we can talk to the engine. Having said that, let's consider some ways we might use regular expressions.
Whether we are developing a database or a search engine, or we just want to check that a user gave us the correct input, regular expressions are always ready for us to use (assuming we have included the correct namespaces, that is). Regular expressions can be thought of as rules for the types of text we want. We can ask the engine to search for text that contains lowercase letters and numbers, for example. We can also tell it to search for a specific piece of text only if it follows a certain character, or only if it meets a certain criterion. Screen readers and text editors such as your IDE might use regular expressions too. The possibilities are endless; the only limit is our creativity!
In fact, regular expressions are so important that nearly every mainstream programming language or platform has built-in support or a library devoted to them. These include (but are not limited to) Ruby, Java, .NET, Scala, JavaScript, C++ and Perl (of course). The engine we will study is the one in the .NET Framework, though a lot of the syntax is transferable to other languages. We will look at the syntax for forming our own regular expressions first, then at the API offered by the framework, so we can do all sorts of surgery on the text we want to find. So, with the introduction out of the way, let's begin, shall we?
We will begin by looking at the very basics of a regular expression, written inside the following C# string.
- string expression = @"test expression";
Notice that we placed the @ symbol to the left of the first quotation mark. This will be explained in later articles, but for now, let's make it a habit and a best practice to do this in every regular expression we create.
The text inside our string will ask the engine to search for exactly "test" followed by a space, followed by the word "expression." So, let us provide the program some text to parse.
- string sampleText = "This is my first test expression.";
Here, the engine will check whether the text written in the expression variable can be found in the text stored in the sampleText variable. If we were to ask the engine, it would tell us that the text was found. As one can see, the text in the regular expression must appear, character for character, somewhere in the string we want to study. This may not seem very powerful at first, because we haven't looked at more advanced ways of writing regular expressions, but we will get there.
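For those who want to try this right away, here is a minimal sketch of one way to "ask the engine". It assumes the static Regex.IsMatch method from the System.Text.RegularExpressions namespace, which this article has not introduced yet; the class name FirstMatchDemo and the helper ContainsPattern are made up purely for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstMatchDemo
{
    // Hypothetical helper: returns true when the pattern occurs
    // anywhere inside the input text.
    public static bool ContainsPattern(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        string expression = @"test expression";
        string sampleText = "This is my first test expression.";

        // The literal text "test expression" appears inside sampleText.
        Console.WriteLine(ContainsPattern(sampleText, expression)); // prints "True"
    }
}
```

Note that the pattern only has to appear somewhere inside the sample text; the words before it and the period after it do not prevent the match.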
What if we change our sampleText variable to say the following?
- string sampleText = "This is my first Test EXPression";
The engine would tell us that the text was not found. Matching is case-sensitive by default, so "Test EXPression" is not the same as the "test expression" written in our expression variable; the text must match exactly.
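The failed match can be checked with a small sketch like the one below. As before, it assumes the static Regex.IsMatch method from System.Text.RegularExpressions, and the names CaseSensitivityDemo, ContainsPattern and ContainsPatternIgnoringCase are hypothetical. The second helper uses RegexOptions.IgnoreCase only to show that letter case is what blocks the match; options like this belong to later, more detailed discussion.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseSensitivityDemo
{
    // Hypothetical helper: default matching, which is case-sensitive.
    public static bool ContainsPattern(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    // Hypothetical helper: same search, but letter case is ignored.
    public static bool ContainsPatternIgnoringCase(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string expression = @"test expression";
        string sampleText = "This is my first Test EXPression";

        // "Test EXPression" differs from "test expression" in letter case.
        Console.WriteLine(ContainsPattern(sampleText, expression));             // prints "False"
        Console.WriteLine(ContainsPatternIgnoringCase(sampleText, expression)); // prints "True"
    }
}
```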
Future articles will cover more advanced features of the regular expression language and the API that is offered by the .NET Framework. We will take it slow, since there are many tutorials on the web that are not always clear. This topic is not something we can learn in a few minutes. It requires thought, time, effort and motivation.
For now, let's take what's written here and study it, so that future articles make sense.
Happy coding!