Use Verbal Expressions to Create Readable Regexes in C#
Regular expressions are quite possibly the least enjoyable thing about programming, mostly because I can't read them. They're terrible.
What Is VerbalExpressions?
VerbalExpressions is a library that builds regular expressions from readable code. For example, let's say we had this regex:
^(http)(s)?(://)([^\ ]*)$
This regex is designed to match simple URLs. Here're the rules for matching:
- The URL must start with either "http" or "https" (lowercase, since the pattern is case-sensitive as written).
- The URL must then have "://".
- The URL can then have anything following "://", as long as it isn't a space.
VerbalExpressions allows us to write the following C# code to produce this regex:
var urlExp = new VerbalExpressions()
    .StartOfLine()
    .Then("http")
    .Maybe("s")
    .Then("://")
    .AnythingBut(" ")
    .EndOfLine();
Which, if I do say so myself, is a LOT better than trying to read through dense, impossible-to-parse regular expressions.
But let's say you don't believe me, and would like to test it yourself. In order to test that this regex is valid, we could use simple assertions.
var url = "http://www.topnguyen.net";
Assert.IsTrue(urlExp.Test(url), "The URL is not valid!");
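If you'd like a cross-check that doesn't depend on the library at all, the handwritten pattern from earlier can be run through .NET's built-in Regex class. This is only a minimal sketch: the names UrlRegexCheck and IsSimpleUrl are made up for illustration, and it exercises the raw pattern shown above, not whatever string VerbalExpressions builds internally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlRegexCheck
{
    // The handwritten pattern from earlier in the post (case-sensitive by default).
    private static readonly Regex SimpleUrl = new Regex(@"^(http)(s)?(://)([^\ ]*)$");

    // Hypothetical helper: true when the input satisfies the three URL rules.
    public static bool IsSimpleUrl(string input) => SimpleUrl.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsSimpleUrl("http://www.topnguyen.net")); // follows all three rules
        Console.WriteLine(IsSimpleUrl("https://example.com/a b"));  // contains a space
        Console.WriteLine(IsSimpleUrl("HTTP://example.com"));       // uppercase scheme
        Console.WriteLine(IsSimpleUrl("ftp://example.com"));        // wrong scheme
    }
}
```

Only the first input matches. The other three each break one of the rules listed above.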
A Few More Examples
Let's walk through converting two more common regular expressions. First up is a regex that is designed to do simple validation of an email address:
^(.*)(@)([^\ ]*)(\.)([^\ ]*)$
Here are the rules:
- The email may start with any text, followed by an '@' symbol.
- After the '@', the email may contain any text (except a blank space), followed by a '.'.
- After the '.', the email address may contain any text (except a blank space).
Here's how we would write that using VerbalExpressions:
var emailExp = new VerbalExpressions()
    .StartOfLine()
    .Anything()
    .Then("@")
    .AnythingBut(" ")
    .Then(".")
    .AnythingBut(" ")
    .EndOfLine();
var email = "test@example.com";
var invalidEmail = "test@example";
Assert.IsTrue(emailExp.Test(email), "The email is not valid!");
// This assert will fail: "test@example" has no '.' after the '@'.
Assert.IsTrue(emailExp.Test(invalidEmail), "The email is not valid!");
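It's worth seeing just how "simple" this validation is. The sketch below runs the handwritten email pattern through .NET's Regex class against a few inputs, including some that are clearly not real addresses. The names EmailRegexCheck and IsSimpleEmail are invented for this example, and it tests the raw pattern above instead of the library's generated one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailRegexCheck
{
    // The handwritten pattern from above: anything, '@', non-spaces, '.', non-spaces.
    private static readonly Regex SimpleEmail = new Regex(@"^(.*)(@)([^\ ]*)(\.)([^\ ]*)$");

    // Hypothetical helper: true when the input satisfies the three email rules.
    public static bool IsSimpleEmail(string input) => SimpleEmail.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsSimpleEmail("test@example.com"));  // matches
        Console.WriteLine(IsSimpleEmail("test@example"));      // no '.' after '@'
        Console.WriteLine(IsSimpleEmail("not an email@x.y"));  // matches: text before '@' may contain spaces
        Console.WriteLine(IsSimpleEmail("@."));                // matches: every "any text" part may be empty
    }
}
```

The last two inputs pass because the rules only say "any text", which includes spaces before the '@' and empty strings everywhere. That's fine for a demo, but I wouldn't lean on this pattern for real validation.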
What about a phone number? For simplicity's sake, I'm assuming a United States ten-digit telephone number. Possible matches include:
(123) 456-7890
123 456-7890
1234567890
The regex for this looks like the following (this absolutely can be shortened):
^(\()?[0-9]{3}(\))?(\ )?[0-9]{3}(-)?[0-9]{4}$
Here're the rules:
- The phone number may start with "(".
- The phone number must then have 3 digits, each of which is in the range 0-9.
- The phone number may then have ")".
- Following the optional ")", the phone number may also have a space.
- Following the optional space, the phone number must have 3 digits, each in the range 0-9.
- Following this set of digits, the phone number may optionally include a dash ("-").
- Following the optional dash, the phone number must have 4 digits, each in the range 0-9.
Here's the VerbalExpressions code for this:
var phoneExp = new VerbalExpressions()
    .StartOfLine()
    .Maybe("(")
    .Range('0', '9')
    .RepeatPrevious(3)
    .Maybe(")")
    .Maybe(" ")
    .Range('0', '9')
    .RepeatPrevious(3)
    .Maybe("-")
    .Range('0', '9')
    .RepeatPrevious(4)
    .EndOfLine();
var phone = "(123) 456-7890";
var invalidPhone = "(123) 456-789";
Assert.IsTrue(phoneExp.Test(phone), "The phone number is invalid.");
// This assert will fail: the last group has only three digits.
Assert.IsTrue(phoneExp.Test(invalidPhone), "The phone number is invalid.");
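I mentioned that the handwritten phone regex can be shortened. Here's one way that might look, checked side by side with the original using .NET's Regex class. Treat it as a sketch: PhoneRegexCheck and its method names are hypothetical, and the shortened pattern is just my own rewrite that drops the capture groups while keeping the same rules.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneRegexCheck
{
    // The handwritten pattern from above.
    private static readonly Regex Original =
        new Regex(@"^(\()?[0-9]{3}(\))?(\ )?[0-9]{3}(-)?[0-9]{4}$");

    // One possible shortening (an assumption, not from the library): same rules, no capture groups.
    private static readonly Regex Shortened =
        new Regex(@"^\(?[0-9]{3}\)? ?[0-9]{3}-?[0-9]{4}$");

    public static bool MatchesOriginal(string input) => Original.IsMatch(input);
    public static bool MatchesShortened(string input) => Shortened.IsMatch(input);

    public static void Main()
    {
        string[] samples =
        {
            "(123) 456-7890", "123 456-7890", "1234567890",
            "(123) 456-789", "(123 456-7890", "123-456-7890",
        };

        // Print both results per sample so any disagreement is easy to spot.
        foreach (var s in samples)
        {
            Console.WriteLine($"{s,-16} original={MatchesOriginal(s)} shortened={MatchesShortened(s)}");
        }
    }
}
```

Both patterns accept the three formats listed earlier and reject the nine-digit number and "123-456-7890". They also both accept "(123 456-7890", because each parenthesis is optional on its own. The rules never say the parentheses have to come as a pair.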
Testing the Generated Expressions
Let's say we don't trust this package and want to prove that it is creating a regex that actually matches the appropriate input. For simple testing, we can use Assert. Let's test all three of the above regexes:
var url = "http://www.topnguyen.net";
var email = "test@example.com";
var invalidEmail = "test@example";
var phone = "(123) 456-7890";
Assert.IsTrue(urlExp.Test(url), "The URL is not valid!");
Assert.IsTrue(emailExp.Test(email), "The email is not valid!");
Assert.IsTrue(phoneExp.Test(phone), "The phone number is invalid.");
// This assert will fail: "test@example" has no '.' after the '@'.
Assert.IsTrue(emailExp.Test(invalidEmail), "The email is not valid!");
Easy enough, right? I'd like to see more complex testing examples, so if anyone out there comes up with some, let me know!
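As a starting point for something a bit more involved, here's a rough table-driven sketch: each case pairs an input with the result we expect, and a small helper reports every mismatch instead of stopping at the first one. TableDrivenChecks and FindFailures are names I made up for this example. The helper takes any string-to-bool function, so the demo uses the handwritten phone regex through .NET's Regex class, and I'm assuming a lambda around phoneExp.Test could be passed in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableDrivenChecks
{
    // Runs every case through the matcher and returns one message per mismatch.
    public static List<string> FindFailures(
        Func<string, bool> matcher,
        IEnumerable<(string Input, bool Expected)> cases)
    {
        var failures = new List<string>();
        foreach (var (input, expected) in cases)
        {
            bool actual = matcher(input);
            if (actual != expected)
            {
                failures.Add($"\"{input}\": expected {expected}, got {actual}");
            }
        }
        return failures;
    }

    public static void Main()
    {
        // Handwritten phone pattern from earlier in the post.
        var phone = new Regex(@"^(\()?[0-9]{3}(\))?(\ )?[0-9]{3}(-)?[0-9]{4}$");

        var cases = new[]
        {
            ("(123) 456-7890", true),
            ("123 456-7890", true),
            ("1234567890", true),
            ("(123) 456-789", false),
            ("123-456-7890", false),
        };

        // Assumption: s => phoneExp.Test(s) could be used here instead of the raw regex.
        var failures = FindFailures(s => phone.IsMatch(s), cases);

        Console.WriteLine(failures.Count == 0
            ? "all cases passed"
            : string.Join(Environment.NewLine, failures));
    }
}
```

Adding a new scenario is then a one-line change to the table, which makes it cheap to pile on edge cases.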
An Important NuGet Note
As of this writing, there is a NuGet package for the C# edition of VerbalExpressions, but the package is woefully behind the most recent version of the code on GitHub. Here's hoping the creator of the package publishes the latest code to NuGet so we can use it from there. For this demo, I just downloaded and included the code files in my project (there are only two of them). If you'd still like to try the published package, here's the install command:
PM> Install-Package CSharpVerbalExpressions
VerbalExpressions for other languages
Summary
Regular expressions still suck, but now they suck less (at least in C#) thanks to VerbalExpressions! Use this package to build readable, easy-to-understand regular expressions that can still be used in everyday coding.
As always, if I missed something or the code can be made better, feel free to let me know in the comments. If you hate regular expressions, feel free to vent your anger below.