Regular expressions and Visual Studio

It can take a while to learn how to use regular expressions well. After all, most of us are used to

*.*

as our wildcard language. It’s the old DOS-style filtering that’s still used in Windows to this day. Or perhaps you’re more used to SQL wildcards, e.g.:

'%some text%'

Well, as if we didn’t have enough wildcard standards already, now there are regular expressions! They look more like:

.*\..*
^.*[a-zA-Z0-9 ]*
etc
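
To make the comparison concrete, here is a small sketch in Python (my choice for illustration only; the file names are made up). It checks a few names against the wildcard *.* and against the regular expression .*\..* to show that they express the same idea. Python's fnmatch treats the period literally, whereas Windows itself is more forgiving and also matches names with no period at all.

```python
import fnmatch
import re

# Hypothetical file names, chosen only for illustration.
names = ["report.txt", "archive.tar.gz", "README"]

# Wildcard style: anything, a literal period, anything.
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.*")]

# Regular expression style: "." means any character,
# so the literal period has to be written as "\.".
pattern = re.compile(r".*\..*")
regex_hits = [n for n in names if pattern.fullmatch(n)]

print(wildcard_hits)
print(regex_hits)
```

Both lists should contain the two names that have a period in them, and neither should contain README.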

Regular expressions, though they do come with a bit of a learning curve, are a real improvement on wildcard languages. The biggest problem I’ve had with them so far is that there is no single solid standard. The way they work in .NET, Java and Perl is very similar, but not identical! This can lead to some huge headaches when trying to write or port them.

One of the greatest features of regular expressions is their ability to search and replace strings in files. If you’ve ever had a strange pattern that you needed to search for and didn’t know how, that’s where RegExp comes in.

Say you had a whole bunch of HTML with hrefs that looked like this:

<a href="/page?queryItem=1234">content</a>
<a href="/page?queryItem=125">content</a>
<a href="/page?queryItem=3456">content</a>
<a href="/page?queryItem=1289834">content</a>

and you were given a project that required those URLs to change from
/page?queryItem=1234
to
/queryItem=1234.html
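you would have a lot of work on your hands! Well, not any more. In Visual Studio you can do a search and replace, with the regular expressions option turned on, like this:
Find what:
/page\?queryItem={[0-9]*}
Replace with:
/queryItem=\1\.html

What is happening here is that the curly brackets {} create groups. The groups are numbered in the order they appear in the RegExp. The question mark in the find pattern is escaped as \? because in most regular expression dialects an unescaped ? is an operator rather than a literal character. The replace text is just hardcoded text, with two exceptions:
1) \1 means use the content matched by group 1 in the find
2) \. is the escape sequence for a period. In a find pattern a period means “any character” and must be escaped if you actually need a period. In replacement text a period is normally literal already, so whether the escape is needed there depends on the parser.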

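What if you wanted to do it in another regular expression parser, you say? Well, it depends on the parser. I’ve found that the same replacement in another parser I regularly use requires round brackets () for grouping instead of curly braces {}. (Visual Studio itself changed in version 2012, which switched to .NET regular expression syntax: groups use round brackets and the replacement refers to them as $1.) So the replacement would change to: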

Find what:
/page\?queryItem=([0-9]*)
Replace with:
/queryItem=\1\.html
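
As a rough sketch of the round-bracket version in practice, here it is in Python's re module (Python is my assumption here, not necessarily the parser mentioned above). Two details are specific to this dialect: the question mark in the find pattern is escaped, because an unescaped ? makes the preceding character optional, and the period in the replacement text is left unescaped, because Python would otherwise keep the backslash in the output.

```python
import re

# Two of the sample links from above.
html = (
    '<a href="/page?queryItem=1234">content</a>\n'
    '<a href="/page?queryItem=125">content</a>'
)

# "\?" matches a literal question mark; "([0-9]*)" captures the digits as group 1.
find = r"/page\?queryItem=([0-9]*)"
# In the replacement text "\1" inserts group 1; the period is already literal.
replace = r"/queryItem=\1.html"

result = re.sub(find, replace, html)
print(result)
```

The hrefs in the printed result should read /queryItem=1234.html and /queryItem=125.html, with the rest of each tag untouched.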

That’s the only difference here, but how much changes between parsers depends on what your regular expression contains. Some parsers require

[a-zA-Z]

if you want both upper and lower case; other parsers offer a case-insensitive mode. In that mode

[a-z]
and
[A-Z]

are the same thing.
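
Here is a quick sketch of that difference, again in Python as an illustrative choice, where the case-insensitive mode is switched on with the re.IGNORECASE flag. The sample text is made up.

```python
import re

text = "Hello World"

# Without the flag, [a-z] only matches lower-case letters.
case_sensitive = re.findall(r"[a-z]+", text)

# With re.IGNORECASE, [a-z] matches upper-case letters as well.
case_insensitive = re.findall(r"[a-z]+", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['ello', 'orld']
print(case_insensitive)  # ['Hello', 'World']
```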

Best of luck with your Regular expressions.

Good reference:
http://www.regular-expressions.info/
