While working on an ASP ticket system today that required regular expressions, I came up with a couple of useful regular expression patterns that may save people a few hours of thinking time.

Matching and extracting a string

Problem: I have the following chunk of arbitrary text and I want to extract the order number prefixed “ORD_”:

The quick brown fox... ORD_1012345678 ...jumped over the lazy dog

Solution: ORD_[a-zA-Z0-9_-]*

What is going on? The regular expression engine is being asked to match the three letters “ORD” followed by an underscore “_”. After that it accepts a series (*) of zero or more letters, numbers, underscores or dashes (but nothing else). So once the engine has matched the order number “ORD_1012345678” and reaches whitespace, a new line, a period or any other character outside that set, the match ends.

ASP VBScript Code:

Set regEx = New RegExp
With regEx
	.Pattern = "ORD_[a-zA-Z0-9_-]*"
	.IgnoreCase = true
	.Global = false
End With
set matches = regEx.Execute(text)
if matches.count > 0 then
	result = matches.item(0).value
end if

The string “ORD_1012345678”, extracted from the chunk of text, will be stored in the variable “result”.
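
If the text can contain more than one order number, a small variation on the same idea collects all of them. This is only a sketch: the sample text and the variable names (sampleText, orderNumbers) are made up for illustration, and I have used + instead of * so that a bare “ORD_” with nothing after it is not counted as a match.

```vbscript
Dim regEx, matches, match, sampleText, orderNumbers

' Hypothetical sample text containing two order numbers and one bare prefix
sampleText = "First ORD_1012345678, then ORD_2000-A and finally a bare ORD_ prefix."

Set regEx = New RegExp
With regEx
	' + requires at least one allowed character after the prefix
	.Pattern = "ORD_[a-zA-Z0-9_-]+"
	.IgnoreCase = True
	' Global = True returns every match instead of only the first
	.Global = True
End With

Set matches = regEx.Execute(sampleText)
orderNumbers = ""
For Each match In matches
	If orderNumbers <> "" Then orderNumbers = orderNumbers & ", "
	orderNumbers = orderNumbers & match.Value
Next
' orderNumbers now holds "ORD_1012345678, ORD_2000-A"
```

The only real change is Global = True plus the loop over the Matches collection; everything else is the same as the single-match version.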

A very similar version of string extraction

Problem: I have the following chunk of arbitrary text and I want to extract the ID number in square brackets (prefixed “[#”):

The quick brown fox jumped over the lazy dog [#101234-56789]

Solution: \[#([a-zA-Z0-9_-]*)

What is going on? In a similar way to the first one, this pattern asks for a square bracket followed by a hash “[#”, but because the opening square bracket is a reserved character (used to define sets), we have to escape it with a backslash beforehand. We then surround the series of allowed characters with parentheses ( ), which captures that part of the match as a “sub match”.

ASP VBScript Code:

Set regEx = New RegExp
With regEx
	.Pattern = "\[#([a-zA-Z0-9_-]*)"
	.IgnoreCase = true
	.Global = false
End With
set matches = regEx.Execute(text)
if matches.count > 0 then
	result = matches(0).subMatches(0)
end if

The ID number “101234-56789” will be stored in “result”.

The important difference to note in this code is the use of “subMatches(0)”, which returns only the text captured by the first pair of parentheses in the pattern rather than the whole match.
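
To make the difference concrete, here is a minimal sketch that reads both the whole match and the sub match from the same result. The variable names (sampleText, wholeMatch, idOnly) are mine and only there for illustration.

```vbscript
Dim regEx, matches, sampleText, wholeMatch, idOnly

sampleText = "The quick brown fox jumped over the lazy dog [#101234-56789]"

Set regEx = New RegExp
With regEx
	.Pattern = "\[#([a-zA-Z0-9_-]*)"
	.IgnoreCase = True
	.Global = False
End With

Set matches = regEx.Execute(sampleText)
If matches.Count > 0 Then
	' Value is everything the pattern matched, prefix included
	wholeMatch = matches(0).Value
	' SubMatches(0) is only the part inside the parentheses
	idOnly = matches(0).SubMatches(0)
End If
' wholeMatch is "[#101234-56789" and idOnly is "101234-56789"
```

Note that the closing square bracket is not part of either value, because “]” is not in the set of allowed characters and the pattern never asks for it.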

Stripping HTML tags

This function can be used to strip HTML tags from a string. It is very similar to the PHP function strip_tags(), but this one is not as advanced (yet).

A more advanced version is now available here :)

Let’s just jump straight to the code, you don’t really need to know what is going on (you can probably guess anyway)…

ASP VBScript Code:

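function stripTags(strHTML)
	dim regEx
	Set regEx = New RegExp
	With regEx
		' "<", then as few characters as possible (new lines included), then ">"
		.Pattern = "<(.|\n)+?>"
		.IgnoreCase = true
		' Global must be true, otherwise only the first tag is removed
		.Global = true
	End With
	stripTags = regEx.replace(strHTML, "")
end function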
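
For anyone who wants to try the idea out quickly, here is a self-contained sketch with a usage line. Be aware that it is a variation and not the function above: the name stripTagsSketch and the sample string are hypothetical, and I have swapped in the pattern <[^>]+> (a “<”, then one or more characters that are not “>”, then a “>”), which also copes with tags that span several lines.

```vbscript
Function stripTagsSketch(strHTML)
	Dim regEx
	Set regEx = New RegExp
	With regEx
		' "<", one or more characters other than ">", then ">"
		.Pattern = "<[^>]+>"
		.IgnoreCase = True
		' Replace every tag, not just the first one
		.Global = True
	End With
	stripTagsSketch = regEx.Replace(strHTML, "")
End Function

Dim plainText
plainText = stripTagsSketch("<p>Hello <b>world</b>!</p>")
' plainText is now "Hello world!"
```

Like any regular expression approach, this will be fooled by a “>” inside an attribute value, so treat it as a quick clean-up tool rather than a proper HTML parser.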

Trimming unwanted whitespace

If you want to trim unwanted whitespace from a string, e.g. turning “Text[space]spaced[space]normally[space][space][space]or[space][space]not?” into “Text[space]spaced[space]normally[space]or[space]not?”, use the following function:

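function trimWhitespace(strIn, singleSpacing)
	dim regEx, replacement
	Set regEx = New RegExp
	With regEx
		.Pattern = "\s+"
		.IgnoreCase = true
		' Global must be true so that every run of whitespace is replaced
		.Global = true
	End With
	' Space is also the name of a built-in VBScript function,
	' so a different variable name is used here
	if singleSpacing then
		replacement = " "
	else
		replacement = ""
	end if
	trimWhitespace = regEx.replace(strIn, replacement)
end function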

When set to false, the second parameter “singleSpacing” makes the function remove all whitespace from the string, giving: “Textspacednormallyornot?”
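
Here is a short, self-contained sketch showing both behaviours side by side. The helper name collapseWhitespace and the variable names are hypothetical; unlike the function above it takes the replacement string directly instead of a true/false flag, which is just a design choice to keep the example small.

```vbscript
Function collapseWhitespace(strIn, replacement)
	Dim regEx
	Set regEx = New RegExp
	With regEx
		' One or more whitespace characters (spaces, tabs, new lines)
		.Pattern = "\s+"
		' Replace every run of whitespace in the string
		.Global = True
	End With
	collapseWhitespace = regEx.Replace(strIn, replacement)
End Function

Dim sampleText, singleSpaced, noSpaces
sampleText = "Text spaced normally   or  not?"

singleSpaced = collapseWhitespace(sampleText, " ")
' singleSpaced is "Text spaced normally or not?"

noSpaces = collapseWhitespace(sampleText, "")
' noSpaces is "Textspacednormallyornot?"
```

Because \s also matches tabs and new lines, multi-line text is flattened onto one line as well, which may or may not be what you want.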

I hope the above examples help someone!

You may find the following websites useful, I certainly did!