Friday, August 1, 2008

Regular Expressions.

The asterisk -- * -- matches any number of repeats of the character string or RE preceding it, including zero instances.

"1133*" matches 11 + one or more 3's: 113, 1133, 1133333, and so forth.
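As a rough illustration of the asterisk, here is a minimal sketch using Python's re module. This is an assumption of convenience: the post itself describes grep/sed-style expressions, and the sample strings other than the three listed above are made up for the example.

```python
import re

# "1133*" reads as "113" followed by zero or more additional 3's,
# which is the same as 11 followed by one or more 3's.
pattern = re.compile(r"1133*")

# "112" is a hypothetical non-matching sample added for contrast.
samples = ["113", "1133", "1133333", "112"]

# fullmatch() requires the whole string to fit the pattern.
star_results = {s: bool(pattern.fullmatch(s)) for s in samples}
print(star_results)
```

Only "112" fails here, because it has no 3 after the leading 11.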

The dot -- . -- matches any one character, except a newline.

"13." matches 13 followed by any one character (including a space):

1133, 11333, but not 13 (additional character missing).
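The dot can be tried out the same way. The sketch below assumes Python's re module as a stand-in for grep, and uses search() so that the pattern may match anywhere in the string, the way grep scans a line. The samples with a trailing space and a trailing newline are additions for illustration.

```python
import re

# "13." needs the characters 1 and 3, then exactly one more character
# that is not a newline.
pattern = re.compile(r"13.")

samples = ["1133", "11333", "13 ", "13", "13\n"]

# search() looks for the pattern anywhere inside each sample.
dot_results = {s: bool(pattern.search(s)) for s in samples}
print(dot_results)
```

The bare "13" fails because nothing follows it, and "13\n" fails because the dot does not match a newline by default.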

The caret -- ^ -- matches the beginning of a line, but sometimes, depending on context, negates the meaning of a set of characters in an RE.

The dollar sign -- $ -- at the end of an RE matches the end of a line.

"XXX$" matches XXX at the end of a line.
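A small sketch of both anchors, again assuming Python's re module rather than grep. The three sample lines are invented for the example; each line is tested separately, so ^ and $ refer to the start and end of that line.

```python
import re

# Hypothetical input, split into lines the way grep would see them.
text = "XXX at the start\nends with XXX\nXXX"
lines = text.splitlines()

# ^XXX: the line must begin with XXX.
starts = [line for line in lines if re.search(r"^XXX", line)]

# XXX$: the line must end with XXX.
ends = [line for line in lines if re.search(r"XXX$", line)]

print(starts)
print(ends)
```

The last line consists of XXX alone, so it satisfies both anchors.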

Brackets -- [...] -- enclose a set of characters to match in a single RE.
"[xyz]" matches the characters x, y, or z.
"[c-n]" matches any of the characters in the range c to n.
"[B-Pk-y]" matches any of the characters in the ranges B to P and k to y.
"[a-z0-9]" matches any lowercase letter or any digit.
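The bracket examples above can be checked with a short sketch, assuming Python's re module. The helper name matching_chars and the test string are hypothetical. The last pattern shows the negating use of the caret mentioned earlier: placed first inside the brackets, it inverts the set.

```python
import re

def matching_chars(pattern, chars):
    """Return the characters from chars that the bracket pattern accepts."""
    return [c for c in chars if re.fullmatch(pattern, c)]

# Hypothetical mix of lowercase, uppercase, a digit, and a space.
chars = "xbCkQz5 "

set_xyz = matching_chars(r"[xyz]", chars)          # only x, y, or z
range_c_n = matching_chars(r"[c-n]", chars)        # lowercase c through n
two_ranges = matching_chars(r"[B-Pk-y]", chars)    # B through P, or k through y
lower_digit = matching_chars(r"[a-z0-9]", chars)   # lowercase letter or digit
negated = matching_chars(r"[^a-z0-9]", chars)      # anything else

print(set_xyz, range_c_n, two_ranges, lower_digit, negated)
```

Ranges are case-sensitive, which is why "C" matches "[B-Pk-y]" but not "[c-n]".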

Escaped "angle brackets" -- \<...\> -- mark word boundaries.

"\<the\>" matches the word "the," but not the words "them," "there," "other," etc.
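Word boundaries can be sketched in Python too, with one substitution that should be stated plainly: Python's re module does not support the escaped angle brackets, so the example uses \b, which marks a word boundary on either side. The sample phrase "over the hill" is an addition for illustration.

```python
import re

# \bthe\b is the Python counterpart of the grep-style escaped angle
# brackets around "the": the word must stand on its own.
pattern = re.compile(r"\bthe\b")

samples = ["the", "them", "there", "other", "over the hill"]

word_results = {s: bool(pattern.search(s)) for s in samples}
print(word_results)
```

Only "the" and "over the hill" match, since in the other samples "the" is part of a longer word.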

