Regular expressions (regexes) are like super-intelligent wildcards. If you learn regexes, you too can become super-intelligent (and have more fun using BRU).
WILDCARDS
---------
Wildcards are probably familiar to all BRU users.
You can experiment with wildcards using Windows Explorer:
Windows Explorer > "Search" dialog box > "Search for files or folders named"
Summary:
? means "any single character"
* means "zero or more characters"
-------------------------------------------------------------
Expression  Matches                       Doesn't match
----------  ----------------------------  -------------------
Notes*      Notes, Notes_2005_0302.txt    aNotes
*Notes*     Notes, Notes_2005_0302.txt
?Notes*     aNotes, aNotes_2005_0302.txt  Notes_2005_0302.txt
-------------------------------------------------------------
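If you would rather experiment in a script than in the Explorer search box, here is a minimal sketch using Python's standard fnmatch module. It is only an approximation of Explorer: the file names are the ones from the table above, the helper name matching() is made up for this example, and fnmatchcase() is case-sensitive, which Windows searches are not.

```python
from fnmatch import fnmatchcase

# Sample names taken from the table above.
names = ["Notes", "Notes_2005_0302.txt", "aNotes", "aNotes_2005_0302.txt"]

def matching(pattern):
    """Return the names that the wildcard pattern matches in full."""
    return [name for name in names if fnmatchcase(name, pattern)]

for pattern in ["Notes*", "*Notes*", "?Notes*"]:
    print(pattern, "->", matching(pattern))
```

Note that ?Notes* skips the plain "Notes" names, because ? insists on exactly one character in front.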
REGULAR EXPRESSIONS
-------------------
Regexes follow the same general idea as wildcards, but are considerably more powerful.
Please glance at the table, then refer to the discussion below.
-----------------------------------------------
       Equivalent
Regex  Wildcard    Matches
-----  ----------  ----------------------------
cat    cat         The literal characters "cat"
.      ?           Any single character
..     ??          Any two characters
...    ???         Any three characters
.*     *           Zero or more characters
..*    ?*          One or more characters
...*   ??*         Two or more characters
.+     ?*          One or more characters
..+    ??*         Two or more characters
Discussion:
* means "the preceding character occurs zero or more times"
+ means "the preceding character occurs one or more times"
Why would you ever want a character to occur zero or more times? It means that the character is optional. For example: ru.*n matches run, ruin, and ruffian.
On the other hand, ru.+n matches ruin and ruffian, but not run (because we need at least one character between the "u" and the "n").
Note that there are two equivalent ways of saying "one or more characters": ..* and .+
-----------------------------------------------
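To check the ru.*n and ru.+n examples yourself, here is a small sketch using Python's re module. That is an assumption on my part: it is not BRU's own regex engine, but these basic operators behave the same way in most regex flavors. The helper name full_matches() is made up for this example, and re.fullmatch() is used so that the whole word has to match.

```python
import re

words = ["run", "ruin", "ruffian"]

def full_matches(pattern):
    """Return the words that the regex matches from start to end."""
    return [word for word in words if re.fullmatch(pattern, word)]

print(full_matches(r"ru.*n"))   # zero or more characters between "ru" and "n"
print(full_matches(r"ru.+n"))   # one or more characters between "ru" and "n"
print(full_matches(r"ru..*n"))  # the other way of writing "one or more"
```

The last two lines print the same list, which is the ..* versus .+ equivalence mentioned above.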
There are no wildcard equivalents to the following regexes (at least, not in Windows Explorer).
----------------------------------------------------------------
Regex          Matches
-------------  -------------------------------------------------
\.             A period.
\t             A tab character
\n             A newline character
ca?t           "c" followed by zero or one "a", followed by "t"
ca*t           "c" followed by zero or more "a", followed by "t"
ca+t           "c" followed by one or more "a", followed by "t"
[efgh]         any one of efgh
[e-h]          any one of efgh
[a-cF-H]       any one of abcFGH
[e-h]*         any one of efgh, occurring zero or more times
[e-h]+         any one of efgh, occurring one or more times
[a-c][e-h]+    any one of abc; followed by any one of efgh,
               occurring one or more times
([a-c][e-h])+  a pair (any one of abc, then any one of efgh),
               with the whole pair occurring one or more times
Discussion:
\ in front of a regex operator changes it to an ordinary ASCII character.
\ also introduces non-printable ASCII characters such as tab and newline (\t and \n).
? means "the preceding character occurs zero or one time"
? * + are called "quantifiers" because they specify the number of times an expression must occur
[] always refers to a single character, picked from all those in the square brackets
() parentheses are used for grouping expressions together.
()+ means the expression in the parentheses occurs one or more times
----------------------------------------------------------------
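A few of the rows above can be tried out with the sketch below. Again this assumes Python's re module rather than BRU itself, and the helper name matches() and the test strings are made up for illustration. The last two checks show the difference between a quantifier on the final character class and a quantifier on the whole group.

```python
import re

def matches(pattern, text):
    """True if the whole of text matches the regex."""
    return re.fullmatch(pattern, text) is not None

checks = [
    (r"ca?t", "ct"),              # zero "a" is allowed
    (r"ca?t", "caat"),            # two "a" is too many for ?
    (r"ca+t", "ct"),              # + needs at least one "a"
    (r"ca*t", "caat"),            # * allows any number of "a"
    (r"Notes\.txt", "Notes_txt"), # \. only matches a real period
    (r"Notes.txt", "Notes_txt"),  # an unescaped . matches anything
    (r"[a-c][e-h]+", "aeh"),      # a, then e and h
    (r"([a-c][e-h])+", "aeh"),    # needs complete pairs, h is left over
    (r"([a-c][e-h])+", "aebf"),   # the pairs "ae" and "bf"
]

for pattern, text in checks:
    print(pattern, text, matches(pattern, text))
```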
BACKREFERENCING!!!!
-------------------
In addition to "grouping", there is a second, more powerful use for parentheses, called "backreferencing". The idea is that you can save the matched characters and reuse them later. For example, suppose you want to change a date's format from 12-31-2005 to 2005_1231.
Use this as your "search-regex":
(12)-(31)-(2005)
and use this as your "replace-regex":
\3_\1\2
In backreferencing, \1 always refers to the contents of the first pair of parentheses in the search-regex, \2 refers to the contents of the second pair, and \3 to the contents of the third pair.
Understanding and using backreferencing is essential if you want to take advantage of the powerful regex capability of BRU.
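Here is a sketch of the same search-and-replace using Python's re.sub(), which happens to use the same \1 \2 \3 notation in the replacement text. This is an illustration only, not BRU itself. The first function uses the literal search-regex from above, so it only works on that one date. The second function is my own generalization (an assumption, not part of the example above): it swaps the literal digits for [0-9] character classes so that any date in that layout is rearranged. Both function names are made up.

```python
import re

def reformat_literal_date(text):
    """Rearrange the one literal date used in the example above."""
    return re.sub(r"(12)-(31)-(2005)", r"\3_\1\2", text)

def reformat_any_date(text):
    """Rearrange any MM-DD-YYYY date into YYYY_MMDD (hypothetical generalization)."""
    search = r"([0-9][0-9])-([0-9][0-9])-([0-9][0-9][0-9][0-9])"
    return re.sub(search, r"\3_\1\2", text)

print(reformat_literal_date("12-31-2005"))
print(reformat_any_date("Notes_03-02-2005.txt"))
```

The first call gives 2005_1231, and the second turns the file name into Notes_2005_0302.txt.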
OTHER PROGRAMS
--------------
Here are some programs that can help you get comfortable with regexes, before you start changing your filenames with BRU.
1. TextPad is shareware with an unlimited trial duration [url]http://www.textpad.com/[/url]
This is my favorite text editor. The main thing you need to know is that the grouping symbol is \( \) instead of (). Otherwise, regex-gurus-in-training can assume TextPad regexes are identical to BRU's.
To get started:
- Open a text file
- Search menu > Find...
- Make sure that you've selected the "Regular expressions" check box.
- Type in a regex, and click the Find Next button.
To try out the above backreferencing example:
- In a text file, type 12-31-2005
- Search menu > Replace...
- Make sure that you've selected the "Regular expressions" check box.
- Find what: \(12\)-\(31\)-\(2005\)
- Replace with: \3_\1\2
- Click the "Find Next" button
- Click the "Replace" button.
2. Visual Regex [url]http://laurent.riesterer.free.fr/regexp/[/url]
Visual Regex is unique because it highlights each regex group () with a different color, then highlights the matching text in the same color. This lets you see what group is matching what text, helps you debug the regex, and helps you learn more about regular expressions.
3. Regex Buddy [url]http://www.regexbuddy.com/[/url]
4. Regex Coach [url]http://weitz.de/regex-coach/[/url]
5. Regex Designer [url]http://www.radsoftware.com.au/regexdesigner/[/url]
6. The Regulator [url]http://regex.osherove.com/[/url]