Generic question about how BRU handles RegEx



Post by JMM » Fri Jan 20, 2017 1:05 am

Please excuse me for the long post, but I need to explain my perspective properly.

In the past I never liked regular expressions: I didn't grasp how they worked, and they looked cumbersome and inefficient to me. Well, that's part of the learning process, phase one it seems. In my last stint as a programmer I was having problems handling large amounts of text data, and there was no library matching my needs. So I started to devise a simple but flexible function to seek, match, and replace strings. I was putting my ideas in written form before writing any code, so that I had a clear picture of what I was trying to achieve, when I realized that what I was describing was a primitive version of RegEx. So in a couple of days and after some tutorials, I changed from a hater to a lover of regex. I got a regex library and in a matter of seconds my program was doing what I needed (it took me about one hour to find the right regex, but the program already worked as intended).

So then I started to use regex wherever I had always seen it available but never touched it: LibreOffice's Calc and Writer, Notepad++, and, well, I also tried it in BRU. To my surprise, BRU didn't match and replace just a part of a string. Whenever BRU matches the regex within the name, the *whole* filename is removed and replaced with the contents of Replace. I really didn't expect this, and it's totally counterintuitive. Moreover, it makes you work with capturing groups (the 'non-passive' groups within parentheses) and backreferences (the \1 \2 components referring to what those groups matched) even when you don't need them. And to add a little more surprise, you can't remove what you matched without using those, although removal is what you would expect when you put something in the Match string and nothing in the Replace string (to remove certain matches in the filename, for example).
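To make the difference concrete, here is a minimal Python sketch of the two behaviours described above. It is only a model built on Python's `re` module, not BRU's actual code; the helper names `replace_whole_name` and `replace_matches` and the sample filename are made up for illustration.

```python
import re


def replace_whole_name(pattern, replacement, name):
    """Model of the behaviour described in the post (an assumption, not BRU code):
    if the pattern matches anywhere, the WHOLE name becomes the expanded
    replacement; otherwise the name is left alone."""
    match = re.search(pattern, name)
    if match is None:
        return name
    # expand() resolves backreferences such as \1 and \2 in the replacement.
    return match.expand(replacement)


def replace_matches(pattern, replacement, name):
    """Substring behaviour found in most editors: only the matched part changes."""
    return re.sub(pattern, replacement, name)


name = "holiday_photo_2016"

# Whole-name model: the unmatched parts of the name are lost.
print(replace_whole_name(r"photo", "pic", name))               # pic

# Substring model: only "photo" is touched.
print(replace_matches(r"photo", "pic", name))                  # holiday_pic_2016

# Under the whole-name model, the rest of the name has to be captured
# and put back by hand with backreferences.
print(replace_whole_name(r"(.*)photo(.*)", r"\1pic\2", name))  # holiday_pic_2016
```

Under the whole-name model an empty replacement would leave an empty name, which is why deleting a match always needs the capture-and-rebuild form.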

So, somewhere else, I want to remove a group of characters, a-f and digits, if they are 32 chars long (an MD5 signature written in hexadecimal). I put [0-9a-fA-F]{32} in the match field*, nothing in the replace field, and hit run, process, or whatever. And it's done. In BRU, I need to put (.*)[0-9a-fA-F]{32}(.*) in Match, put \1\2 in Replace, and hit rename several times in case there is more than one match in a single filename, since BRU matches as in 'yes, it matches', not 'it matches from here to here, and there could be more matches ahead'. (This example has been simplified; I'm not covering the case where I want the MD5 surrounded by spaces or symbols to distinguish it from other, longer hashes.)
(*) Note: in Perl-like regex, which is what BRU uses, it can be shortened to /[\da-f]{32}/i, but I'm more used to POSIX regex, and I prefer not to use pattern modifiers, for clarity of reading, unless needed.
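The "hit rename several times" effect can be reproduced in a short Python sketch. This is an illustration using Python's `re` module, not BRU itself; the function names are hypothetical, and the length of the hex run is a parameter (set to 8 here only to keep the sample name short).

```python
import re


def hex_run(length):
    """Pattern for a run of `length` hex digits (length is a parameter)."""
    return r"[0-9a-fA-F]{%d}" % length


def strip_one_pass(name, length):
    """Workaround from the post: capture what surrounds the run and rebuild.
    The greedy leading (.*) makes this remove only the LAST run in the name."""
    pattern = r"(.*)" + hex_run(length) + r"(.*)"
    return re.sub(pattern, r"\1\2", name, count=1)


def strip_repeated(name, length):
    """Apply the workaround until nothing changes; also count the passes."""
    passes = 0
    while True:
        new_name = strip_one_pass(name, length)
        if new_name == name:
            return name, passes
        name, passes = new_name, passes + 1


def strip_all(name, length):
    """Global substring replacement: every run is removed in one call."""
    return re.sub(hex_run(length), "", name)


sample = "report deadbeef copy 0badf00d final"

print(strip_one_pass(sample, 8))   # report deadbeef copy  final
print(strip_repeated(sample, 8))   # ('report  copy  final', 2)
print(strip_all(sample, 8))        # report  copy  final
```

Both routes end with the same name, but the capture-and-rebuild form needs one pass per hex run, while the global replacement needs a single call and no groups at all.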

So, what are the benefits of allowing only one match and forcing capturing (non-passive) groups and backreferences? Well, everyone who uses RegEx within BRU will know them, even if not by that name, unlike people like me who never used them, thinking they were 'advanced stuff', and only learned late what they were missing. Disadvantages? Only one match per string, being forced to reconstruct the filename in all cases, and not being able to simply delete a certain match without such reconstruction.

I would like to hear other opinions on this matter, and on whether it's worth suggesting a change to the way RegEx is handled, maybe by adding a second selectable method while keeping the classic one.

And there goes my CTS again...

Regards, JMM.
