by bru » Mon Feb 17, 2020 6:36 am
OK. Finally done with the coloring. Major pain in the ***. Sorry I didn't get to all your questions.
Yes, I know we're not supposed to put | in names. I just do it cuz it's unique, but I never leave them that way.
I thought you were pulling my leg with that 'illegal' character bit, so I googled it. Turns out you're right.
You know, I'd seriously consider leaving any country that wastes its time passing such ridiculous laws.
Even if they seized your computer, how do they prove it was actually you who did the naming?
I hate to ask, but what's the penalty? Do they fine you for every little | char they find?
If you ask me, those politicians need a | stuck up their ***, a really big one!
Global warming, poverty, wars... No time for all that. We've got citizens putting | into their filenames!!
Sorry, I couldn't resist. Your post gave me quite the chuckle. Had to get you back.
There's definitely no version of Windows that allows | in filenames.
Anywho, here's the batch in its original form, along with my best attempt at describing it:
@echo off
cd /d "C:\YourFolderPath"
set reg1=/Regexp:"^([A-Z]{2,})-([A-Z][^A-Z -]*?)([A-Z].*)$:\1|\2 \3"
set reg2=/Regexp:"^(.*[|][^-]*)-([A-Z][^A-Z -]*?)([A-Z].*)$:\1, \2 \3"
set max10=%reg2% %reg2% %reg2% %reg2% %reg2% %reg2% %reg2% %reg2% %reg2%
set reg3=/Regexp:"(.*[|].*), (.*):\1 and \2"
set reg4=/Regexp:"(.*)[|](.*):\1 - \2"
brc32 /Pattern:*-*.docx %reg1% %max10% %reg3% %reg4% /Execute
pause>nul
BRC's commandline is basically: brc32 /Pattern /Reg1 /Reg2(many-times) /Reg3 /Reg4 /Execute
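For anyone who wants to poke at the chain outside BRC, here's a rough Python sketch that replays the same regexes in memory, in the same order as the brc32 line. It's only an emulation, assuming BRC applies each /Regexp once per file to the name without its extension, and that Python's re module treats these particular patterns the same way BRC's engine does. rename_in_memory is just a made-up helper name.

```python
import re

# The four regexes from the batch, as (match, replace) pairs.
# Python's re uses the same \1-style group references in the replacement.
REG1 = (r"^([A-Z]{2,})-([A-Z][^A-Z -]*?)([A-Z].*)$", r"\1|\2 \3")
REG2 = (r"^(.*[|][^-]*)-([A-Z][^A-Z -]*?)([A-Z].*)$", r"\1, \2 \3")
REG3 = (r"(.*[|].*), (.*)", r"\1 and \2")
REG4 = (r"(.*)[|](.*)", r"\1 - \2")

# Same order as the brc32 line: reg1, nine copies of reg2 (%max10%), reg3, reg4.
CHAIN = [REG1] + [REG2] * 9 + [REG3, REG4]


def rename_in_memory(name: str) -> str:
    """Apply each regex once, in order; a regex that doesn't match leaves the name alone."""
    for pattern, replacement in CHAIN:
        name = re.sub(pattern, replacement, name, count=1)
    return name


print(rename_in_memory("GER-MartinSchmidt-NinaWagner-RonaldStanford-JoeMahoney"))
# GER - Martin Schmidt, Nina Wagner, Ronald Stanford and Joe Mahoney
```

Nothing here touches the disk, which is the point: like BRC before /Execute, the name only changes in memory.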
BATCH TECHNIQUE ***
If I plan on using many regexes, or repeating one many times, I usually store them in variables.
Setting var=aaabbbcccddd just lets you type %var% instead of aaabbbcccddd wherever you want.
It makes the brc-line easier to read/edit, especially when it comes to rearranging the regex-order.
Another very helpful technique, since /Pattern is so limited in its file selection:
I use the 1st-regex as a file-matcher by inserting | or some other illegal filename character.
Then make all subsequent regexes match for | as they continue processing the filename in memory.
This ensures that nothing in your regex-chain will touch anything the 1st-regex didn't match.
You can even get fancy, having each regex add another | as a true-condition for the next regex, & so on.
Nothing ever gets renamed until the /Execute param, everything else processes in memory.
You can put multiple /Execute's on brc's commandline, but I rarely use it that way.
RENAME LOGIC ***
One thing I try to do is think ahead about NewName-formats & design my 1st-regex to NOT match them.
That way, you don't have to worry about protecting already-renamed files by moving them into other folders.
It's not always possible, but definitely worth the effort, especially when helping newcomers.
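A quick way to check that the 1st-regex really does skip the NewName format is to run the chain twice and confirm the second pass changes nothing. Another Python sketch under the same assumptions (Python's re standing in for BRC, name without extension); the sample name and the rename_in_memory helper are made up.

```python
import re

# reg1, nine copies of reg2, reg3, reg4, in brc32 order.
CHAIN = (
    [(r"^([A-Z]{2,})-([A-Z][^A-Z -]*?)([A-Z].*)$", r"\1|\2 \3")]
    + [(r"^(.*[|][^-]*)-([A-Z][^A-Z -]*?)([A-Z].*)$", r"\1, \2 \3")] * 9
    + [(r"(.*[|].*), (.*)", r"\1 and \2"), (r"(.*)[|](.*)", r"\1 - \2")]
)


def rename_in_memory(name):
    for pattern, replacement in CHAIN:
        name = re.sub(pattern, replacement, name, count=1)
    return name


first_pass = rename_in_memory("GER-MartinSchmidt-NinaWagner")
second_pass = rename_in_memory(first_pass)  # e.g. the batch gets run again later

print(first_pass)                 # GER - Martin Schmidt and Nina Wagner
print(second_pass == first_pass)  # True: "GER " never satisfies reg1's "GER-", so nothing gets tagged
```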
BRC's regex is specified as /Regexp:"Match:Replace"
I figure it's easier to convert the batch regexes into the BRU format usually posted here.
In all cases, the below equivalents exactly match those in the batch (colon/quotes removed).
When Match begins with ^ it means: Match-only-as-beginning-text.
If the Match ends with $ it means: Match-only-as-ending-text.
Please disregard commas in the descriptions, they're just for readability.
When you see OrigName --> NewName, that's just what's happening in memory at the time.
%reg1%
^([A-Z]{2,})-([A-Z][^A-Z -]*?)([A-Z].*)$
\1|\2 \3
Match: (2orMoreUppercase)-(1Uppercase,AnythingBesides[Uppercases,Spaces,Dashes] UntilNext)(1Uppercase,Anything)
Replace: Group1|Group2 Group3
GER|MartinSchmidt-NinaWagner-RonaldStanford-JoeMahoney ---------> GER|Martin Schmidt-NinaWagner-RonaldStanford-JoeMahoney
Since we create a space in NewName (and it stays there), the AnythingBesides[Uppercases,Spaces,Dashes] ensures we can't touch previous renames.
The | in the replace 'tags' the filename. All future regexes look for it in their match, so they can't process anything this one doesn't tag.
I figure most songs only have 1 ArtistName, so this also converts the 1st occurrence of ArtistName --> Artist Name
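Here's %reg1% on its own in Python, to show which names get tagged and which slip past (and so get ignored by the rest of the chain). Same caveat: Python's re standing in for BRC, and the sample names are made up. Note the last one: a single-word name like Madonna has no second capital to split on, so %reg1% won't tag it.

```python
import re

REG1 = r"^([A-Z]{2,})-([A-Z][^A-Z -]*?)([A-Z].*)$"

samples = [
    "GER-MartinSchmidt-NinaWagner",  # country code + CamelCase name: tagged
    "GER - Martin Schmidt",          # already renamed: space, not dash, after GER
    "G-MartinSchmidt",               # only one capital before the dash
    "GER-Madonna",                   # no second capital to split the name on
]
# A name that doesn't match comes back from re.sub unchanged.
results = {name: re.sub(REG1, r"\1|\2 \3", name, count=1) for name in samples}

for before, after in results.items():
    print(before, "->", after)
```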
%reg2% --- BRC runs this regex 9 times via %max10% (9 copies of %reg2%, so together with %reg1% it handles up to 10 names)
^(.*[|][^-]*)-([A-Z][^A-Z -]*?)([A-Z].*)$
\1, \2 \3
Match: (AnythingUntilLast|,AnythingBesides[Dashes])FirstDashAfterTag(1Uppercase,AnythingBesides[Uppercases,Spaces,Dashes] UntilNext)(1Uppercase,Anything)$
Replace: Group1, Group2 Group3
Run1: GER|Martin Schmidt-NinaWagner-RonaldStanford-JoeMahoney -----> GER|Martin Schmidt, Nina Wagner-RonaldStanford-JoeMahoney
Run2: GER|Martin Schmidt, Nina Wagner-RonaldStanford-JoeMahoney ---> GER|Martin Schmidt, Nina Wagner, Ronald Stanford-JoeMahoney
Run3: GER|Martin Schmidt, Nina Wagner, Ronald Stanford-JoeMahoney -> GER|Martin Schmidt, Nina Wagner, Ronald Stanford, Joe Mahoney
Run4: No effect, since there's no dash left after the | for the dash between (Group1)-(Group2) to match.
Each run converts the 1st dash after the tag into CommaSpace, then inserts 1 space between Group2 (Firstname) & Group3 (Lastname plus everything else).
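To watch the runs happen one at a time, here's a Python sketch that applies %reg2% nine times to the tagged name and keeps every intermediate result (Python's re standing in for BRC).

```python
import re

REG2 = r"^(.*[|][^-]*)-([A-Z][^A-Z -]*?)([A-Z].*)$"

name = "GER|Martin Schmidt-NinaWagner-RonaldStanford-JoeMahoney"
runs = []
for _ in range(9):  # %max10% holds nine copies of %reg2%
    name = re.sub(REG2, r"\1, \2 \3", name, count=1)
    runs.append(name)

for number, result in enumerate(runs[:4], start=1):
    print(f"Run{number}: {result}")
# Run4 prints the same text as Run3: no dash is left after the tag, so reg2 stops matching
```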
The AnythingBesides[Uppercases,Spaces,Dashes] is too strict, it could've been AnythingBesides[Uppercases] or even .*?
At the time I was concerned about matching names like JuliaLouis-Dreyfus. I decided against it, but left %reg2% over-complicated.
I can be a bit air-headed sometimes. If I were concerned about such names, I shouldn't 'tag' them in the 1st place!
A much easier to read %reg2% match would be:
^(.*[|].*)-([A-Z].*?)([A-Z].*)$
If you want, test it out against your massive list of names, I think it's easier to follow.
I'm used to overkilling as a precaution. It's safer, but hurts readability. A couple more bad habits:
Originally you couldn't lazy-match .*?X to find the 1st X, so we had to use [^X]*X instead. They usually behave the same and the lazy one is easier to read, but they aren't quite identical: [^X]* can never run past an X, while .*? can if the rest of the regex fails and forces backtracking.
Also, by dashes I mean hyphens. I know that's the correct name, old habits die hard.
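One thing to know if you try the simpler version: because (.*[|].*) is greedy, it grabs the LAST dash first and works backward, while the original grabs the first dash after the tag. For a plain list of CamelCase names, both end up in the same place. A Python sketch comparing them (Python's re standing in for BRC; apply_nine_times is a made-up helper):

```python
import re

ORIGINAL = r"^(.*[|][^-]*)-([A-Z][^A-Z -]*?)([A-Z].*)$"
SIMPLER = r"^(.*[|].*)-([A-Z].*?)([A-Z].*)$"


def apply_nine_times(pattern, name):
    """Mimic %max10%: run the same reg2 nine times, keeping every intermediate name."""
    steps = []
    for _ in range(9):
        name = re.sub(pattern, r"\1, \2 \3", name, count=1)
        steps.append(name)
    return steps


start = "GER|Martin Schmidt-NinaWagner-RonaldStanford-JoeMahoney"
original_steps = apply_nine_times(ORIGINAL, start)
simpler_steps = apply_nine_times(SIMPLER, start)

print(original_steps[0])  # first dash after the tag converted
print(simpler_steps[0])   # last dash converted
print(original_steps[-1] == simpler_steps[-1])  # same final name
```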
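A tiny Python demo of where .*?X and [^X]*X part ways. Both stop at the first X when the rest of the pattern is happy, but if whatever follows the X fails, .*? may backtrack past that X to the next one, whereas [^X]* never can. (Python's re here, not BRC's engine; the test strings are made up.)

```python
import re

# Both agree when the first X is followed by a digit.
print(re.match(r"^(.*?)X\d", "aX1").group(1))    # 'a'
print(re.match(r"^([^X]*)X\d", "aX1").group(1))  # 'a'

# When the first X is NOT followed by a digit, the lazy version moves on to the next X...
lazy = re.match(r"^(.*?)X\d", "aXbX1")
print(lazy.group(1))  # 'aXb'

# ...but the negated class can't step over an X, so the whole match fails.
negated = re.match(r"^([^X]*)X\d", "aXbX1")
print(negated)  # None
```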
With names like JuliaLouis-Dreyfus, the batch creates "Julia Louis-Dreyfus", causing all future %reg2%'s to stop matching.
So you'd end up with something like: GER - First Last, First Last and Julia Louis-Dreyfus-FirstLast-FirstLast
I did write a better version that doesn't 'tag' them in the 1st place.
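Running the whole chain in Python on a made-up name shows the stall: once Julia Louis-Dreyfus is split, %reg2% can't get past the dash in Louis-Dreyfus, so every later name stays CamelCase, and %reg3% puts its "and" in front of Julia. Same assumptions as before (Python's re standing in for BRC; rename_in_memory is a made-up helper).

```python
import re

# reg1, nine copies of reg2, reg3, reg4, in brc32 order.
CHAIN = (
    [(r"^([A-Z]{2,})-([A-Z][^A-Z -]*?)([A-Z].*)$", r"\1|\2 \3")]
    + [(r"^(.*[|][^-]*)-([A-Z][^A-Z -]*?)([A-Z].*)$", r"\1, \2 \3")] * 9
    + [(r"(.*[|].*), (.*)", r"\1 and \2"), (r"(.*)[|](.*)", r"\1 - \2")]
)


def rename_in_memory(name):
    for pattern, replacement in CHAIN:
        name = re.sub(pattern, replacement, name, count=1)
    return name


result = rename_in_memory("GER-MartinSchmidt-JuliaLouis-Dreyfus-JoeMahoney")
print(result)  # GER - Martin Schmidt and Julia Louis-Dreyfus-JoeMahoney
```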
%reg3%
(.*[|].*), (.*)
\1 and \2
Match: (AnythingUntilLast|AnythingUntilLast)CommaSpace(Anything)
Replaces the last CommaSpace --> " and "
GER|Martin Schmidt, Nina Wagner, Ronald Stanford, Joe Mahoney ---> GER|Martin Schmidt, Nina Wagner, Ronald Stanford and Joe Mahoney
Without a tag, this regex could be dangerous, since any .docx filename matched by /Pattern that contains a CommaSpace would be affected.
That's another benefit of tagging: it often simplifies secondary regexes (instead of re-checking for a complicated match, they just match the tag).
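Here's that danger in Python: a hypothetical untagged %reg3% next to the real one, run on a made-up .docx name that still fits /Pattern:*-*.docx (Python's re standing in for BRC).

```python
import re

UNTAGGED_REG3 = r"(.*), (.*)"     # hypothetical version without the tag
TAGGED_REG3 = r"(.*[|].*), (.*)"  # %reg3% as used in the batch

unrelated = "Budget-Draft, Final"  # made-up file the 1st-regex never tagged
tagged = "GER|Martin Schmidt, Nina Wagner"

untagged_on_unrelated = re.sub(UNTAGGED_REG3, r"\1 and \2", unrelated, count=1)
tagged_on_unrelated = re.sub(TAGGED_REG3, r"\1 and \2", unrelated, count=1)
tagged_on_tagged = re.sub(TAGGED_REG3, r"\1 and \2", tagged, count=1)

print(untagged_on_unrelated)  # Budget-Draft and Final (oops)
print(tagged_on_unrelated)    # Budget-Draft, Final (untouched)
print(tagged_on_tagged)       # GER|Martin Schmidt and Nina Wagner
```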
%reg4%
(.*)[|](.*)
\1 - \2
Match: (AnythingUntilLast)|(Anything)
The tag-killer: Replace | with SpaceDashSpace ( - )
All done, no more need for the tag.. Per the brc-commandline, processing now goes from %reg4% to /Execute
GER|Martin Schmidt, Nina Wagner, Ronald Stanford and Joe Mahoney -> GER - Martin Schmidt, Nina Wagner, Ronald Stanford and Joe Mahoney
Sorry for not making it easier to read. It's the formatting that took so long. Kept getting lost in the text.
Trust me, typing this post was at least 100x harder than just writing some batch/brc commandlines.
If someone told me I had to write an entire manual, I'd be like "You might as well just shoot me now, cuz it aint gonna happen!"
Here are some helpful regex techniques that I don't often see posted, thought I'd mention them.
You may already have them in the manual.. Stopped reading after the case-insensitivity.. Had to try it out!
GROUP FORMAT-MATCHING
If you wanna match something like AA11BB22CC33DD44 (any repeating series of Any2UpperCase,Any2Digits):
([A-Z]{2}[\d]{2})+ It's the repeat quantifiers on a whole group, like + or {#}, that I almost never see.
It lets you spec a group's repeating format (just the format itself, not the exact text).
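A Python check of why the brackets matter, assuming BRU reads character classes the same way Python's re does: [A-Z]{2} is two capitals, while [A-Z{2}] is ONE character that can be a capital, '{', '2' or '}'. The test strings are made up.

```python
import re

CORRECT = r"([A-Z]{2}[\d]{2})+"        # two capitals, two digits, repeated
BRACKET_SLIP = r"([A-Z{2}][\d]{2})+"   # {2} landed inside the [...] class

series = "AA11BB22CC33DD44"
correct_hit = re.fullmatch(CORRECT, series)
slip_hit = re.fullmatch(BRACKET_SLIP, series)
odd_hit = re.fullmatch(BRACKET_SLIP, "A11{22")  # matches, though it's not the intended format

print(bool(correct_hit), bool(slip_hit), bool(odd_hit))  # True False True
```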
So if you wanted to keep only the 10th occurrence of such a repeating format:
^(.*?)([A-Z]{2}[\d]{2}){10}([A-Z]{2}[\d]{2})+(.*)$ with \1\2\4 (because of the trailing +, this one also needs 11-or-more occurrences)
beginAA11BB22CC33DD44EE55FF66GG77HH88II99JJ00KK11end ---> beginJJ00end
If you wanted to keep everything besides the 10th occurrence when there are 11-or-more occurrences:
^(.*?)(([A-Z]{2}[\d]{2}){9})([A-Z]{2}[\d]{2})(([A-Z]{2}[\d]{2})+.*)$ with \1\2\5
beginAA11BB22CC33DD44EE55FF66GG77HH88II99JJ00KK11end --> beginAA11BB22CC33DD44EE55FF66GG77HH88II99KK11end
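Both examples check out in Python (standing in for BRU). The sketch also shows what happens with exactly ten occurrences, plus a hypothetical variant with * instead of + that still matches then.

```python
import re

KEEP_10TH = r"^(.*?)([A-Z]{2}[\d]{2}){10}([A-Z]{2}[\d]{2})+(.*)$"
DROP_10TH = r"^(.*?)(([A-Z]{2}[\d]{2}){9})([A-Z]{2}[\d]{2})(([A-Z]{2}[\d]{2})+.*)$"
# Hypothetical variant: * instead of +, so exactly 10 occurrences also match
KEEP_10TH_ANY = r"^(.*?)([A-Z]{2}[\d]{2}){10}([A-Z]{2}[\d]{2})*(.*)$"

eleven = "beginAA11BB22CC33DD44EE55FF66GG77HH88II99JJ00KK11end"
ten = "beginAA11BB22CC33DD44EE55FF66GG77HH88II99JJ00end"

kept = re.sub(KEEP_10TH, r"\1\2\4", eleven)
dropped = re.sub(DROP_10TH, r"\1\2\5", eleven)
ten_with_plus = re.sub(KEEP_10TH, r"\1\2\4", ten)      # no match: + wants an 11th
ten_with_star = re.sub(KEEP_10TH_ANY, r"\1\2\4", ten)

print(kept)           # beginJJ00end
print(dropped)        # beginAA11BB22CC33DD44EE55FF66GG77HH88II99KK11end
print(ten_with_plus)  # unchanged
print(ten_with_star)  # beginJJ00end
```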
Since BRU doesn't support numbered-match-specifiers in the replacement, this sometimes provides a nice workaround.
Not to mention BRU's capture limit of 9 groups. The {9} could just as well have been {20} with extremely long repeats.
The main thing is, if you need to capture the 1st 9 repeats as one group, you need to wrap it: ((Group){9}). A plain (Group){9} only remembers the last repeat.
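The "only remembers the last repeat" part is easy to see in Python (assuming BRU's engine behaves the same way; the test string is made up):

```python
import re

text = "AA11BB22CC33"

plain = re.match(r"([A-Z]{2}\d{2}){3}", text)      # quantifier on the group itself
wrapped = re.match(r"(([A-Z]{2}\d{2}){3})", text)  # extra outer group around it

print(plain.group(1))    # CC33: a repeated group only keeps its last repeat
print(wrapped.group(1))  # AA11BB22CC33: the outer group holds all three
print(wrapped.group(2))  # CC33: inner group, still just the last repeat
```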
You can also use ([A-Z]{2}[\d]{2})+ in your #12 with regex to show names with 1-or-more of the format-repeats.
Anywho, almost never see it posted.
I used this method to write an improved version of this batch to filter out names like JuliaLouis-Dreyfus.
Basically, it makes sure that only 2 uppercase letters may precede any - in the filename (or the end of the filename),
excluding the first dash right after the country code.
EXACT-REPEAT MATCHING or case-insensitive matching using the (?i) modifier
Something else I rarely see is putting \2 etc into the match for exact-repeats of a previous group's match.
Using (ABCD)(anytext)(\1)(anytext), Group3 would have to be ABCD, so you could omit either \1 or \3 from the replace to kill the repeat.
Of course, if you always want to kill the 2nd occurrence, use \1 not (\1) in the match, so it never becomes a group.
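A small Python check of the backreference trick, using a literal ABCD group like the example. The names are made up, and Python's re stands in for BRU.

```python
import re

name = "ABCD notes ABCD final"

# Captured repeat: (\1) becomes Group3 and must be the exact same text as Group1.
captured = re.sub(r"^(ABCD )(.*?)(\1)(.*)$", r"\1\2\4", name)

# Uncaptured repeat: \1 is matched but isn't a group, so \1\2\3 already skips it.
uncaptured = re.sub(r"^(ABCD )(.*?)\1(.*)$", r"\1\2\3", name)

# The backreference wants the exact text (case included), so lowercase abcd doesn't count.
no_repeat = re.sub(r"^(ABCD )(.*?)(\1)(.*)$", r"\1\2\4", "ABCD notes abcd final")

print(captured)    # ABCD notes final
print(uncaptured)  # ABCD notes final
print(no_repeat)   # ABCD notes abcd final
```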
Hope it all makes sense.. We appreciate all that you're going through for the manuals.. Hopefully this can give you some good ideas.
Any questions, please feel free to ask.