Remove characters NOT between numbers

Post by DerekLee1 » Sat Sep 24, 2022 6:39 pm

I have a lot of files that will include names, dates, and titles or locations, and a lot of variations in the delimiters. What I'd like to be able to do is have a regex that can remove all characters NOT between numbers, and then maybe another regex to replace characters ONLY if between numbers.

Examples of removing characters NOT between numbers (for me the numbers will always be dates, but potentially in different date formats, as below):

John Smith - 2022.07.14.Tampa.Florida.mp4 becomes John Smith - 2022.07.14 Tampa Florida by removing all periods NOT between numeric characters
tracy.johnson.19.06.22.houston.texas becomes tracy johnson 19.06.22 houston texas by removing all periods NOT between numeric characters
chris-moore-2-22-2017-detroit, michigan becomes chris moore 2-22-2017 detroit, michigan by removing all dashes NOT between numeric characters
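
The three examples above can be read as one rule: replace a delimiter unless it has a digit on both sides. A minimal sketch of that rule in Python, using lookarounds. The function name `strip_outside_digits` and its parameters are hypothetical, and treating a space as a safe neighbour (so that a free-standing " - " survives) is an assumption drawn from the first example.

```python
import re

def strip_outside_digits(name, delims=".-", fill=" "):
    """Replace each delimiter that does not sit between two digits."""
    cls = "[" + re.escape(delims) + "]"
    # A delimiter is replaced when the character before it, or the one
    # after it, is neither a digit nor a space. Allowing spaces leaves a
    # free-standing " - " alone, as in "John Smith - 2022...".
    pattern = rf"(?<![\d ]){cls}|{cls}(?![\d ])"
    return re.sub(pattern, fill, name)

print(strip_outside_digits("tracy.johnson.19.06.22.houston.texas"))
# tracy johnson 19.06.22 houston texas
```

This keys on neighbouring digits only, so it also keeps delimiters between digits that are not part of a date.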


Then the next step would be to replace characters between numbers with a different character for uniformity, sometimes as a second process after the above regex, sometimes as its own and only process:

John Smith - 2022.07.14 Tampa Florida becomes John Smith - 2022-07-14 Tampa Florida by replacing periods with dashes ONLY if the periods are between numbers
tracy johnson 19.06.22 houston texas becomes tracy johnson 19-06-22 houston texas by replacing periods with dashes ONLY if the periods are between numbers
chris moore - 2-22-2018 - detroit, michigan becomes chris moore - 2.22.2018 - detroit, michigan by replacing dashes with periods ONLY if the dashes are between numbers
jupiter jones 2017 11 13 san jose, california becomes jupiter jones 2017-11-13 san jose, california by replacing spaces with dashes ONLY if the spaces are between numbers
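
The second step is the mirror image: touch a delimiter only when a digit sits on both sides of it. A small Python sketch under that reading; `swap_between_digits` is a hypothetical helper name, and it handles one delimiter character per call.

```python
import re

def swap_between_digits(name, old, new):
    """Replace `old` with `new` only where a digit sits on both sides."""
    pattern = rf"(?<=\d){re.escape(old)}(?=\d)"
    # A function is used as the replacement so `new` is inserted literally.
    return re.sub(pattern, lambda m: new, name)

print(swap_between_digits("chris moore - 2-22-2018 - detroit, michigan", "-", "."))
# chris moore - 2.22.2018 - detroit, michigan
```

Because the lookarounds do not consume the digits, back-to-back separators such as the two spaces in "2017 11 13" are both replaced in one pass.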

I hope this makes sense. I tried to give different examples so that I can be flexible with both of the regexes. I need two different ones because the files I get don't always have the same issues, so I need to be able to use one or the other, possibly in a different order, to reach the final naming scheme that I need. Thanks for any help here!

Rename outside of date-formats, then inside of date-formats

Post by Luuk » Sun Sep 25, 2022 3:54 pm

The "v2" regexes allow using (?X) to separate several different "Match" and "Replace" expressions, like...
Match--1(?X)Match--2(?X)Match--3(?X)...
Replace1(?X)Replace2(?X)Replace3(?X)...
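
For readers outside the renamer, the chaining idea can be imitated in Python. This is only a sketch of the concept, assuming each Match/Replace pair is applied in order to the result of the previous pair. `apply_chain` is a hypothetical helper, it does not handle the /g suffix, and Python replacements write groups as \1 rather than $1.

```python
import re

def apply_chain(name, matches, replaces, sep="(?X)"):
    """Apply several match/replace pairs, one after another."""
    patterns = matches.split(sep)
    replacements = replaces.split(sep)
    if len(patterns) != len(replacements):
        raise ValueError("need one replacement per pattern")
    for pattern, replacement in zip(patterns, replacements):
        # Each pair works on the output of the previous pair.
        name = re.sub(pattern, replacement, name)
    return name

# Protect ' - ' as ':', turn the other delimiters into spaces, then restore.
print(apply_chain("a-b - c.d", r" - (?X)[-.](?X):", ":(?X) (?X) - "))
# a b - c d
```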

So this 1st-one will only rename outside of those date-formats...
\x20- /g(?X)[-.](?=.*(?:\d{1,2}[-. ]\d\d[-. ](?:19|20)\d\d|(?:19|20)?\d\d[-. ][01]\d[-. ][0-3]\d))/g(?X)(\G(?!^)|(?:\d{1,2}[-. ]\d\d[-. ](?:19|20)\d\d|(?:19|20)?\d\d[-. ][01]\d[-. ][0-3]\d))[^-.]*\K[-.]/g(?X):/g
:(?X) (?X) (?X) -\x20

Since ' - ' should not be replaced, the 1st-part just converts them to ':', and the last-part converts them back, renaming like...
chris-moore-2-22-2017-detroit, michigan ----> chris moore 2-22-2017 detroit, michigan
chris-moore-2-22-2017-Word1 - Word2 -------> chris moore 2-22-2017 Word1 - Word2
John Smith - 2022.07.14.Tampa.Florida -----> John Smith - 2022.07.14 Tampa Florida
jupiter jones 2017 11 13 jose california -----> (no changes, because no - or . outside of date)
tracy.johnson.19.06.22.houston.texas -------> tracy johnson 19.06.22 houston texas
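
Python's built-in re module has no \G or \K, so the same result needs a different route there. The sketch below reuses the two date alternatives from the regex above, splits the name around the date, and cleans only the non-date pieces. The function name and the "\x00" placeholder are assumptions for illustration; the placeholder plays the role that ':' plays above.

```python
import re

# Date alternatives taken from the post: year-last, then year-first.
DATE = re.compile(
    r"(\d{1,2}[-. ]\d\d[-. ](?:19|20)\d\d"
    r"|(?:19|20)?\d\d[-. ][01]\d[-. ][0-3]\d)"
)

def clean_outside_dates(name, placeholder="\x00"):
    """Turn '-' and '.' into spaces everywhere except inside dates."""
    # With one capturing group, split() returns non-date text at even
    # indexes and the matched dates at odd indexes.
    parts = DATE.split(name)
    for i in range(0, len(parts), 2):
        text = parts[i].replace(" - ", placeholder)   # protect ' - '
        text = re.sub(r"[-.]", " ", text)             # clean the rest
        parts[i] = text.replace(placeholder, " - ")   # restore ' - '
    return "".join(parts)

print(clean_outside_dates("chris-moore-2-22-2017-Word1 - Word2"))
# chris moore 2-22-2017 Word1 - Word2
```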

To rename inside of those date-formats, you will also need more than just one Match/Replace, because the "Replace" won't always be the same.
Except now I'm guessing that date-formats starting with the year want '-' separators, but those ending with the year want '.' separators?
I'm also guessing that the original date-formats can use '.', space, or '-' as any of their separators?

So if those guesses are right, this 2nd one renames inside of those date-formats, using a "Match" and "Replace" like...
(\d{1,2})[-. ]([0-3]\d)[-. ]((19|20)\d\d)(?X)((?:19|20)?\d\d)[-. ]([01]\d)[-. ]([0-3]\d)
$1.$2.$3(?X)$1-$2-$3

This changes just the date-formats, like...
2-22-2018 ----> 2.22.2018
2022.07.14 ---> 2022-07-14
19.06.22 ------> 19-06-22
2017 11 13 ----> 2017-11-13
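
The same two-pattern idea, sketched in Python for comparison. It keeps the guess stated above (year-last dates get '.', year-first dates get '-'). The names `YEAR_LAST`, `YEAR_FIRST`, and `normalize_dates` are hypothetical, and groups are written \1 instead of $1.

```python
import re

# Year-last dates such as 2-22-2018; the year must start with 19 or 20.
YEAR_LAST = re.compile(r"(\d{1,2})[-. ]([0-3]\d)[-. ]((?:19|20)\d\d)")
# Year-first dates such as 2022.07.14 or 19.06.22.
YEAR_FIRST = re.compile(r"((?:19|20)?\d\d)[-. ]([01]\d)[-. ]([0-3]\d)")

def normalize_dates(name):
    """Use '.' inside year-last dates and '-' inside year-first dates."""
    name = YEAR_LAST.sub(r"\1.\2.\3", name)
    name = YEAR_FIRST.sub(r"\1-\2-\3", name)
    return name

for sample in ["2-22-2018", "2022.07.14", "19.06.22", "2017 11 13"]:
    print(sample, "->", normalize_dates(sample))
```

The year-last pattern runs first, as in the post, so a date like 2-22-2018 is settled before the year-first pattern is tried.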

You could also join both of them together, using (?X) inside of the Match and Replace boxes like...
\x20- /g(?X)[-.](?=.*(?:\d{1,2}[-. ]\d\d[-. ](?:19|20)\d\d|(?:19|20)?\d\d[-. ][01]\d[-. ][0-3]\d))/g(?X)(\G(?!^)|(?:\d{1,2}[-. ]\d\d[-. ](?:19|20)\d\d|(?:19|20)?\d\d[-. ][01]\d[-. ][0-3]\d))[^-.]*\K[-.]/g(?X):/g(?X)(\d{1,2})[-. ]([0-3]\d)[-. ]((19|20)\d\d)(?X)((?:19|20)?\d\d)[-. ]([01]\d)[-. ]([0-3]\d)
:(?X) (?X) (?X) - (?X)$1.$2.$3(?X)$1-$2-$3

So this last-one just combines everything with (?X) to rename both inside and outside of the date-formats like...
chris-moore-2-22-2017-detroit, michigan ---> chris moore 2.22.2017 detroit, michigan
chris-moore-2-22-2017-Word1 - Word2 ------> chris moore 2.22.2017 Word1 - Word2
John Smith - 2022.07.14.Tampa.Florida -----> John Smith - 2022-07-14 Tampa Florida
jupiter jones 2017 11 13 jose california -----> jupiter jones 2017-11-13 jose california
tracy.johnson.19.06.22.houston.texas -------> tracy johnson 19-06-22 houston texas

Whenever you see \x20, it really just means a space, because the forum likes to delete spaces at the start and end of lines.
To simplify everything, you can replace each (?X) with a line break to see all the different Match/Replace pairs.
Sorry for making the last one so small, but you can always use the browser to zoom in; it's only small to make the copy/pasting easier.

If you have more date-formats to handle, or if the guesses are wrong, you can always post more examples.
I did try to make all of the [replace-chars] red, in case you need to add more characters somewhere inside.
If adding more characters, just make sure that - is always first, because otherwise the regex treats it as a range.
If you need explanations of anything else, just say which parts should be described. Good luck!

Re: Remove characters NOT between numbers

Post by Admin » Mon Sep 26, 2022 10:30 am

I just want to add that, with JavaScript renaming, a script could be made that:
- searches for character A
- if it is not between numbers, replaces it with character B
- if it is between numbers, replaces it with character C
etc.
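
The steps listed above can be sketched with a replacement callback. The version below is written in Python purely to illustrate the logic, not as the renamer's own scripting API. `replace_by_context` and its parameter names are hypothetical, and the target is assumed to be a single character.

```python
import re

def replace_by_context(name, target, outside, inside):
    """Replace `target` with `inside` between digits, else with `outside`."""
    def pick(match):
        i = match.start()
        before = name[i - 1] if i > 0 else ""
        after = name[i + 1] if i + 1 < len(name) else ""
        # An empty string is not a digit, so the name's edges count as outside.
        if before.isdigit() and after.isdigit():
            return inside
        return outside
    return re.sub(re.escape(target), pick, name)

print(replace_by_context("tracy.johnson.19.06.22.houston.texas", ".", " ", "-"))
# tracy johnson 19-06-22 houston texas
```

Doing both replacements in one pass avoids having to decide which of the two regexes runs first.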