In any case, I've already done the difficult work of separating those characters with the following regex:
- Code: Select all
(.)[`?`?]
(Note: that last character is not a *, it's just the weirdness of the two Japanese symbols in the character class)
However, that captures the character without its voicing mark. That is, ? matches ? in the first capturing group, whereas my ultimate goal is for it to turn into ?. If you're confused about why ? and ? are not the same, read my first paragraph again.
I'm not sure how to turn an unvoiced character into a voiced character with regex. If it's helpful, this was my latest attempt at a matching regex; it includes every voiced character I could think of.
- Code: Select all
([??????????]?[??????????]?[??????????]?[??????????]?[??]?)[`?]|([??????????]?)[`?]
However, it's not going to return anything useful, because all it really matches is the dakuten or handakuten portion. To match the corresponding character, I'd need to put its unvoiced form in my matching string. Is there a way to do that other than one character at a time? For example, a finished one-at-a-time solution would look like this:
- Code: Select all
Match: (.*)(?)[`?](.*)
Replace: \1?\3
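For what it's worth, one way to avoid the one-at-a-time replacements might be to let Unicode normalization do the lookup rather than the regex itself. The following is only a minimal Python sketch, not a pure-regex answer. It assumes the marks in the text are the standalone dakuten (U+309B) and handakuten (U+309C) or their combining forms (U+3099, U+309A); the names `TO_COMBINING`, `MARKED`, and `voice` are made up for illustration. The idea is to capture the base character and the mark, swap the mark for its combining form, and let NFC normalization produce the precomposed voiced character where one exists.

```python
import re
import unicodedata

# Assumption: these are the marks that appear in the text. Standalone marks
# are mapped to their combining equivalents so NFC can compose them.
TO_COMBINING = {
    "\u309B": "\u3099",  # standalone dakuten -> combining dakuten
    "\u309C": "\u309A",  # standalone handakuten -> combining handakuten
    "\u3099": "\u3099",  # already combining
    "\u309A": "\u309A",  # already combining
}

# Any single character followed by one of the marks above.
MARKED = re.compile("(.)([\u309B\u309C\u3099\u309A])")


def voice(text):
    """Merge 'base character + voicing mark' pairs into one voiced character."""
    def merge(match):
        base, mark = match.group(1), match.group(2)
        composed = unicodedata.normalize("NFC", base + TO_COMBINING[mark])
        # If Unicode has no precomposed form for this pair, NFC leaves two
        # code points; in that case keep the original text untouched.
        return composed if len(composed) == 1 else match.group(0)

    return MARKED.sub(merge, text)
```

Called as `voice(some_text)`, this handles every kana that has a precomposed voiced form in one pass, so no per-character match and replace pairs are needed. If some other character stands in for the mark in the actual data, it would have to be added to both the mapping and the character class.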
Any ideas?
Edit: it seems this forum isn't designed to handle Japanese text, so I'll point you to the website I was using to test it, with some sample text and the above matching string.
https://regex101.com/r/olkxL2/6