Regex problem. Finding the file name embedded in a windows path
Posted: Thu Dec 05, 2024 11:35 pm
by BenDupre
Hello,
I am using this regex
[^\\]+$
with an xpath
//Image/@file
to find file names embedded in full Windows style paths contained in the file attribute of the Image elements in my document.
There are 26 of these in the doc. oXygen only returns one of them when the FIND ALL button is pressed.
Any idea why it's behaving like this?
Re: Regex problem. Finding the file name embedded in a windows path
Posted: Fri Dec 06, 2024 12:00 pm
by Radu
Hi Ben,
From what I remember, using XPath expressions like "//Image/@file" in the Find/Replace dialog creates certain filtered intervals in which the find is performed. The filtered intervals include the attribute name, quotes and values. So for an XML element like:
the search interval would be
.
I do not know regexp that well, but from what I see, $ is defined as:
$ matches the position before the first newline in the string.
But there are no new lines in these filtered intervals.
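To see what the regex does on one such interval, here is a minimal Java sketch. The interval string and the class and method names are made up for illustration, and it assumes the search runs on java.util.regex with default flags, which this thread does not confirm. With default flags, $ also matches at the very end of the input, so a newline is not required, but the match then includes the closing quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntervalMatchDemo {
    // The regex from the original post; in Java source each backslash is doubled.
    static final Pattern LAST_SEGMENT = Pattern.compile("[^\\\\]+$");

    // Returns the first match of the pattern in the given interval, or null if none.
    static String firstMatch(Pattern p, String interval) {
        Matcher m = p.matcher(interval);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical interval: attribute name, quotes and value, no newline.
        String interval = "file=\"C:\\images\\pic.png\"";
        System.out.println(firstMatch(LAST_SEGMENT, interval)); // prints pic.png"
    }
}
```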
Maybe instead you could search for quoted values like:
Regards,
Radu
Re: Regex problem. Finding the file name embedded in a windows path
Posted: Mon Dec 09, 2024 10:30 pm
by BenDupre
Thanks for the suggestions.
When I use the XPath, it returns everything within the quotes, and the quotes as well.
$ in regex is meant to indicate the end of the string so it can match backwards and is the key to getting the last member of the sequence. If it is expecting a newline in there, it will not find one. I think I have the regex worked out, but I am still looking for the answer as to why the find routine grabs one matching value in the middle of the document and stops. This smells buggy to me.
Ben
Re: Regex problem. Finding the file name embedded in a windows path
Posted: Tue Dec 10, 2024 12:54 pm
by adrian
Hello Ben,
If you are familiar with Perl regex, I see why you identify $ as the end of the string. However, Oxygen uses the Java implementation of regex, which, even though it is based on the Perl one, does not have the same meaning for ^ and $.
See
java.util.regex.Pattern
Boundary matchers
^ The beginning of a line
$ The end of a line
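As a rough illustration of how the meaning of $ depends on flags in java.util.regex, the sketch below runs the regex from the original post over a two-line input, once with default flags and once with Pattern.MULTILINE. The sample paths and the class and method names are invented for the example, and which flags Oxygen's Find/Replace actually sets is not stated in this thread, so treat that part as an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarModeDemo {
    // Collects every match of the regex in the input, using the given Pattern flags.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two hypothetical Windows paths separated by a newline.
        String input = "C:\\a\\one.png\nC:\\b\\two.png";
        String regex = "[^\\\\]+$"; // the regex [^\\]+$ written as a Java string

        // Default flags: $ matches only at the end of the whole input.
        System.out.println(findAll(regex, 0, input)); // prints [two.png]

        // MULTILINE: $ also matches just before each line terminator.
        System.out.println(findAll(regex, Pattern.MULTILINE, input)); // prints [one.png, two.png]
    }
}
```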
Looking at the Oxygen docs (Search and Find/Replace Features > Regular Expressions Syntax), I do realize that somehow this specific difference between Java and Perl 5 regex was not highlighted.
The documentation team has mostly handled the "Comparison to Perl 5" from the Java docs, but that also seems to omit this specific difference. I've added a documentation issue to address this.
"I am still looking for the answer as to why the find routine grabs one matching value in the middle of the document and stops. This smells buggy to me."
If you'd like to report a bug, I would like to request more details, perhaps a small sample file where you can reproduce the issue.
Please take into consideration the implementation difference that I mentioned between Java and Perl 5 regex.
If you'd like to keep this private, please send an email with the issue to support@oxygenxml.com or use the Technical Support form.
Regards,
Adrian