<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hello Tobias,<br>
<br>
Note that the \u escape in the Java/Oxygen regex engine takes
exactly 4 hex digits.<br>
If you write 5 digits, the 5th digit is read separately as a
literal character, which causes unintended matches.<br>
<br>
e.g.<br>
[\u0100-\u1F9FF] is interpreted as [\u0100-\u1F9F]|[F]. So you are
inadvertently also matching "F".<br>
<br>
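As a minimal sketch of the effect, assuming plain java.util.regex
behaves the same way as the Find box here, the class below compares the
5-digit \u form with the \x{...} form, which in Java 7 and later accepts
a whole code point. The names UnicodeRangeDemo and found are only
illustrative.<br>
<pre>
```java
import java.util.regex.Pattern;

public class UnicodeRangeDemo {
    // The backslash-u escape consumes exactly four hex digits, so this class is
    // the range U+0100..U+1F9F plus the literal letter F.
    static final Pattern FIVE_DIGIT_U = Pattern.compile("[\\u0100-\\u1F9FF]");

    // The braced backslash-x escape takes a whole code point (Java 7 and later),
    // so this class is the intended range U+0100..U+1F9FF.
    static final Pattern BRACED_X = Pattern.compile("[\\u0100-\\x{1F9FF}]");

    // Illustrative helper: true if the pattern matches anywhere in the text.
    static boolean found(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        // U+1F600 lies above the BMP, so the string is built from its code point.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println(found(FIVE_DIGIT_U, "F"));   // true: the stray literal F
        System.out.println(found(FIVE_DIGIT_U, emoji)); // false: range stops at U+1F9F
        System.out.println(found(BRACED_X, "F"));       // false
        System.out.println(found(BRACED_X, emoji));     // true
    }
}
```
</pre>
Whether a given Oxygen version accepts \x{...} in the Find box is
something to verify; if it does not, the negated class
[^\u0000-\u00FF] from your mail avoids 5-digit codes altogether.<br>
<br>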
Regards,<br>
Adrian<br>
<pre class="moz-signature" cols="72">Adrian Buza
oXygen XML Editor and Author Support
Tel: +1-650-352-1250 ext.2020
Fax: +40-251-461482
</pre>
<br>
<div class="moz-cite-prefix">On 24.06.2016 11:17, Tobias Fischer |
pagina GmbH wrote:<br>
</div>
<blockquote
cite="mid:123b227c-834f-5833-b9ab-13e8c0b06962@pagina-tuebingen.de"
type="cite">
<p>Hi Andreas,</p>
<p>sure, this can be done with a basic regex query:<code><span
class="pun"> [</span><span class="pln">\u</span><span
class="lit">00D8</span><span class="pun">-</span><span
class="pln">\u</span><span class="lit">00F6</span><span
class="pun">]</span></code></p>
<pre style="" class="lang-py prettyprint prettyprinted"><code><span class="pun">And for your example:
[\u0100-\u1F9FF]
Unfortunately, oXygen 18 seems to have a bug with this query (more precisely, with 5-digit hex codes), as it also matches characters below \u0100 (the code point immediately after \u00FF).
However, you can also work with negation:
[^\u0000-\u00FF]
And this seems to work fine :)
Regards,
Tobias
</span></code></pre>
<pre class="moz-signature" cols="72">Tobias Fischer
XML- und E-Book-Entwicklung
Telefon: +49 (0)7071 9876-44 · Fax: -22
Mail: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:tobias.fischer@pagina-tuebingen.de">tobias.fischer@pagina-tuebingen.de</a>
pagina GmbH - Publikationstechnologien
Herrenberger Straße 51 | D-72070 Tübingen
<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="http://www.pagina-online.de">www.pagina-online.de</a> | <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="http://www.parsx.de">www.parsx.de</a>
Handelsregister Stuttgart - HRB 380249
Geschäftsführer: Tobias Ott
</pre>
<div class="moz-cite-prefix">On 24.06.2016 at 09:50, Andreas Wagner
wrote:<br>
</div>
<blockquote cite="mid:20160624075049.GW895@hermes.commontology.de"
type="cite">Dear all, <br>
<br>
In order to make sure that we have caught all special characters
in an externally transcribed TEI/XML file, I would like to search
for all characters above Unicode code point 0x00FF. Can this be
done in the Regular Expression Find box? (I found the search for
single Unicode code points with \u, \x etc., but can't figure out
whether this can be used to search for characters (not) in code point
ranges.) <br>
<br>
Thanks for any suggestion, <br>
<br>
Andreas <br>
<br>
<br>
<br>
</blockquote>
<br>
<br>
<br>
<pre wrap="">_______________________________________________
oXygen-user mailing list
<a class="moz-txt-link-abbreviated" href="mailto:oXygen-user@oxygenxml.com">oXygen-user@oxygenxml.com</a>
<a class="moz-txt-link-freetext" href="https://www.oxygenxml.com/mailman/listinfo/oxygen-user">https://www.oxygenxml.com/mailman/listinfo/oxygen-user</a>
</pre>
</blockquote>
<br>
</body>
</html>