<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hello Tobias,<br>

    <br>

    Note that the Java/Oxygen regex engine supports only 4-digit hex

    codes after the \u Unicode escape.<br>

    If you write 5 digits, the 5th digit is interpreted separately as a

    literal character, which causes undesired side effects.<br>

    <br>

    For example:<br>

    [\u0100-\u1F9FF] is interpreted as [\u0100-\u1F9F]|[F], so you are

    inadvertently also matching a literal "F", and the range ends at

    U+1F9F rather than U+1F9FF.<br>
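    <br>
    A minimal Java sketch of this behavior, assuming Oxygen hands the
    pattern to java.util.regex unchanged. The class name
    UnicodeRangeDemo, the helper matches(), and the sample code points
    are made up for illustration. It also tries the \x{h...h} escape,
    which Java 7 and later accept for code points that need more than
    four hex digits; whether the Oxygen Find/Replace dialog accepts
    that form has not been checked here.<br>
    <pre>
```java
import java.util.regex.Pattern;

public class UnicodeRangeDemo {
    // The backslash-u escape takes exactly four hex digits, so this class is
    // the range U+0100..U+1F9F plus a separate literal "F".
    static final Pattern FIVE_DIGIT = Pattern.compile("[\\u0100-\\u1F9FF]");

    // The \x{h...h} escape (Java 7 and later) accepts a full code point.
    static final Pattern CODE_POINT = Pattern.compile("[\\x{0100}-\\x{1F9FF}]");

    // True if the single code point, taken as a whole string, matches the pattern.
    static boolean matches(Pattern pattern, int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return pattern.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Illustrative samples: "F", a-macron, a general punctuation space, an emoji.
        int[] samples = { 'F', 0x0101, 0x2000, 0x1F600 };
        for (int cp : samples) {
            System.out.printf("U+%04X  five-digit: %-5b  code-point: %b%n",
                    cp, matches(FIVE_DIGIT, cp), matches(CODE_POINT, cp));
        }
    }
}
```
</pre>
    Under these assumptions, the 5-digit pattern matches "F" and U+0101
    but neither U+2000 nor U+1F600, while the \x{...} pattern matches
    every sample except "F".<br>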

    <br>

    Regards,<br>

    Adrian<br>

    <pre class="moz-signature" cols="72">Adrian Buza

oXygen XML Editor and Author Support

Tel: +1-650-352-1250 ext.2020

Fax: +40-251-461482

</pre>

    <br>

    <div class="moz-cite-prefix">On 24.06.2016 11:17, Tobias Fischer |

      pagina GmbH wrote:<br>

    </div>

    <blockquote

      cite="mid:123b227c-834f-5833-b9ab-13e8c0b06962@pagina-tuebingen.de"

      type="cite">


      <p>Hi Andreas,</p>

      <p>Sure, this can be done with a basic regex query:<code><span

            class="pun"> [</span><span class="pln">\u</span><span

            class="lit">00D8</span><span class="pun">-</span><span

            class="pln">\u</span><span class="lit">00F6</span><span

            class="pun">]</span></code><br>

        <code><span class="pun"></span></code></p>

      <pre style="" class="lang-py prettyprint prettyprinted"><code><span class="pun">And for your example:

[\u0100-\u1F9FF]

Unfortunately, oXygen 18 seems to have a bug with this query (more precisely, with 5-digit hex codes), as it also matches characters below \u0100 (the code point immediately after \u00FF).

However, you can also work with negation:

[^\u0000-\u00FF]

And this seems to work fine :)

Regards,

Tobias

</span></code></pre>
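      <p>For reference, a minimal Java sketch of the negated-range
        search, assuming the same java.util.regex semantics as the Find
        box. The class name NonLatin1Finder, the report() helper, and
        the sample string are hypothetical and only serve as
        illustration:</p>
      <pre>
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonLatin1Finder {
    // Negated class: any code point outside U+0000..U+00FF.
    static final Pattern ABOVE_LATIN1 = Pattern.compile("[^\\u0000-\\u00FF]");

    // Returns one line per hit: the char offset and the code point in U+ notation.
    static String report(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = ABOVE_LATIN1.matcher(text);
        while (m.find()) {
            int cp = text.codePointAt(m.start());
            out.append(String.format("offset %d: U+%04X%n", m.start(), cp));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = new StringBuilder("caf")
                .appendCodePoint(0x00E9)   // e-acute, inside Latin-1, not reported
                .append(' ')
                .appendCodePoint(0x0101)   // a-macron, reported
                .append(' ')
                .appendCodePoint(0x1F600)  // emoji outside the BMP, reported once
                .toString();
        System.out.print(report(sample));
    }
}
```
</pre>
      <p>Each hit is listed with its offset, which makes it easy to
        review the special characters one by one.</p>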

      <pre class="moz-signature" cols="72">Tobias Fischer

XML and E-Book Development

Phone: +49 (0)7071 9876-44 · Fax: -22

Mail: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:tobias.fischer@pagina-tuebingen.de">tobias.fischer@pagina-tuebingen.de</a>

pagina GmbH - Publikationstechnologien

Herrenberger Straße 51 | D-72070 Tübingen

<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="http://www.pagina-online.de">www.pagina-online.de</a> | <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="http://www.parsx.de">www.parsx.de</a>

Commercial Register Stuttgart - HRB 380249

Managing Director: Tobias Ott

</pre>

      <div class="moz-cite-prefix">On 24.06.2016 at 09:50, Andreas

        Wagner wrote:<br>

      </div>

      <blockquote cite="mid:20160624075049.GW895@hermes.commontology.de"

        type="cite">Dear all, <br>

        <br>

        In order to make sure that we have caught all special characters

        in an externally transcribed TEI/XML file, I would like to search

        for all characters above Unicode code point 0x00FF. Can this be

        done in the Regular Expression Find box? (I found the search for

        single Unicode code points with \u, \x etc., but can't figure out

        if this can be used to search for characters (not) in code point

        ranges.) <br>

        <br>

        Thanks for any suggestion, <br>

        <br>

        Andreas <br>

        <br>

        <br>

        <br>

      </blockquote>

      <br>

      <br>


      <br>

      <pre wrap="">_______________________________________________

oXygen-user mailing list

<a class="moz-txt-link-abbreviated" href="mailto:oXygen-user@oxygenxml.com">oXygen-user@oxygenxml.com</a>

<a class="moz-txt-link-freetext" href="https://www.oxygenxml.com/mailman/listinfo/oxygen-user">https://www.oxygenxml.com/mailman/listinfo/oxygen-user</a>

</pre>

    </blockquote>

    <br>


  </body>

</html>