
[xsl] Saxon and ZWNJ


Subject: [xsl] Saxon and ZWNJ
From: Mohsen Saboorian <mohsens@xxxxxxxxx>
Date: Mon, 10 Jun 2013 02:12:20 +0430

Hi,
I'm trying to evaluate an XPath expression with Saxon 9.1.0.8 using
the following code snippet:

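  import javax.xml.transform.dom.DOMSource;
  import net.sf.saxon.Configuration;
  import net.sf.saxon.s9api.*;

  // (conf is not passed to the Processor created below)
  Configuration conf = new Configuration();
  conf.setValidation(false);
  Processor p = new Processor(false);  // false = not schema-aware
  DocumentBuilder documentBuilder = p.newDocumentBuilder();
  XPathCompiler xpathCompiler = p.newXPathCompiler();

  XPathExecutable xpe = xpathCompiler.compile(expression);
  XPathSelector xpath = xpe.load();
  // cleanHtml.document is the cleaned HTML as a DOM document
  xpath.setContextItem(documentBuilder.build(new DOMSource(cleanHtml.document)));

  XdmItem result = xpath.evaluateSingle();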

The HTML is in Persian script and contains unescaped ZWNJ characters
(U+200C). Its cleaned DOM is passed as cleanHtml.document in the code
above.

The matched XdmItem contains the raw, unescaped ZWNJ (U+200C), but
result.getStringValue() returns it escaped as the entity reference
&zwnj;. That doesn't seem correct, because I'm asking for the node's
string value.
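
One way to confirm what the returned string really holds is to dump its code points, since a console or debugger may render U+200C invisibly. The sketch below is only a diagnostic aid and does not use Saxon. The class name ZwnjCheck and its helper methods are made up for illustration. The idea is to pass it the value of result.getStringValue() and see whether it contains the single character U+200C or the six literal characters "&zwnj;".

```java
// Hypothetical diagnostic helper (not part of Saxon): shows whether a string
// contains the raw ZWNJ character or the literal text "&zwnj;".
class ZwnjCheck {

    /** Returns the code points of s as space-separated U+XXXX tokens. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** True if s contains the actual ZWNJ character (U+200C). */
    static boolean hasRawZwnj(String s) {
        return s.indexOf('\u200C') >= 0;
    }

    /** True if s contains the six-character literal "&zwnj;". */
    static boolean hasEscapedZwnj(String s) {
        return s.contains("&zwnj;");
    }

    public static void main(String[] args) {
        // In real use, replace these samples with result.getStringValue().
        String raw = "a\u200Cb";
        String escaped = "a&zwnj;b";
        System.out.println(codePoints(raw));      // U+0061 U+200C U+0062
        System.out.println(hasRawZwnj(raw));      // true
        System.out.println(hasEscapedZwnj(escaped)); // true
    }
}
```

If hasEscapedZwnj reports true for the string value, the literal text "&zwnj;" is already present in the text node, which would suggest looking at how the DOM was produced before it reached Saxon.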

Is this a bug, or is there a flag to disable the escaping of special
Unicode characters in Saxon?

Regards,
Mohsen

