[oXygen-user] using automatic character conversion in XML to XML transformation

George Cristian Bina
Mon Oct 9 02:30:19 CDT 2006


Hi Paul,

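When the output method is html, the XSLT processor uses a different 
output serializer than the one used when the output method is xml, and 
the two follow different rules. The HTML serializer may write characters 
using the named entities that HTML defines, while XML predefines only 
five entities (lt, gt, amp, apos and quot), so an entity such as copy is 
not available by default in XML output.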

In XSLT 1.0 you can set the output encoding to ASCII, for instance, or 
to some other encoding that cannot represent the characters you want to 
output as entities. Those characters will then be output as character 
references, so the copyright symbol will appear as &_#169; (I added an 
underscore _ to avoid the conversion of the character reference to the 
actual character by some email clients).
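
A minimal sketch of the XSLT 1.0 approach, assuming your processor 
supports the US-ASCII encoding (only UTF-8 and UTF-16 are required by 
the specification, so this is an assumption to check). The identity 
template is reused from the XSLT 2.0 sample. Whether the reference is 
written in decimal or hexadecimal form depends on the processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <!-- US-ASCII cannot represent the copyright sign, so the
          serializer has to write it as a numeric character
          reference instead of the literal character -->
     <xsl:output method="xml" encoding="US-ASCII"/>
     <!-- identity template: copy every node and attribute as is -->
     <xsl:template match="node() | @*">
         <xsl:copy>
             <xsl:apply-templates select="node() | @*"/>
         </xsl:copy>
     </xsl:template>
</xsl:stylesheet>
```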

In XSLT 2.0 you can use character maps to output &_copy; (again, I added 
an _ ). Below is a sample stylesheet that copies the source to the 
output, representing each copyright character as &_copy;

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:character-map name="test">
         <xsl:output-character character="©" string="&amp;copy;"/>
     </xsl:character-map>
     <xsl:output use-character-maps="test"/>
     <xsl:template match="node() | @*">
         <xsl:copy>
             <xsl:apply-templates select="node() | @*"/>
         </xsl:copy>
     </xsl:template>
</xsl:stylesheet>

This stylesheet applied to a document like:

<test>
     <a> © </a>
     <b> © </b>
     <c> © </c>
</test>

will result in:
<?xml version="1.0" encoding="UTF-8"?><test>
     <a> &copy; </a>
     <b> &copy; </b>
     <c> &copy; </c>
</test>

But note that the result document is not well-formed, as the copy entity 
is used but not declared. To get a well-formed result you need to 
create a DTD like the one below:

test.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY copy "©">

and change the xsl:output to refer to this DTD:

<xsl:output doctype-system="test.dtd" use-character-maps="test"/>

The result will now be:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test
   SYSTEM "test.dtd">
<test>
     <a> &copy; </a>
     <b> &copy; </b>
     <c> &copy; </c>
</test>

which is well-formed (but not valid against the DTD). If you also want 
the output to be valid, you need to update the DTD to contain the 
element and attribute declarations. For the above example that will be:

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY copy "©">
<!ELEMENT test (a,b,c)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT c (#PCDATA)>
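
The same pattern could be extended to further characters. The sketch 
below is an assumption for illustration, not part of the original 
example: the map name "entities" and the reg entry are hypothetical. 
Each entity written by the character map needs a matching declaration 
in test.dtd, here <!ENTITY reg "&#174;">, or the result is again not 
well-formed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <!-- one output-character per character to be replaced; the
          character references are resolved when the stylesheet is
          parsed, so the map sees the actual characters -->
     <xsl:character-map name="entities">
         <xsl:output-character character="&#169;" string="&amp;copy;"/>
         <xsl:output-character character="&#174;" string="&amp;reg;"/>
     </xsl:character-map>
     <!-- test.dtd must declare both the copy and reg entities -->
     <xsl:output doctype-system="test.dtd" use-character-maps="entities"/>
     <!-- identity template: copy every node and attribute as is -->
     <xsl:template match="node() | @*">
         <xsl:copy>
             <xsl:apply-templates select="node() | @*"/>
         </xsl:copy>
     </xsl:template>
</xsl:stylesheet>
```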

Best Regards,
George
---------------------------------------------------------------------
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com


Dever, Paul (ELS) wrote:
> If I transform my XML document to HTML using the HTML output method 
> oXygen automatically turns special characters into HTML character 
> entities (e.g., a copyright symbol gets transformed into "&copy;"). 
>  
> I want to use that functionality in a transformation from XML to XML 
> using XML as an output method, but I can't figure out how to do it.  Is 
> there a way?
>  
> Thanks,
> --Paul
> ___________________________
> *Paul Dever* *::* Manager, Electronic Workflows *::* EPD-US
> (T) +1 314-995-3291 *:: *(E)  
> <mailto:> 
> 11830 Westline Industrial Drive, St. Louis, MO 63146
> *ELSEVIER*


