
RE: [xsl] Encoding issues with document() function


Subject: RE: [xsl] Encoding issues with document() function
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Sat, 4 Nov 2006 17:56:55 -0000

>      I am facing problems removing 0xb, 0xc, 0xe, 0xf. What will the
> representation of these characters be in UTF-8?

You obviously didn't understand Joe's reply, and I'm having trouble finding
a way of rephrasing it so that you do understand it. Your problem is not
that these characters are incorrectly encoded, but that they are not legal
characters in XML. Changing the encoding will not help. You need to find
some way of representing the information content of your document using
characters that are legal in XML. Only you know what these characters
actually mean, and therefore how to replace them with something legal.

Michael Kay
http://www.saxonica.com/
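A minimal Java sketch of the rule Michael and Joe are describing. Java is an assumption here, suggested by the '\u0001' literals in the thread, and the class name is hypothetical. Whether a character may appear in an XML 1.0 document is decided by production [2] Char of the XML 1.0 Recommendation, which is defined over code points, not bytes, so no choice of encoding changes the answer. The check reports U+0002, U+000B, U+000C, U+000E and U+000F as illegal, and tab, line feed and carriage return as legal.

```java
// Hypothetical helper: tests code points against the XML 1.0 Char production:
// Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The rule is about characters, so the file's encoding is irrelevant to it.
final class XmlChars {
    private XmlChars() {}

    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // Values mentioned in the thread, plus 'A' (0x41) for comparison.
        int[] samples = {0x1, 0x2, 0x9, 0xA, 0xB, 0xC, 0xD, 0xE, 0xF, 0x13, 0x41};
        for (int cp : samples) {
            System.out.printf("U+%04X legal in XML 1.0: %b%n", cp, isXmlChar(cp));
        }
    }
}
```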

> For 0x1 I am using '\u0001' and it works fine. But the problem is with
> 0xb, 0xc, 0xe, 0xf.
> 
> thanks
> pankaj
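Instead of one replace call per forbidden value, the raw text can be filtered in a single pass over its code points. This is a sketch under stated assumptions: the text can be intercepted before it reaches the XML parser (the parser rejects the document before document() returns anything to the stylesheet), and the class and method names are hypothetical. The replacement string is left to the caller because, as Michael says, only the data owner knows what these characters mean. Note that 0xa and 0xd, which appear in the list quoted further down, are line feed and carriage return and are legal, so they are kept. The sketch compares numeric code points rather than writing \uXXXX escapes, because javac translates those escapes before tokenizing the source, so a line-feed or carriage-return escape inside a char or string literal is a compile error.

```java
// Hypothetical sanitizer: replaces every code point that XML 1.0 forbids.
// Run it on the raw text BEFORE the XML parser sees it.
final class XmlSanitizer {
    private XmlSanitizer() {}

    // XML 1.0 production [2] Char. Lone surrogates fall outside every range.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns text with each illegal code point replaced by `replacement`
    // (pass "" to delete). Tab, LF and CR are kept because they are legal.
    static String sanitize(String text, String replacement) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "a" + (char) 0x0B + "b" + (char) 0x0C + "c\td\ne" + (char) 0x0F;
        System.out.println(sanitize(raw, "?"));
    }
}
```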
> ----- Original Message -----
> From: "Joe Fawcett" <joefawcett@xxxxxxxxxxx>
> To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
> Sent: Saturday, November 04, 2006 6:00 PM
> Subject: Re: [xsl] Encoding issues with document() function
> 
> 
> > It doesn't matter about the encoding. XML cannot have 0xb, 0xc, 0xe and
> > 0xf in it.
> > You can base64-encode the data if it's part of an element's content before
> > passing it to the XML parser, or replace the characters with allowed ones
> > and then post-process the data later to re-insert them.
> >
> > Joe
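Joe's first option can be sketched in Java, assuming Java 8 or later (for java.util.Base64) and that the producer and the consumer of the document agree on the convention: encode the raw value's UTF-8 bytes as base64, write the resulting ASCII text as the element content, and decode it after parsing. The base64 alphabet (A-Z, a-z, 0-9, '+', '/' and '=') consists only of legal XML characters, so the control characters survive the parser. The helper names and the use of UTF-8 for the inner bytes are assumptions; note that the stylesheet itself only ever sees the encoded text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helpers: carry arbitrary text (including control characters)
// through XML as base64 element content.
final class Base64Field {
    private Base64Field() {}

    // Producer side: call before writing the element content.
    static String encode(String raw) {
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Consumer side: call on the element's text after parsing.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "id" + (char) 0x02 + "42" + (char) 0x0B;
        String encoded = encode(raw);
        System.out.println("<value>" + encoded + "</value>");
        System.out.println("round trip ok: " + decode(encoded).equals(raw));
    }
}
```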
> >
> >
> > >From: "Pankaj Bishnoi" <pankaj.bishnoi@xxxxxxxxxxx>
> > >Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > >To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
> > >Subject: Re: [xsl] Encoding issues with document() function
> > >Date: Sat, 4 Nov 2006 17:53:11 +0530
> > >
> > >Thanks for your help, Michael. Now I am replacing Unicode characters.
> > >
> > >My encoding is now UTF-8.
> > >
> > >For 0x2 I can use replace('\u0002','')
> > >
> > >but what will the replacement be for the following characters:
> > >
> > >0xa,0xb,0xc,0xd,0xe,0xf
> > >
> > >
> > >Thanks
> > >Pankaj
> > >
> > >----- Original Message -----
> > >From: "Michael Kay" <mike@xxxxxxxxxxxx>
> > >To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
> > >Sent: Saturday, November 04, 2006 3:08 PM
> > >Subject: RE: [xsl] Encoding issues with document() function
> > >
> > >
> > > > If the document really does contain the Unicode character with codepoint
> > > > 0x02, then it's not a well-formed XML document, and you won't be able to
> > > > read it from XSLT or from anything else that's designed to process XML.
> > > > You need to correct the program that created the document so that it
> > > > outputs well-formed XML.
> > > >
> > > > The other possibility is that the document contains some other character
> > > > which is being misread as codepoint 0x02 because the parser is using the
> > > > wrong encoding, for example because the XML declaration is incorrect.
> > > >
> > > > Michael Kay
> > > > http://www.saxonica.com/
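Michael's second possibility can be checked by decoding the raw bytes yourself and looking for the first character outside the XML 1.0 Char production. The Java sketch below uses hypothetical names and assumes the whole file fits in memory; it reports a 1-based line and column (counted in code points, which may differ slightly from a particular parser's count) that can be compared with what a hex viewer shows. It also illustrates the misreading he describes: the two bytes 0x00 0x41 are the letter 'A' in UTF-16BE, but decoded as UTF-8 they become U+0000 followed by 'A', and U+0000 is not a legal XML character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: decode bytes with a given charset and report the
// position of the first code point that XML 1.0 does not allow.
final class IllegalCharFinder {
    private IllegalCharFinder() {}

    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns "line L, column C: U+XXXX" (1-based), or null if every character is legal.
    static String firstIllegal(byte[] data, Charset charset) {
        String text = new String(data, charset);
        int line = 1;
        int column = 1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return String.format("line %d, column %d: U+%04X", line, column, cp);
            }
            if (cp == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
            i += Character.charCount(cp);
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x41}; // "A" encoded as UTF-16BE
        System.out.println(firstIllegal(bytes, StandardCharsets.UTF_16BE));
        System.out.println(firstIllegal(bytes, StandardCharsets.UTF_8));
    }
}
```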
> > > >
> > > > > -----Original Message-----
> > > > > From: Pankaj Bishnoi [mailto:pankaj.bishnoi@xxxxxxxxxxx]
> > > > > Sent: 04 November 2006 09:24
> > > > > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > > > > Subject: [xsl] Encoding issues with document() function
> > > > >
> > > > > Hi All
> > > > >         I have an XSL stylesheet in which I use the XSLT document()
> > > > > function. The problem I am facing is that the XML file I am
> > > > > trying to read with the document() function contains some
> > > > > Unicode characters, and the exception thrown at transformation
> > > > > time is:
> > > > >
> > > > > SystemId Unknown; Line #133;Column #104; Can not load
> > > > > requested doc: An invalid XML character(Unicode: 0x2) was
> > > > > found in the element content of the document
> > > > >
> > > > > The source XML file has encoding UTF-8. I searched the web for
> > > > > this issue, and one suggested alternative is to replace those
> > > > > '0x2' characters.
> > > > > Other characters, such as 0x1 or 0x13, might also appear in
> > > > > other scenarios. So my question is: is there any encoding that
> > > > > supports all these characters?
> > > > >
> > > > > Is there any way around this issue? Any help will be highly
> > > > > appreciated.
> > > > >
> > > > > Thanks
> > > > > Pankaj
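Whether some encoding "supports" these characters can be tested with a small Java experiment. An encoding only decides which bytes represent a character, so U+0002 written in UTF-8, UTF-16 or ISO-8859-1 is still U+0002 once the parser decodes it, and the parser then rejects it because XML 1.0 forbids the character itself. The sketch below (hypothetical class name, standard JDK charsets only) round-trips U+0002 through several encodings and prints the bytes and the decoded code point.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows that changing the encoding changes the bytes, not the character:
// U+0002 decodes back to U+0002 in every charset tried here.
final class EncodingRoundTrip {
    private EncodingRoundTrip() {}

    // Encodes one code point with the charset, decodes it again, returns the result.
    static int roundTrip(int cp, Charset charset) {
        String original = new String(Character.toChars(cp));
        byte[] bytes = original.getBytes(charset);
        return new String(bytes, charset).codePointAt(0);
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.UTF_8, StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE, StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII
        };
        for (Charset cs : charsets) {
            byte[] bytes = new String(Character.toChars(0x02)).getBytes(cs);
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02X ", b));
            }
            System.out.printf("%-10s bytes: %s-> U+%04X%n", cs.name(), hex, roundTrip(0x02, cs));
        }
    }
}
```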

