[oXygen-user] How to I get Oxygen to identify invalid characters? [SEC=UNCLASSIFIED]

Eliot Kimber
Thu Mar 31 08:45:56 CDT 2011


If you're trying to track down encoding issues (which is what this must be),
the SC Unipad tool is an excellent resource for Windows users:
www.unipad.org.
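
If a GUI tool is not an option, a few lines of Python can list every non-ASCII character in a file. This is only a rough sketch, not anything oXygen provides: the function name `find_non_ascii` and the `sample` bytes are made up for illustration, and it assumes the file really is in the encoding its XML declaration claims (UTF-8 here). The sample embeds the three bytes John reported (\342\200\224 in octal, i.e. E2 80 94 in hex).

```python
import unicodedata


def find_non_ascii(data: bytes, encoding: str = "utf-8"):
    """Return (line, column, char, code point, Unicode name) for each non-ASCII character.

    Hypothetical helper for illustration only.
    """
    # Strict decoding: raises UnicodeDecodeError if the bytes are not
    # valid in the given encoding, which is itself a useful diagnostic.
    text = data.decode(encoding)
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((line_no, col_no, ch, f"U+{ord(ch):04X}", name))
    return hits


# Made-up sample: one line of the abstract with the reported byte sequence.
sample = b"<a>March Quarter\xe2\x80\x94first week of April</a>\n"

for hit in find_non_ascii(sample):
    print(hit)
```

For a real file you would pass in `open(path, "rb").read()`. If the decode step fails, the file is not valid UTF-8 at all; if it succeeds and reports U+2014 (EM DASH), the document is well-formed and the character is simply being displayed by a tool that is not reading it as UTF-8.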

Cheers,

E.

On 3/31/11 7:16 AM, "George Cristian Bina" <> wrote:

> Dear John,
> 
> There is nothing wrong with the em dash (U+2014) character in an XML
> document; it is allowed, so oXygen cannot report an error when there is
> none. If you want to check whether your document contains such
> characters, you can pass it through a Schematron validation using a
> Schematron schema like the one below:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <schema xmlns="http://www.ascc.net/xml/schematron">
>      <pattern name="testInvalidCharacter">
>          <rule context="*">
>              <assert test="not(exists(text()[contains(., '&#x2014;')]))">
>                  The em dash character (U+2014) is not allowed!
>              </assert>
>          </rule>
>      </pattern>
> </schema>
> 
> Best Regards,
> George
> --
> George Cristian Bina
> <oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
> http://www.oxygenxml.com
> 
> On 3/31/11 9:01 AM,  wrote:
>> Hi George,
>> 
>> I notice that the content below doesn't include the M$ characters. I expect
>> it is because I have sent the email in plain text. I have temporarily loaded
>> the document up onto the following URL:
>> 
>> http://asdd.ga.gov.au/asdd/work/OESRexmple.xml
>> 
>> Uploading the file seems to have changed the characters from  â~@~T to
>> \342\200\224 or my editor is showing it differently on a different machine.
>> 
>> I hope that this helps.
>> 
>> Thanks.
>> 
>> 
>> John
>> 
>>> -----Original Message-----
>>> From: 
>>> [mailto:] On Behalf Of Hockaday John
>>> Sent: Thursday, 31 March 2011 4:24 PM
>>> To: 
>>> Cc: 
>>> Subject: Re: [oXygen-user] How to I get Oxygen to identify
>>> invalid characters? [SEC=UNCLASSIFIED]
>>> 
>>> Hi George,
>>> 
>>> Here is a snippet of the offending content:
>>> 
>>> <gmd:abstract>
>>>          <gco:CharacterString>The Housing Rental Vacancy Rates
>>> Brief summarises data from a quarterly
>>>            survey of Queensland real estate agencies. The data
>>> presented relates to vacancy rates of
>>>            residential rental detached houses and units, and
>>> is broken down by 5 regions: Inner
>>>            Brisbane, Remainder of Brisbane LGA, Brisbane
>>> Surrounds, Gold Coast and Rest of
>>>            Queensland. Information on each quarter is posted
>>> on the website according to the
>>>            following schedule: March Quarter-first week of
>>> April June Quarter-first week of July
>>>            September Quarter-first week of October December
>>> Quarter-first week of January . Surveys
>>>            have been conducted in 2002-2003, 2003-2004,
>>> 2004-2005, 2005-2006, and 2006-2007.
>>>          </gco:CharacterString>
>>>        </gmd:abstract>
>>> 
>>> Note that the hyphens are of the format:  â~@~T when viewed
>>> in a vi editor on a Solaris platform. Our XML declaration is:
>>> 
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> 
>>> Other validators report these as invalid characters. So I
>>> would like to set Oxygen so that it detects and reports these
>>> characters.
>>> 
>>> Thanks.
>>> 
>>> 
>>> John
>>> 
>>>> -----Original Message-----
>>>> From: George Cristian Bina [mailto:]
>>>> Sent: Thursday, 31 March 2011 10:22 AM
>>>> To: Hockaday John
>>>> Cc: 
>>>> Subject: Re: [oXygen-user] How to I get Oxygen to identify
>>>> invalid characters? [SEC=UNCLASSIFIED]
>>>> 
>>>> Dear John,
>>>> 
>>>> Can you please provide a cut-down sample file and a short
>>>> description of the exact steps we should follow to reproduce
>>>> this issue?
>>>> 
>>>> Thank you,
>>>> George
>>>> --
>>>> George Cristian Bina
>>>> <oXygen/>  XML Editor, Schema Editor and XSLT Editor/Debugger
>>>> http://www.oxygenxml.com
>>>> 
>>>> On 3/31/11 1:50 AM,  wrote:
>>>>> Hi All,
>>>>> 
>>>>> I have an ISO 19139 metadata record where someone has cut and
>>>>> pasted M$ Word content into one of the CharacterString fields.
>>>>> According to the Validome XML online validator these
>>>>> characters (â~@~T or M$ long hyphens) are invalid. Also, when
>>>>> we use SAXON to translate this metadata record into XHTML,
>>>>> SAXON says that there is a validation error.
>>>>> 
>>>>> When I load this document into Oxygen, version 10.0 build
>>>>> 2008102212, it validates OK. Is there some way for me to
>>>>> catch these invalid characters during validation?
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>>    John Hockaday
>>>>>    Spatial Standards Group (OSDM)
>>>>>    http://www.osdm.gov.au/
>>>>>    john.hockaday\@osdm.gov.au
>>>>> _______________________________________________
>>>>> oXygen-user mailing list
>>>>> 
>>>>> http://www.oxygenxml.com/mailman/listinfo/oxygen-user
>>>> 

-- 
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.reallysi.com
www.rsuitecms.com



