[oXygen-user] How to I get Oxygen to identify invalid characters? [SEC=UNCLASSIFIED]

George Cristian Bina
Thu Mar 31 07:16:28 CDT 2011


Dear John,

There is nothing wrong with the "—" character (U+2014, EM DASH) in an XML 
document. It is allowed, so oXygen cannot report an error when there is none.
If you want to check whether your document contains such characters, you 
can pass it through a Schematron validation using a Schematron schema 
like the one below:

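<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
     <pattern name="testInvalidCharacter">
         <!-- Visit every element and inspect its own text nodes. -->
         <rule context="*">
             <!-- Works in XPath 1.0 too: an empty node set is false,
                  so the XPath 2.0 exists() function is not needed. -->
             <assert test="not(text()[contains(., '—')])">
                 The "—" character is not allowed!
             </assert>
         </rule>
     </pattern>
</schema>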
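
If the goal is to catch any character pasted from a word processor rather than this one character, the same idea can be generalised. The following is only a sketch, not a tested schema: it assumes an oXygen version that supports ISO Schematron with the XSLT 2.0 query binding (needed for the matches() function), and the pattern id "nonAsciiCharacters" is just an illustrative name. It reports every element that directly contains text outside the ASCII range.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: ISO Schematron, XSLT 2.0 query binding assumed to be available. -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <pattern id="nonAsciiCharacters">
        <!-- Visit every element and inspect its own text nodes. -->
        <rule context="*">
            <!-- \P{IsBasicLatin} matches any character outside U+0000 to U+007F. -->
            <report test="text()[matches(., '\P{IsBasicLatin}')]">
                Element <name/> contains a character outside the ASCII range.
            </report>
        </rule>
    </pattern>
</schema>
```

Note that this also flags legitimate non-ASCII text such as accented letters, so it is only useful if the documents are expected to be plain ASCII.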

Best Regards,
George
-- 
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

On 3/31/11 9:01 AM,  wrote:
> Hi George,
>
> I notice that the content below doesn't include the M$ characters. I expect that is because I sent the email in plain text. I have temporarily uploaded the document to the following URL:
>
> http://asdd.ga.gov.au/asdd/work/OESRexmple.xml
>
> Uploading the file seems to have changed the characters from â~@~T to \342\200\224, or my editor is showing them differently on a different machine.
>
> I hope that this helps.
>
> Thanks.
>
>
> John
>
>> -----Original Message-----
>> From: 
>> [mailto:] On Behalf Of Hockaday John
>> Sent: Thursday, 31 March 2011 4:24 PM
>> To: 
>> Cc: 
>> Subject: Re: [oXygen-user] How to I get Oxygen to identify
>> invalid characters? [SEC=UNCLASSIFIED]
>>
>> Hi George,
>>
>> Here is a snippet of the offending content:
>>
>> <gmd:abstract>
>>          <gco:CharacterString>The Housing Rental Vacancy Rates
>> Brief summarises data from a quarterly
>>            survey of Queensland real estate agencies. The data
>> presented relates to vacancy rates of
>>            residential rental detached houses and units, and
>> is broken down by 5 regions: Inner
>>            Brisbane, Remainder of Brisbane LGA, Brisbane
>> Surrounds, Gold Coast and Rest of
>>            Queensland. Information on each quarter is posted
>> on the website according to the
>>            following schedule: March Quarter-first week of
>> April June Quarter-first week of July
>>            September Quarter-first week of October December
>> Quarter-first week of January . Surveys
>>            have been conducted in 2002-2003, 2003-2004,
>> 2004-2005, 2005-2006, and 2006-2007.
>>          </gco:CharacterString>
>>        </gmd:abstract>
>>
>> Note that the hyphens are of the format:  â~@~T when viewed
>> in a vi editor on a Solaris platform. Our XML declaration is:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>>
>> Other validators report these as invalid characters, so I
>> would like to set Oxygen so that it detects and reports these
>> characters.
>>
>> Thanks.
>>
>>
>> John
>>
>>> -----Original Message-----
>>> From: George Cristian Bina [mailto:]
>>> Sent: Thursday, 31 March 2011 10:22 AM
>>> To: Hockaday John
>>> Cc: 
>>> Subject: Re: [oXygen-user] How to I get Oxygen to identify
>>> invalid characters? [SEC=UNCLASSIFIED]
>>>
>>> Dear John,
>>>
>>> Can you please provide a cut down sample file and a short
>>> description of
>>> the exact steps we should follow to reproduce this issue.
>>>
>>> Thank you,
>>> George
>>> --
>>> George Cristian Bina
>>> <oXygen/>  XML Editor, Schema Editor and XSLT Editor/Debugger
>>> http://www.oxygenxml.com
>>>
>>> On 3/31/11 1:50 AM,  wrote:
>>>> Hi All,
>>>>
>>>> I have an ISO 19139 metadata record where someone has cut and
>>> pasted M$ Word content into one of the CharacterString fields.
>>> According to the Validome XML online validator these
>>> characters (â~@~T or M$ long hyphens) are invalid. Also, when
>>> we use SAXON to translate this metadata record into XHTML,
>>> SAXON says that there is a validation error.
>>>>
>>>> When I load this document into Oxygen, version 10.0 build
>>> 2008102212, it validates OK. Is there some way for me to
>>> catch these invalid characters during validation?
>>>>
>>>> Thanks.
>>>>
>>>>    John Hockaday
>>>>    Spatial Standards Group (OSDM)
>>>>    http://www.osdm.gov.au/
>>>>    john.hockaday\@osdm.gov.au
>>>> _______________________________________________
>>>> oXygen-user mailing list
>>>> 
>>>> http://www.oxygenxml.com/mailman/listinfo/oxygen-user
>>>


