Line end characters being removed

dancj
Posts: 9
Joined: Fri Sep 30, 2011 4:05 pm

Post by dancj »

Hi, I'm just evaluating oXygen XML Developer for my company, and I've found the following issue.

If I format XML that has line-end characters within an element, the formatter removes the line ends (unless, it seems, the element name is "address").

So:
<?xml version="1.0" encoding="UTF-8"?>
<unit>
<address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1>
</unit>

Becomes:
<?xml version="1.0" encoding="UTF-8"?>
<unit>
<address1>123 Somewhere Street Sometown Norway EX2 7HY</address1>
</unit>

Is there any fix for this?

Thanks

Dan

Post by dancj »

Since posting that, I have found Options/Preferences/Editor/Format/XML, where you can specify element names and XPath expressions that don't get formatted. But having to rely on pre-warning the app about any fields that I don't want messed up seems very dangerous, and putting //* in as one of the options just stops the formatting from working at all.
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Post by sorin_ristache »

Hello,

The Format and Indent action breaks a line only when it exceeds the maximum length specified in the Line Width - Format and Indent option in Options -> Preferences -> Editor / Format. That means that if a line is shorter than that width, it is joined with the next one.
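
The joining rule can be approximated as "collapse, then wrap": every whitespace run in the text becomes one space, and lines are broken only where they would exceed the width. This is a sketch assuming a simple greedy word wrap (Python's `textwrap`); it is not oXygen's formatter, and `format_text` and `line_width` are hypothetical names.

```python
import textwrap


def format_text(text: str, line_width: int) -> str:
    """Hypothetical model of the line-joining rule, not oXygen's code."""
    # Collapse every run of spaces, tabs and line ends into a single space;
    # this is the step that joins the short address lines together.
    normalized = " ".join(text.split())
    # Break lines only where a line would otherwise exceed the width.
    return "\n".join(textwrap.wrap(normalized, width=line_width))


address = "123 Somewhere Street\nSometown Norway\nEX2 7HY"
print(format_text(address, 80))  # 123 Somewhere Street Sometown Norway EX2 7HY
print(format_text(address, 20))  # wraps at the width, not at the original line ends
```

Even with a narrow width the original line structure is not recovered: the break points follow the width ("Sometown Norway EX2" / "7HY"), not the author's line ends.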

You have two options for preserving the text nodes of the address1 element:
  • add the attribute xml:space="preserve" to the address1 element
  • add the element name (address1) to the Preserve space list of elements from Options -> Preferences -> Editor / Format / XML
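
Both options come down to the same rule: the formatter must skip the marked element's text. The sketch below models that rule with Python's standard `xml.etree.ElementTree`, which exposes the standard `xml:space` attribute under the key `{http://www.w3.org/XML/1998/namespace}space`. The names `format_tree` and `preserve_names` and the extra `address2` element are invented for illustration; this is not how oXygen implements either option.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def format_tree(elem, preserve_names=frozenset(), level=0, indent="    "):
    """Hypothetical formatter: re-indents elements and joins lines in leaf
    text, but skips any element marked xml:space="preserve" or whose name
    is in preserve_names (a stand-in for the "Preserve space" list)."""
    if elem.get(XML_SPACE) == "preserve" or elem.tag in preserve_names:
        return  # leave this element's text and descendants exactly as parsed
    if len(elem) == 0:
        if elem.text:
            # Normalize leaf text: every whitespace run becomes one space.
            elem.text = " ".join(elem.text.split())
        return
    child_pad = "\n" + indent * (level + 1)
    # Whitespace-only text before the first child becomes indentation.
    if not (elem.text or "").strip():
        elem.text = child_pad
    for child in elem:
        format_tree(child, preserve_names, level + 1, indent)
        if not (child.tail or "").strip():
            child.tail = child_pad
    # Dedent the closing tag back to this element's own level.
    if not elem[-1].tail.strip():
        elem[-1].tail = "\n" + indent * level


doc = """<unit>
<address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1>
<address2 xml:space="preserve">1 Other Road
Elsewhere</address2>
</unit>"""

root = ET.fromstring(doc)
format_tree(root)  # address1 is joined, address2 keeps its line end
print(ET.tostring(root, encoding="unicode"))

# Listing the element name has the same effect as the attribute.
root2 = ET.fromstring(doc)
format_tree(root2, preserve_names={"address1"})
```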


Regards,
Sorin

Post by dancj »

Thanks for the reply.

Unfortunately both of those options mean I can't just stick an unknown piece of XML and format it without danger of changing the XML.

I think that's going to be a deal breaker for us.
Post by sorin_ristache »

dancj wrote: "having to rely on pre-warning the app about any fields that I don't want messed up seems very dangerous"

Formatting an XML document applies the rules set in the Format and Format / XML preferences panels. You want an exception to the normal formatting process for some elements, which is why you have to mark those elements explicitly.


Regards,
Sorin
Post by sorin_ristache »

dancj wrote: "Unfortunately both of those options mean I can't just stick an unknown piece of XML and format it without danger of changing the XML."

Please give us some examples of the expected result for the formatting action. Do you want to preserve all text nodes of the XML document? If yes, how would you want to re-format the nodes by running the Format and Indent action?


Thank you,
Sorin
Post by dancj »

sorin wrote: "Please give us some examples of the expected result for the formatting action. Do you want to preserve all text nodes of the XML document? If yes, how would you want to re-format the nodes by running the Format and Indent action?"

I would expect:

<?xml version="1.0" encoding="UTF-8"?>
<unit><address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1><anotherNode><childNode>aa</childNode></anotherNode></unit>
to become:

<?xml version="1.0" encoding="UTF-8"?>
<unit>
    <address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1>
    <anotherNode>
        <childNode>aa</childNode>
    </anotherNode>
</unit>
So basically the content text-elements get left alone, but the space between elements gets adjusted to make the XML easier to read. I thought this was pretty standard behaviour for XML editors.
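
The behaviour described here (leave leaf text alone, adjust only the whitespace between elements) is what Python's `xml.etree.ElementTree.indent` (Python 3.9+) does. The sketch below runs it on the example above as an illustration of the idea, not of how oXygen works; the XML declaration is omitted and the four-space indent is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# The input above: one line, except for the line ends inside address1.
doc = ("<unit><address1>123 Somewhere Street\nSometown Norway\nEX2 7HY</address1>"
       "<anotherNode><childNode>aa</childNode></anotherNode></unit>")

root = ET.fromstring(doc)
# ET.indent rewrites only whitespace-only text and tails around the children
# of an element; text inside leaf elements such as address1 is never touched.
ET.indent(root, space="    ")
result = ET.tostring(root, encoding="unicode")
print(result)
```

One caveat of this approach: whitespace-only text inside an element that has children is still treated as indentation and replaced, and `ET.indent` does not look at `xml:space`.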
Post by dancj »

That should say "content of text-elements"
Post by sorin_ristache »

Removing whitespaces like end of line, tab, space is called normalization of an XML document and does not change the canonical form of the document. Is it important for you to preserve all text nodes? In such a case you can specify //text() in the Preserve space list from Options -> Preferences -> Editor / Format / XML.


Regards,
Sorin
Post by dancj »

sorin wrote: "Removing whitespaces like end of line, tab, space is called normalization of an XML document and does not change the canonical form of the document."

I'm not sure about "canonical form", but if you do it within text nodes, you're changing the data contained in the XML.

sorin wrote: "Is it important for you to preserve all text nodes? In such a case you can specify //text() in the Preserve space list from Options -> Preferences -> Editor / Format / XML."

I just tried that. Unfortunately it didn't work. Does it rely on the XML having an XSD that specifies that the element is a text datatype? Is it not enough just to have text contained in the element?
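
Whether joining the lines inside address1 changes the canonical form can be checked mechanically. This sketch uses the C14N 2.0 serializer in Python's standard library (`xml.etree.ElementTree.canonicalize`, Python 3.8+) and also compares the parsed text value; the two documents are trimmed versions of the thread's example, before and after the lines were joined.

```python
import xml.etree.ElementTree as ET

# The address1 text before and after Format and Indent joined its lines
# (documents trimmed to the one element that matters).
before = "<unit><address1>123 Somewhere Street\nSometown Norway\nEX2 7HY</address1></unit>"
after = "<unit><address1>123 Somewhere Street Sometown Norway EX2 7HY</address1></unit>"

# C14N 2.0 serialization. With the default strip_text=False,
# whitespace inside text content is kept in the canonical form.
same_canonical_form = ET.canonicalize(before) == ET.canonicalize(after)

# The parsed text value that an application reading the file would see.
same_text = ET.fromstring(before).find("address1").text == ET.fromstring(after).find("address1").text

print(same_canonical_form, same_text)  # False False
```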
Post by sorin_ristache »

I am sorry, only a subset of the XPath language is supported for the Preserve space list. You have to specify the element names.


Regards,
Sorin
Post by dancj »

sorin wrote: "I am sorry, only a subset of the XPath language is supported for the Preserve space list. You have to specify the element names."

Okay, thanks.
george
Site Admin
Posts: 2095
Joined: Thu Jan 09, 2003 2:58 pm

Post by george »

Hi Dan,

What you want is accomplished with the "Preserve text as it is" option in Options -> Preferences -> Editor / Format / XML.

Best Regards,
George
George Cristian Bina
Post by dancj »

Thanks. That is a lot better, but it does still insert indents into the text - so I get:

<?xml version="1.0" encoding="UTF-8"?>
<unit>
    <address1>123 Somewhere Street
    Sometown Norway
    EX2 7HY</address1>
    <anotherNode>
        <childNode>aa</childNode>
    </anotherNode>
</unit>
instead of:

<?xml version="1.0" encoding="UTF-8"?>
<unit>
    <address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1>
    <anotherNode>
        <childNode>aa</childNode>
    </anotherNode>
</unit>
Post by george »

Hi,

Can you try using "Reset defaults" on that page and then setting the "Preserve text as it is" option? My tests show that the following document should remain unchanged after Format and Indent:

<?xml version="1.0" encoding="UTF-8"?>
<unit>
    <address1>123 Somewhere Street
Sometown Norway
EX2 7HY</address1>
    <anotherNode>
        <childNode>aa</childNode>
    </anotherNode>
</unit>
Best Regards,
George

Post by sorin_ristache »

dancj wrote: "Thanks. That is a lot better, but it does still insert indents into the text - so I get:"

I cannot reproduce the problem. If I select the option "Preserve text as it is" in Options -> Preferences -> Editor / Format / XML, only the XML tags are re-indented, not the text that appears between them. Please send us a sample XML document that reproduces the problem, using this online form. Please also include your user preferences, which you can export from the menu Options -> Export Global Options.


Regards,
Sorin
Post by dancj »

Ah, it wasn't the formatting. The indents were inserted because the "Indent on paste - sections with number of lines less than 300" option on the same page was ticked.

With that unticked it all works well.
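
A plausible explanation is that an indent-on-paste feature works line by line, adding the insertion point's indentation to every pasted line, including lines that sit inside a text node. The sketch below models that assumption only; `indent_on_paste` is a hypothetical name, and oXygen's real option may behave differently.

```python
import xml.etree.ElementTree as ET


def indent_on_paste(pasted: str, prefix: str) -> str:
    """Hypothetical line-based model of indent-on-paste (not oXygen's code):
    every pasted line after the first gets the insertion point's indentation."""
    first, *rest = pasted.split("\n")
    return "\n".join([first] + [prefix + line for line in rest])


pasted = "<address1>123 Somewhere Street\nSometown Norway\nEX2 7HY</address1>"
reindented = indent_on_paste(pasted, "    ")

# A purely line-based indenter cannot tell markup lines from text lines,
# so the added spaces end up inside the address1 text node.
print(repr(ET.fromstring(pasted).text))
print(repr(ET.fromstring(reindented).text))
```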

Thanks

Dan