format adding whitespace

alaue
Posts: 4
Joined: Thu Jul 07, 2005 3:11 am

Post by alaue »

Hello,

I'm sorting through the various methods for dealing with the addition and deletion of whitespace by the oXygen format feature. I've implemented the xml:space attribute in my DTD, but one instance of this problem seems to escape such systematic treatment.

In my original TEI document, I have the following encoding:

<p>pcdata content <del><del>alo<add>d</add><del>w</del></del> a</del> pcdata content</p>
If I format the document instance, I get the following:

<p>pcdata content <del>
<del>alo<add>d</add>
<del>w</del>
</del> a</del> pcdata content</p>
Thus, the format function effectively adds whitespace between the d and the w and between the w and the a.
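
One way to see exactly which text nodes changed is to compare the concatenated text of the two versions. The sketch below uses Python's standard xml.etree.ElementTree and is not part of oXygen; the helper name `text_of` is made up for illustration, and the formatted string is the output as pasted above, without whatever indentation the editor actually produced.

```python
import xml.etree.ElementTree as ET

# The paragraph before formatting, copied from the post.
ORIGINAL = (
    "<p>pcdata content <del><del>alo<add>d</add><del>w</del></del>"
    " a</del> pcdata content</p>"
)

# The paragraph after formatting, as pasted (indentation not reproduced).
FORMATTED = """<p>pcdata content <del>
<del>alo<add>d</add>
<del>w</del>
</del> a</del> pcdata content</p>"""


def text_of(xml_source: str) -> str:
    """Concatenate every text node of the document, in document order."""
    return "".join(ET.fromstring(xml_source).itertext())


original_text = text_of(ORIGINAL)
formatted_text = text_of(FORMATTED)

# repr() makes the inserted newlines visible as \n.
print(repr(original_text))
print(repr(formatted_text))
print("text changed:", original_text != formatted_text)
```

In the first string "alodw" is one unbroken word; in the second it is split by the newlines the formatter introduced.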

What I don't understand is why "<add>d</add>" doesn't get a newline but "<del>w</del>" does. In other words, why doesn't the result of format look like this:

<p>pcdata content <del>
<del>alo
<add>d</add>
<del>w</del>
</del>
a</del> pcdata content</p>
I realize that either result still adds whitespace-only text nodes to my document. So, I guess I really have two questions:

1. Why the formatting described above?

2. We can't specify that all whitespace-only nodes in <p> can be removed as there are instances when we intend there to be whitespace; however, we can't preserve these introduced whitespace-only nodes either. Any advice on how we can accomplish one without the other?

Thanks,
Andrea
george
Site Admin
Posts: 2095
Joined: Thu Jan 09, 2003 2:58 pm

Post by george »

Hi Andrea,

Thank you for your message.
We added a couple of fixes to the format and indent action to make sure the text nodes are not removed/added in mixed content elements. They will be available in 6.1 tomorrow.

Best Regards,
George
alaue
Posts: 4
Joined: Thu Jul 07, 2005 3:11 am

format and whitespace in oXygen 6.1

Post by alaue »

Hi, George:

Thanks for the update and for the improvements to the software. Several of our problems were solved by it.

Unfortunately, not all instances of the problem were solved by the recent update. In elements that allow mixed content, some spaces still seem to drop out when the document is formatted.

For instance, we're losing whitespace between two elements when the only other content between those two elements is an empty element.

</a> <b/><a>

We're losing the space after </a> and before <b/>. The parent here is a paragraph (<p>), which allows mixed content.

I have declared a div1[@xml:space="preserve"] in our DTD. The paragraphs (<p>s) in question are children of a <div1>.

I can provide real examples, if that would be more helpful.

Thanks,
Andrea
george
Site Admin
Posts: 2095
Joined: Thu Jan 09, 2003 2:58 pm

Post by george »

Hi Andrea,

Oxygen does not read the DTD/Schema on format and indent, so adding xml:space in the DTD has no effect on the action. You can add the attribute in the document itself, and then the format and indent action will take it into account. You can also add an element to the preserve elements list in the options, which is equivalent to every element with that name in the document having an xml:space="preserve" attribute.
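
Where adding the attribute by hand is impractical, it could be stamped onto the instance with a small script. This is only a sketch using Python's standard library, not an oXygen feature: `mark_preserve` and the sample document are hypothetical, and ElementTree does not write the DOCTYPE declaration back out, so a real TEI file would need a tool that keeps it.

```python
import xml.etree.ElementTree as ET

# ElementTree addresses xml:space through the reserved XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def mark_preserve(xml_source: str, tag: str) -> str:
    """Return the document with xml:space="preserve" set on every <tag> element."""
    root = ET.fromstring(xml_source)
    for element in root.iter(tag):
        element.set(XML_SPACE, "preserve")
    return ET.tostring(root, encoding="unicode")


# Hypothetical instance: a <div1> holding a mixed-content paragraph.
sample = "<text><div1><p><a>x</a> <b/><a>y</a></p></div1></text>"
marked = mark_preserve(sample, "div1")
print(marked)
```

The attribute then travels with the document, which is what the format and indent action looks at.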

Best Regards,
George
alaue
Posts: 4
Joined: Thu Jul 07, 2005 3:11 am

format and whitespace in oXygen 6.1

Post by alaue »

Good Morning, George:

I had experimented with the preserve space option in oXygen, but in my case it defeats the purpose of formatting. The element in question is our paragraph element (<p>), since we're using modified TEI, so adding it to the preserve space list leaves us with paragraphs many screens long: lots of horizontal scrolling and very little actual formatting.

Any other ideas?

Thanks,
Andrea
george
Site Admin
Posts: 2095
Joined: Thu Jan 09, 2003 2:58 pm

Post by george »

Hi Andrea,

Please provide some real examples so we can give a clear answer.
In general, you can put xml:space="preserve" on just the specific element that has element-only content to prevent that element from being formatted.

Best Regards,
George
alaue
Posts: 4
Joined: Thu Jul 07, 2005 3:11 am

format and whitespace in oXygen 6.1

Post by alaue »

Hi, George:

After a long absence, I'm back on the case...

Seems like the problem with whitespace persists when you have nested elements that allow mixed content. For instance:

<p><hi rend="shaded">This is <hi rend="italic">the</hi> <hi rend="bold">text</hi></hi></p>
I'm losing the space between the italicized "the" and the bold "text". However, in this case:

<p>This is <hi rend="italic">the</hi> <hi rend="bold">text</hi></p>
I don't lose my space here. (The only difference in the encoding being the <hi rend="shaded">.)
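
A quick way to tell a lost space apart from a harmless line break is to compare the text of the two versions after collapsing each run of whitespace to a single space. The sketch below assumes Python's standard xml.etree.ElementTree; `normalized_text` and `whitespace_preserved` are made-up helper names, and both "after" strings are invented formatter outputs for illustration, not actual oXygen results.

```python
import xml.etree.ElementTree as ET


def normalized_text(xml_source: str) -> str:
    """Join all text nodes, then collapse each whitespace run to one space."""
    raw = "".join(ET.fromstring(xml_source).itertext())
    return " ".join(raw.split())


def whitespace_preserved(before: str, after: str) -> bool:
    """True if both versions contain the same words with the same separation."""
    return normalized_text(before) == normalized_text(after)


source = (
    '<p><hi rend="shaded">This is <hi rend="italic">the</hi> '
    '<hi rend="bold">text</hi></hi></p>'
)

# Hypothetical output where the space between the two inner <hi> is dropped.
space_lost = (
    '<p><hi rend="shaded">This is <hi rend="italic">the</hi>'
    '<hi rend="bold">text</hi></hi></p>'
)

# Hypothetical output where the space is replaced by a line break.
line_break = (
    '<p>\n<hi rend="shaded">This is <hi rend="italic">the</hi>\n'
    '<hi rend="bold">text</hi></hi>\n</p>'
)

print(whitespace_preserved(source, space_lost))  # False: "thetext"
print(whitespace_preserved(source, line_break))  # True: still "the text"
```

Note that this comparison treats leading and trailing whitespace in the paragraph as insignificant, which may not hold for every encoding project.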

Any ideas?

Thanks,
Andrea
george
Site Admin
Posts: 2095
Joined: Thu Jan 09, 2003 2:58 pm

Post by george »

Hi Andrea,

We cannot reproduce the problem. I get a line break between "the" and "text" when I try to format it:

<?xml version="1.0" encoding="UTF-8"?>
<p>
<hi rend="shaded">This is <hi rend="italic">the</hi>
<hi rend="bold">text</hi></hi>
</p>
Best Regards,
George