Synchronizing XML formatting between Oxygen and external tools (+ Git)
-
- Posts: 922
- Joined: Thu May 02, 2019 2:32 pm
Post by chrispitude »
Hi all,
We have some external tools that we run from Oxygen to update our DITA topics and maps. For example,
- Update topics with target text for cross-book xrefs
- Update maps with corrected subtopic structure for nested topic structures in a single file
Our DITA content is stored in a Git repo, which means that Git notices changes in XML formatting as file differences. When I drag-and-drop a file into the DITA Maps Manager in Oxygen, then make it conditional, long attributes are placed on separate lines as follows:
Code: Select all
<chapter>
<topicref
href="ptug/using_primetime_with_spice/simlink/correlating_arc_based_coupled_primetime_si_spice_analysis.dita"
keys="correlating_arc_based_coupled_primetime_si_spice_analysis"
product="library(LC)"/>
</chapter>
Our external tool uses the perl XML::Twig package to read/write XML. When the content round-trips through the utility, the attributes are placed on the same line:
Code: Select all
<chapter>
<topicref href="ptug/using_primetime_with_spice/simlink/correlating_arc_based_coupled_primetime_si_spice_analysis.dita" keys="correlating_arc_based_coupled_primetime_si_spice_analysis" product="library(LC)"/>
</chapter>
Oxygen does quite a commendable job of leaving existing XML structures intact. However, changes in structure can cause the content to get reformatted to multiple lines again, such as when the topic is reordered or the surrounding structure is changed.
These XML formatting differences are causing Git conflicts. As a first step, I'd like to align the XML formatting between Oxygen and my external tools as much as possible.
In Oxygen's preferences, I see the following settings:
image.png
but the setting that sounds like it would keep the attributes on the same line is already unchecked. Does adding topics in the DITA Maps Manager bypass this setting?
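The two serializations in this post differ only in whitespace inside the start tag, so they carry the same data even though Git reports a change. A minimal sketch of how that could be checked, written in Python rather than with the tools from this thread: it compares canonical forms using the standard library's `xml.etree.ElementTree.canonicalize` (Python 3.8+). The function name `same_xml_data` and the shortened attribute values are hypothetical, chosen only for illustration.

```python
from xml.etree.ElementTree import canonicalize


def same_xml_data(xml_a: str, xml_b: str) -> bool:
    """Return True if both documents have the same canonical (C14N 2.0) form.

    strip_text=True also ignores leading/trailing whitespace in text nodes.
    That is an assumption that suits indentation-only changes, but it would
    hide real whitespace edits in mixed content.
    """
    return (canonicalize(xml_a, strip_text=True)
            == canonicalize(xml_b, strip_text=True))


# Shortened, hypothetical stand-ins for the two serializations in the post.
multi_line = (
    '<chapter>\n'
    '<topicref\n'
    'href="topics/example.dita"\n'
    'keys="example"\n'
    'product="library(LC)"/>\n'
    '</chapter>'
)
single_line = (
    '<chapter>\n'
    '<topicref href="topics/example.dita" keys="example" product="library(LC)"/>\n'
    '</chapter>'
)

print(same_xml_data(multi_line, single_line))
```

A check like this only tells you that a diff is formatting-only; it does not make Git ignore the difference, which is why aligning the serializers is still the goal here.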
-
- Posts: 9434
- Joined: Fri Jul 09, 2004 5:18 pm
Re: Synchronizing XML formatting between Oxygen and external tools (+ Git)
Hi Chris,
In the Oxygen Preferences->"Editor / Format" page there is a "Line width" setting. Oxygen tries to add line breaks so that the number of characters on each line does not exceed that limit. You can experiment with setting a very large line width there.
So even with the "Break line before attribute" checkbox unchecked, Oxygen will still break lines between attributes if the line would otherwise be longer than the maximum line width set in the preferences. I will try to explain this behavior more clearly in the user's manual for the next release.
The "Break line before attribute" setting mainly takes effect when the element's start tag, along with its attributes, does not exceed the maximum line width. So if you have a small element like:
Code: Select all
<elem a="b" c="d">
checking and unchecking the "Break line before attribute" checkbox will have an effect on it, because the element's start tag does not overflow the maximum line width.
If you set a very large "Line width" value, Oxygen will still add line breaks in element-only content, but the setting will also affect the DITA topics.
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
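The line-width rule described in this reply can be pictured with a toy model. The sketch below is Python and is not Oxygen's serializer: it assumes a simple greedy rule (start a new line whenever the next attribute would overflow the width), and `wrap_start_tag`, its parameters, and the two-space indent are all hypothetical.

```python
def wrap_start_tag(name, attrs, line_width=80, indent="  ", closing=">"):
    """Greedy toy model of line-width wrapping for one start tag.

    Assumption: a break is added only when the next token would overflow
    line_width. A single token longer than line_width still overflows.
    """
    lines = ["<" + name]
    for key, value in attrs.items():
        token = f'{key}="{value}"'
        if len(lines[-1]) + 1 + len(token) <= line_width:
            lines[-1] += " " + token          # still fits on the current line
        else:
            lines.append(indent + token)      # would overflow: start a new line
    # The closing ">" or "/>" is treated like any other token.
    if len(lines[-1]) + len(closing) <= line_width:
        lines[-1] += closing
    else:
        lines.append(closing)
    return "\n".join(lines)


print(wrap_start_tag("elem", {"a": "b", "c": "d"}))
print(wrap_start_tag("xref", {"type": "fig"}, line_width=17, closing="/>"))
```

In this model a short tag stays on one line whatever the width, while a long one is split purely because of the width limit, which matches the explanation that the checkbox only matters when the whole start tag fits.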
Re: Synchronizing XML formatting between Oxygen and external tools (+ Git)
Post by chrispitude »
Thanks Radu, this is exactly what I needed!
Does the "Format and Indent" operation behave identically to new content/element creation in the topic editor and the DITA Maps Manager? If so, then what I can do is
1. Run Format and Indent on every file in the repo
2. Write a simple perl script to read/write the XML
3. Compare #1 and #2 for differences
I'll follow up here with my progress!
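The comparison step in the plan above could be sketched as follows. This is a Python illustration under stated assumptions, not the script from this thread: it assumes the Oxygen-formatted files and the round-tripped files sit in two parallel directory trees, and `formatting_diffs`, the `*.dita*` pattern, and the `oxygen/` and `roundtrip/` diff labels are hypothetical names.

```python
import difflib
from pathlib import Path


def formatting_diffs(dir_a, dir_b, pattern="*.dita*"):
    """Yield (relative_path, unified_diff_lines) for files that differ.

    dir_a is assumed to hold the Oxygen-formatted files and dir_b the
    round-tripped ones, with the same relative layout.
    """
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    for path_a in sorted(dir_a.rglob(pattern)):
        rel = path_a.relative_to(dir_a)
        path_b = dir_b / rel
        if not path_b.is_file():
            continue  # no counterpart to compare against
        # Decode bytes explicitly so CR/LF differences are not hidden.
        a = path_a.read_bytes().decode("utf-8").splitlines(keepends=True)
        b = path_b.read_bytes().decode("utf-8").splitlines(keepends=True)
        if a != b:
            diff = difflib.unified_diff(a, b, f"oxygen/{rel}", f"roundtrip/{rel}")
            yield rel, list(diff)
```

Iterating over the result and printing each diff gives a quick list of the constructs where the two serializers disagree.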
Re: Synchronizing XML formatting between Oxygen and external tools (+ Git)
Post by chrispitude »
Hi Radu,
I ran into my first obstacle. Let's say I start with this DITA content:
Code: Select all
<p><xref href="#creating_the_virtual_top_level_netlist/fig_ipq_mw3_qlb" type="fig"/>
shows a 2D and 3D view of a top-level design with an SoC and memory design.</p>
<fig id="fig_ipq_mw3_qlb">
<title>2D and 3D views of a Top-Level Design</title>
</fig>
In the Author mode, if I define @format="dita" in the Attributes view, the underlying XML turns into this:
Code: Select all
<p><xref format="dita" href="#creating_the_virtual_top_level_netlist/fig_ipq_mw3_qlb" type="fig"
/> shows a 2D and 3D view of a top-level design with an SoC and memory design.</p>
<fig id="fig_ipq_mw3_qlb">
<title>2D and 3D views of a Top-Level Design</title>
</fig>
Note that the ending "/>" tag fragment of the <xref> element is wrapped to the next line. The perl XML package I'm using does not support separately breaking and wrapping just this "/>" fragment to the next line. Is there a way to disable this in Oxygen? I don't see anything in Editor / Format / XML that pertains specifically to these ending-tag fragments.
Re: Synchronizing XML formatting between Oxygen and external tools (+ Git)
Hi Chris,
About this question:
Does the "Format and Indent" operation behave identically to new content/element creation in the topic editor and the DITA Maps Manager?
The Author visual editing mode and the DITA Maps Manager have the same internal structure, which has the same serialization behavior.
About Oxygen sometimes adding line breaks, for example before the "/>", in order to obey the maximum line width specified in its settings: we do not have a setting to control this. Oxygen considers the resulting XML to be data-wise equivalent to the original one, and it is. I'm sorry, but I'm not sure what we can do about this; we cannot guarantee that our serialization has all the settings needed to make it behave exactly like the serialization of another tool that you are using.
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
Re: Synchronizing XML formatting between Oxygen and external tools (+ Git)
Post by chrispitude »
Hi Radu,
Is it even feasible to ask for an "Allow line breaks before />" option in your serializer? I realize I'm the only person asking for this. I don't know if you're using your own serializer or a standardized one.
Most serializers accessible from perl/Python/etc. provide control over indent, line length, and keep spaces. But this "/>" behavior is something I cannot emulate in them.
Right now I read the XML file twice - once in XML tree form, and once as a single large string. I use the tree form to structurally explore the content and figure out what elements to modify, then I attempt to find the same elements using regex and element IDs to update them in string form. It is just as awful as it sounds.
But, it ensures I modify only the areas I want, while leaving everything else precisely identical.
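The two-pass approach described in this post (use the tree to decide what to change, then patch the raw string so everything else stays byte-identical) might look roughly like this. It is a Python sketch, not the actual perl utility: `set_attr_in_place` is a hypothetical helper, and it assumes double-quoted attribute values, no ">" inside attribute values, and no look-alike markup inside comments or CDATA.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def set_attr_in_place(xml_text, elem_id, attr, new_value):
    """Change one attribute of the element with the given id, editing raw text."""
    # Tree pass: confirm exactly one element has this id and carries the attribute.
    root = ET.fromstring(xml_text)
    hits = [e for e in root.iter() if e.get("id") == elem_id]
    if len(hits) != 1 or attr not in hits[0].attrib:
        raise ValueError(f"expected one element with id={elem_id!r} and @{attr}")

    # String pass: find that element's start tag in the untouched source text.
    tag_re = re.compile(r'<[\w\-]+(?=\s)[^>]*?\sid="%s"[^>]*>' % re.escape(elem_id))
    tag = tag_re.search(xml_text)
    if tag is None:
        raise ValueError("start tag not found in raw text")

    # Replace only the attribute value; whitespace and line breaks are kept.
    attr_re = re.compile(r'(\s%s=")[^"]*(")' % re.escape(attr))
    value = escape(new_value, {'"': "&quot;"})
    new_tag, count = attr_re.subn(
        lambda m: m.group(1) + value + m.group(2), tag.group(0), count=1)
    if count != 1:
        raise ValueError(f"@{attr} not found in raw start tag")
    return xml_text[:tag.start()] + new_tag + xml_text[tag.end():]
```

Because only the matched attribute value is rewritten, a "/>" that Oxygen wrapped onto its own line stays exactly where it was.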

Re: Synchronizing XML formatting between Oxygen and external tools (+ Git)
Hi Chris,
As far as I know, there is no standard for serializing XML, so we use our own code along with the settings to format it. I added an internal issue for your request (EXM-46298 - Format and indent setting to avoid line break between attribute and end of tag), but I cannot guarantee a timeline for it.
About your external update tools: in Oxygen 22.1 we added an API that can load XML content in memory into an Author-mode node structure (without any of its visual aspects), modify that structure, and then save it back to disk. So at some point you could consider migrating your Perl scripts to a Java-based Oxygen plugin which adds, for example, a contextual menu action in the Project view and processes all content as if it were loaded and modified in the Author visual editing mode.
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
Re: Synchronizing XML formatting between Oxygen and external tools (+ Git)
Post by chrispitude »
Hi Radu,
Thanks for filing the low-priority enhancement!
My perl utility reads in a ditamap, makes some changes, and writes it back out. As a workaround, I implemented this:
1. Hash all actual element strings (with linefeeds, etc.) in the input file by a normalized-whitespace version of the element.
2. Search and replace all actual element strings in the output file, replacing each with the actual string from #1 if a normalized-whitespace string match exists.
Here's the perl code for reference if it helps anyone:
Code: Select all
# Collapse every run of whitespace (including linefeeds) into a single space.
sub normalize_whitespace { return (shift =~ s!\s+! !gr); }

# Write $contents to $filename, but keep the on-disk formatting of every
# start tag whose whitespace-normalized form is unchanged.
sub write_differences {
    my ($filename, $contents) = @_;
    # normalized start tag => start tag exactly as it currently appears in the file
    my %orig_elements = map {normalize_whitespace($_) => $_} (read_entire_file($filename) =~ m#(<[\w\-]+\s[^>]*>)#gs);
    my $get_element = sub {
        my $e = normalize_whitespace(shift);
        return defined($orig_elements{$e}) ? $orig_elements{$e} : $e; # return original element if possible
    };
    $contents =~ s#(<[\w\-]+\s[^>]*>)#$get_element->($1)#gse;
    write_entire_file($filename, $contents);
    return 1;
}
along with some helper functions:
Code: Select all
sub read_entire_file {
    my $filename = shift;
    open(my $fh, '<', $filename) or die "can't open $filename for read: $!";
    local $/ = undef; # slurp the whole file in one read
    binmode($fh, ":encoding(utf-8)"); # the UTF-8 package checks and enforces this
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

sub write_entire_file {
    my ($filename, $contents) = @_;
    $contents =~ s!\n?$!\n!s; # add LF if needed
    open(my $fh, '>', $filename) or die "can't open $filename for write: $!";
    binmode($fh); # don't convert LFs to CR/LF on Windows
    binmode($fh, ":encoding(utf-8)"); # the UTF-8 package checks and enforces this
    print $fh $contents;
    close $fh;
}
Then you can do something like this:
Code: Select all
my $file_contents = read_entire_file($filename);
# reformat/modify XML inside $file_contents
write_differences($filename, $file_contents);
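For readers who do not use perl, the same hash-and-restore idea can be sketched in Python on plain strings. This is an illustrative port rather than the code above: `restore_original_tags` and `START_TAG` are hypothetical names, file I/O is left out, and it shares the assumption that no attribute value contains ">".

```python
import re

# Start tags that carry at least one attribute (same shape as the perl regex).
START_TAG = re.compile(r"<[\w\-]+\s[^>]*>")


def normalize_whitespace(text):
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text)


def restore_original_tags(original, rewritten):
    """Put each start tag in `rewritten` back into its `original` formatting.

    A tag is restored only if its whitespace-normalized form also occurs in
    `original`; new or changed tags are left as the serializer wrote them.
    """
    by_normalized = {normalize_whitespace(t): t for t in START_TAG.findall(original)}
    return START_TAG.sub(
        lambda m: by_normalized.get(normalize_whitespace(m.group(0)), m.group(0)),
        rewritten,
    )


original = '<chapter>\n<topicref\nhref="a.dita"\nkeys="a"/>\n</chapter>\n'
rewritten = '<chapter>\n<topicref href="a.dita" keys="a"/>\n<topicref href="b.dita"/>\n</chapter>\n'
print(restore_original_tags(original, rewritten))
```

One property of this technique worth keeping in mind: two tags match only when they differ in the amount of whitespace, not in its presence, so a tag written as `.../>` is not matched to an original that has a line break before the "/>".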
Re: Synchronizing XML formatting between Oxygen and external tools (+ Git)
Post by chrispitude »
I agree that integrated Java-based solutions would be best. I wish I had the Java knowledge and free time to pursue this! The DITA migration is not even my full-time job. I'm just a technical writer with a spare-time investigation into DITA that somehow turned into a full multi-group migration effort.
Maybe when I retire some day (some day??!), I can learn how to create useful Oxygen add-ons.