Pasting Text from MS Word into Oxygen XML Editor

jjohnson
Posts: 1
Joined: Tue Sep 01, 2009 11:09 pm

Post by jjohnson »

Hi. I'm new to Oxygen XML Editor. I want to copy text from a Word 2003 file into Oxygen XML Editor. Is it okay to just copy text directly from Word or do I need to save the Word file as a "Word 2003 XML Document" and check the "Save Data Only" box? If I just copy text directly from Word 2003, do extraneous codes get carried into the XML file behind the scenes that could create problems? Thanks for your help.
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: Pasting Text from MS Word into Oxygen XML Editor

Post by sorin_ristache »

Hello,

The save format that you set in MS Word is not important here. If you select "Word 2003 XML Document", the document is saved as an XML file; otherwise it is saved as plain text or in a Microsoft proprietary format, depending on the save format that you choose. The Microsoft proprietary format cannot be edited directly in a text editor or in Oxygen. However, a copy and paste from MS Word to Oxygen should not copy any extraneous characters beyond those displayed on screen in MS Word when you edit your document. Copying such extraneous characters would be an MS Word bug, and I have not experienced that in copy/paste actions. Do you get strange validation errors in Oxygen or unexpected characters in the document after the copy/paste action?


Regards,
Sorin
kwringe
Posts: 10
Joined: Thu Jun 18, 2009 9:30 pm

Re: Pasting Text from MS Word into Oxygen XML Editor

Post by kwringe »

The problem we've encountered when copying and pasting from Word is that special characters from the Word document, such as fancy (curly) quotes, are inserted into our oXygen XML files. Often we don't notice that we've added fancy quotes (or em-dashes, en-dashes, etc.) because oXygen doesn't flag them. The problem we have with fancy quotes is that they show up in our output and look funny.

We've added Schematron rules that report errors when we insert fancy quotes into our documents. Unfortunately, we can't use a Schematron rule for the em-dash, because we use an entity to represent the em-dash and it resolves to the same character as the Word em-dash. For the em-dash we've resorted to using Find and Replace before a release.
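
For anyone who wants to automate that Find and Replace step, here is a minimal Python sketch, not something taken from oXygen itself. It scans the raw file text rather than the parsed XML, so a literal em-dash pasted from Word is reported while an entity reference such as `&mdash;` is left alone (the entity name is only an example; use whatever your DTD declares). The function names `find_typographic` and `straighten_quotes` and the character list are my own assumptions, so adjust them to your documents.

```python
import unicodedata

# Curly quotes that Word's AutoFormat commonly substitutes (assumed list).
QUOTES = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}

# Dashes are only reported, never replaced automatically.
DASHES = {"\u2013", "\u2014"}  # en dash, em dash


def find_typographic(text):
    """Return (line, column, char, unicode_name) tuples, 1-based.

    Works on raw text, so an entity reference like &mdash; is not
    reported; only the literal character is.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in QUOTES or ch in DASHES:
                hits.append((line_no, col, ch, unicodedata.name(ch)))
    return hits


def straighten_quotes(text):
    """Replace curly quotes with straight ones; dashes are untouched.

    Assumes the curly quotes occur in text content. A curly quote inside
    an attribute value would become a straight quote and could break
    well-formedness, so review the report first.
    """
    for curly, straight in QUOTES.items():
        text = text.replace(curly, straight)
    return text
```

To use it, read each XML file as UTF-8 text, print the tuples from `find_typographic` to see where the pasted characters are, and call `straighten_quotes` only on files where the report shows the quotes sit in ordinary text.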