
Best XML structure

Posted: Wed May 02, 2012 6:47 am
by xuitos
I'm quite new to constructing XML, and I was hoping for some initial guidance on best practices.

I have an Excel file that needs to be converted to XML.

It's in tabular form:
Row A has ID
Row B has Name, with entries for First, Middle, Last in three cells running down the column.
Row C has Team, with a list of teams
Row D has Played, with the number of games played for each team.
Row E has Goals, with the number of goals kicked for each team.

I have exported the file as CSV and imported it into Oxygen as a text file, but then I get lost. Should I create a schema first and then import?
Anyway, any help would be much appreciated!

Re: Best XML structure

Posted: Wed May 02, 2012 1:07 pm
by Costin
Hi,

You can import an Excel file that has a simple format (either by converting it to CSV and importing it as text, or by importing it directly as an .xls file) without first creating a schema. After importing the file, you can switch to "Grid" mode, which displays the content in a layout closer to the way the original file looks in Excel.
Otherwise, if you need to customize the format of the imported file, you should consider creating a custom XSLT stylesheet.

There are several ways to do this import.

1. You can use either Oxygen XML Editor or Oxygen XML Developer to import from Text/Excel to XML (File -> Import -> Text file/MS Excel file...).
Note however that this function has some limitations:
- it can only import the old Excel 97/2000/XP/2003 format (.xls).
- it can only import one sheet at a time.
- the table in the sheet is assumed to start in the top-left corner (you cannot define the starting and ending row/column of the data to import). If there are labels, titles or other data before the actual table, the import will give you mixed results.


2. If you have Excel documents in the newer Excel 2007/2010 format (.xlsx), you could first convert them to the older .xls format (compatibility mode) before importing them in oXygen. However, Oxygen also provides a small sample XSLT stylesheet for extracting data directly from the .xlsx file. The sample is located in:
Oxygen/samples/ooxml/extractFromExcel.xsl
This stylesheet will have to be modified to your needs, so some XSLT knowledge is required. The advantage is that after developing this stylesheet you can use it to import automatically in the desired XML format.
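
For readers who would rather script this than write XSLT, the idea behind extracting data directly from an .xlsx file can be sketched in Python: an .xlsx file is a ZIP archive of XML parts, and cell text is usually stored in a shared-strings table that the worksheet cells point to by index. This is not the Oxygen sample stylesheet, only a minimal sketch under stated assumptions: the function name read_xlsx_rows is hypothetical, the worksheet is assumed to be stored at xl/worksheets/sheet1.xml (common for the first sheet, but not guaranteed), formatting is ignored, and empty cells that Excel leaves out of the file are skipped rather than padded.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML parts inside an .xlsx archive
M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def _joined_text(elem):
    # Concatenate every <t> below elem (rich-text runs split the text up)
    return "".join(t.text or "" for t in elem.iter(M + "t"))

def read_xlsx_rows(source, sheet_part="xl/worksheets/sheet1.xml"):
    """Return one worksheet's cell values as a list of rows (lists of str).

    Assumption: the sheet is stored at sheet_part. Empty cells that Excel
    omits from the file are simply skipped, not padded.
    """
    with zipfile.ZipFile(source) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            shared = [_joined_text(si) for si in sst.findall(M + "si")]
        sheet = ET.fromstring(zf.read(sheet_part))

    rows = []
    for row in sheet.iter(M + "row"):
        values = []
        for cell in row.findall(M + "c"):
            kind = cell.get("t")
            value = cell.find(M + "v")
            if kind == "s":  # shared string: <v> holds an index into the table
                values.append(shared[int(value.text)])
            elif kind == "inlineStr":  # text stored in the cell itself
                values.append(_joined_text(cell))
            else:  # numbers, booleans, cached formula results
                values.append(value.text if value is not None else "")
        rows.append(values)
    return rows
```

The returned rows could then be written out in whatever XML structure is needed, which is the same step the sample stylesheet has to be adapted for.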

There are some video demonstrations on our web site about how this can be used:
http://www.oxygenxml.com/videos.html#vt ... ocuments_t
Note that the videos show extraction from Word and export to Excel, but the procedure is the same for extracting/importing from Excel.

3. There is an additional method of getting data from Excel into Oxygen: copy it from Excel to the clipboard and paste it into Oxygen in Grid mode. Some preparation is required for this.
You have to create in Oxygen an XML file that replicates the structure of the table (same number of columns), switch to Grid mode and paste the content copied from Excel into the top-left cell of the blank table.

e.g.

<root>
  <row>
    <item1/>
    <item2/>
    <!-- ... one element per column ... -->
    <itemN/>
  </row>
  <row>
    <item1/>
    <item2/>
    <!-- ... -->
    <itemN/>
  </row>
</root>
You can name the cell/item elements however you like, but all the row elements must have the same name (e.g. "row" in this case).
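
If the CSV export route is preferred, the same row/item structure can also be produced outside Oxygen with a few lines of Python. This is a minimal sketch, assuming a plain comma-separated file; the function csv_to_row_xml and its default tag names are hypothetical and simply follow the template above (one shared row name, cells numbered item1..itemN).

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_row_xml(csv_text, root_tag="root", row_tag="row", item_prefix="item"):
    """Convert CSV text into <root><row><item1>...</item1>...</row>...</root>."""
    root = ET.Element(root_tag)
    for record in csv.reader(io.StringIO(csv_text)):
        row = ET.SubElement(root, row_tag)  # every row element shares one name
        for i, value in enumerate(record, start=1):
            # Cell elements are numbered item1, item2, ... per column
            ET.SubElement(row, f"{item_prefix}{i}").text = value
    return ET.tostring(root, encoding="unicode")
```

Because ElementTree escapes '&' and '<' in text, a cell such as "Smith & Co" comes out as `Smith &amp; Co`, so the result is always well-formed XML.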

Regards,
Costin

Re: Best XML structure

Posted: Thu Sep 12, 2013 8:28 pm
by Chaa006
Thank you, that is very useful information. I have a supplementary question, if I may. The Excel spreadsheet is not monolingual: it contains text primarily in British English, intermixed with polytonic Greek, Latin and Hebrew. Is it possible to surround the stretches of text in languages other than English with XML tags to indicate the language? A naïve approach (simply wrapping the text in <Greek></Greek>) does not work, as the import converts the leading "<" to "&lt;".

Philip Taylor

Re: Best XML structure

Posted: Fri Sep 13, 2013 12:03 pm
by adrian
Hi,

In Oxygen you can manually select the text you want to surround with tags and use the Refactoring > Surround with Tags... (Ctrl+E) action from the contextual menu.
Note that in XML the language is usually specified with an xml:lang attribute:
W3C - Language tags in HTML and XML
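
As a small illustration of xml:lang on inline elements, the Python sketch below builds an entry in which a Greek word is wrapped in an element carrying xml:lang="grc" (the ISO 639 code for Ancient Greek). The element names entry and foreign, the function name and the en-GB default are hypothetical choices (TEI uses a similar <foreign> element); only the xml:lang attribute itself is standard.

```python
import xml.etree.ElementTree as ET

# Attribute name for xml:lang; ElementTree serializes this namespace with
# the reserved "xml" prefix and no xmlns declaration.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def mixed_language_entry(segments, default_lang="en-GB", tag="foreign"):
    """Build <entry xml:lang=default_lang> from (text, lang) pairs.

    lang=None means "same as the entry"; any other value wraps the text in
    <tag xml:lang="...">.
    """
    entry = ET.Element("entry", {XML_LANG: default_lang})
    last = None  # last child added; following plain text goes in its .tail
    for text, lang in segments:
        if lang is None:
            if last is None:
                entry.text = (entry.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        else:
            last = ET.SubElement(entry, tag, {XML_LANG: lang})
            last.text = text
    return ET.tostring(entry, encoding="unicode")
```

For example, the segments `("The word ", None), ("βασιλεύς", "grc"), (" means king.", None)` produce `<entry xml:lang="en-GB">The word <foreign xml:lang="grc">βασιλεύς</foreign> means king.</entry>`.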

Regards,
Adrian

Re: Best XML structure

Posted: Fri Sep 13, 2013 3:20 pm
by Chaa006
Unless I am misunderstanding you, Adrian, that is not what I would like to achieve. The master source is prepared using Microsoft Excel by a classics scholar, and it would (should) be his responsibility to identify which stretches of text are not in English (e.g., are in Greek, Latin, Hebrew and so on) and then to indicate that <stress>in the Excel file</stress>. My question is, "How can he indicate the language of a stretch of text in Excel so that Oxygen will automatically surround that stretch of text with the correct tags during Excel import?".

Philip Taylor

Re: Best XML structure

Posted: Fri Sep 13, 2013 4:14 pm
by adrian
Hi,

I'm not sure how your text is formatted in Excel. Ideally, the text in each language should be in a separate Excel column (one column for Greek, another for Latin, etc.), but this is only possible if the content is structured like a translation (the same text translated into various languages).
If the text mixes languages within the same Excel cell/column, it is rather difficult to extract and identify each language.
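
When each language does have its own column, mapping columns to xml:lang values is straightforward. A minimal Python sketch, assuming the caller supplies one language code per column; the function name parallel_row_to_xml and the entry/text element names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def parallel_row_to_xml(row, column_langs):
    """Turn one row with one column per language into
    <entry><text xml:lang="...">...</text>...</entry>, skipping empty cells."""
    entry = ET.Element("entry")
    for value, lang in zip(row, column_langs):
        if value:  # an empty cell means no text in that language
            ET.SubElement(entry, "text", {XML_LANG: lang}).text = value
    return ET.tostring(entry, encoding="unicode")
```

Here the language is a property of the column, so nothing inside the cell text has to be interpreted as markup.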

In the latter case, it would not be possible to do the import in one step. The Oxygen import expects text content from Excel. Even if you add element tags to the content in Excel, Oxygen will still treat that content as text and escape the XML special characters ('<' -> '&lt;', '&' -> '&amp;', etc.).

I guess you could still use this form of tagging in Excel and, after importing the content in Oxygen, unescape the '&lt;' entity throughout the XML document. This can easily be done by selecting the entire XML content (Edit > Select All) and using the action Document > Source > Unescape Selection. In the displayed dialog, make sure only the '&lt; to <' checkbox is selected.
However, afterwards you will have to re-escape any '<' characters that were not part of a tag from Excel and should have remained escaped. Oxygen will indicate these as errors.
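
A narrower variant of this unescaping step, done with a script rather than Oxygen's Unescape Selection action, is to restore only an agreed set of tag names, so that every other escaped '<' stays escaped and nothing has to be re-escaped by hand. The Python sketch below assumes the tags are named Greek, Latin and Hebrew (hypothetical names) and that these names never appear inside angle brackets as ordinary content; it accepts the closing '>' either literal or escaped as '&gt;'.

```python
import re

# Hypothetical tag vocabulary agreed with whoever edits the spreadsheet
KNOWN_TAGS = ("Greek", "Latin", "Hebrew")

# Matches &lt;Greek&gt;, &lt;/Greek>, ... but not &lt;anything-else&gt;
_ESCAPED_TAG = re.compile(
    r"&lt;(/?)(" + "|".join(map(re.escape, KNOWN_TAGS)) + r")(?:>|&gt;)"
)

def unescape_known_tags(xml_text):
    """Turn escaped start/end tags from KNOWN_TAGS back into real tags,
    leaving all other escaped '<' characters untouched."""
    return _ESCAPED_TAG.sub(r"<\1\2>", xml_text)
```

With this approach, editorial brackets such as `&lt;έως&gt;` remain escaped text, while `&lt;Greek&gt;...&lt;/Greek&gt;` becomes a real element.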

Regards,
Adrian

Re: Best XML structure

Posted: Fri Sep 13, 2013 6:00 pm
by Chaa006
Thank you, Adrian. It is, unfortunately, the case that the scholarly conventions for representing the content of MSS require XML-like constructs, e.g.

βασιλ<έως>, τοῦ νικερίτ<ου>· διὰ χειρὸς ἰω(άννου) τοῦ εὐτε<λοῦς> (καὶ) ξένου τοῦ κούλικ(ος)· (καὶ) οἱ ἀναγινωσκοντες

(Note the <ου> and <λοῦς> components.) This means that the raw Excel document cannot contain XML tags, as there would be no (easy) way of deciding which was tag and which was content. What I was hoping was that there might be some way to (for example) format Greek as green, Latin as lavender, Hebrew as heliotrope (just silly mnemonic colours, not necessarily those we would use) and that there might be a semi-automated process whereby Excel styling could be mapped to XML tagging during import. The styling need not be colour: it could be (e.g.) bold, italic, underline, ..., just /something/ that the Excel import wizard could be trained to recognise and map to XML tags.
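
Oxygen's import does not read styling, but the styling is present in the .xlsx file itself: a cell with mixed formatting is stored in the shared-strings table as a sequence of rich-text runs (<r>), each with its own run properties (<rPr>). A script outside Oxygen could therefore map colours to languages. The Python sketch below is an illustration under several assumptions, not an Oxygen feature: it only recognises explicit ARGB colours (not theme colours), the colour-to-language map and the <cell>/<foreign> element names are hypothetical, and adjacent runs in the same language become separate elements. Editorial brackets such as <έως> stay ordinary text and are escaped on output.

```python
import xml.etree.ElementTree as ET

M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def runs_to_tagged(si, colour_to_lang, tag="foreign"):
    """Convert one shared-string <si> element into a <cell> element.

    Runs whose explicit ARGB colour appears in colour_to_lang are wrapped in
    <tag xml:lang="...">; all other text is kept as plain content.
    """
    cell = ET.Element("cell")
    last = None  # most recently added child; its .tail receives plain text

    def append_plain(text):
        if last is None:
            cell.text = (cell.text or "") + text
        else:
            last.tail = (last.tail or "") + text

    plain = si.find(M + "t")  # text stored directly, outside any run
    if plain is not None and plain.text:
        append_plain(plain.text)
    for run in si.findall(M + "r"):
        text = "".join(t.text or "" for t in run.findall(M + "t"))
        colour = run.find(M + "rPr/" + M + "color")
        lang = colour_to_lang.get(colour.get("rgb")) if colour is not None else None
        if lang is None:
            append_plain(text)
        else:
            last = ET.SubElement(cell, tag, {XML_LANG: lang})
            last.text = text
    return cell
```

A green run (stored as rgb="FF00B050") mapped to "grc" would come out as `<foreign xml:lang="grc">βασιλ&lt;έως&gt;</foreign>`, with the surrounding unformatted runs kept as plain text.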

Philip Taylor

Re: Best XML structure

Posted: Wed Sep 18, 2013 3:26 pm
by adrian
The Import feature from Oxygen can only import the content, not the styling or formatting. That means colors and font styling can't be imported.

The question that immediately comes to mind is: why use Excel for this in the first place?
It doesn't seem suited to this kind of task. Why not use Oxygen directly? Its Author mode can be customized and simplified for less XML-savvy users.

Regards,
Adrian