Import TEI header information from excel and combine it with
Posted: Tue Jan 27, 2015 9:31 am
Hello, I am new to Oxygen and it would be great if you could help me with a corpus I am working on.
I have several hundred plain text files. They are short texts (around 200 words each) which make up a corpus (around 700,000 words).
I would like to create an XML version of every file (using a "TEI Lite" schema) containing information like the title of the text, the author, the date, and so on.
I already have an Excel file with the information associated with every text file (one row per text file; the columns contain information like the title of the text, author, date, etc.), like this:
filename title author date
filename1 title1 author1 2013
filename2 title2 author2 2010
filename3 title3 author3 2011
...
However, the Excel file only contains the metadata associated with every text file (and the name of the text file), not the text itself.
I would like to convert the information in the Excel file into a TEI header and combine it with the text file, so that I end up with one XML document per text, with two parts: TEI header and text.
Could you please let me know how I can do that?
I think that I can export the Excel information to XML, as explained here: https://www.udemy.com/blog/excel-to-xml/ but I am not sure how to include the text itself in the XML file.
Any help would be very much appreciated.
Pilar
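One possible way to approach the task described above, sketched in Python rather than inside Oxygen. It rests on several assumptions that are not stated in the post: the Excel sheet has been saved as a comma-separated CSV file with the header row filename, title, author, date; every text file is UTF-8 and named after the filename column with a .txt extension; paragraphs are separated by blank lines; and the date belongs in sourceDesc. The function names, paths, and the publicationStmt placeholder text are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI as the default namespace (no prefix on element names).
ET.register_namespace("", TEI_NS)


def _el(parent, tag, text=None):
    """Append a TEI-namespaced child element and return it."""
    child = ET.SubElement(parent, f"{{{TEI_NS}}}{tag}")
    child.text = text
    return child


def build_tei(row, body_text):
    """Build one TEI document from a metadata row and the plain text."""
    root = ET.Element(f"{{{TEI_NS}}}TEI")

    # Header: fileDesc needs titleStmt, publicationStmt and sourceDesc.
    file_desc = _el(_el(root, "teiHeader"), "fileDesc")
    title_stmt = _el(file_desc, "titleStmt")
    _el(title_stmt, "title", row["title"])
    _el(title_stmt, "author", row["author"])
    # Placeholder wording (assumption); replace with real publication info.
    _el(_el(file_desc, "publicationStmt"), "p", "Unpublished corpus file")
    bibl = _el(_el(file_desc, "sourceDesc"), "bibl")
    _el(bibl, "date", row["date"])

    # Text: one <p> per blank-line-separated paragraph (assumption).
    body = _el(_el(root, "text"), "body")
    for para in body_text.split("\n\n"):
        if para.strip():
            _el(body, "p", " ".join(para.split()))
    return ET.ElementTree(root)


def convert(csv_path, text_dir, out_dir):
    """Write one XML file per CSV row; return the list of written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            source = Path(text_dir) / f"{row['filename']}.txt"
            tree = build_tei(row, source.read_text(encoding="utf-8"))
            target = out_dir / f"{row['filename']}.xml"
            tree.write(str(target), encoding="utf-8", xml_declaration=True)
            written.append(target)
    return written


# Example call (hypothetical paths):
# convert("metadata.csv", "texts", "tei")
```

Building the document with an XML library instead of string concatenation means that characters such as & and < in the texts are escaped automatically. The resulting files can then be opened in Oxygen and validated against the TEI Lite schema; the header layout above is only a minimal guess and may need more elements for a real project.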