Scripting to semi-automate entity generation?
Posted: Thu Jul 23, 2015 4:56 am
Hello,
I apologize in advance if I do not use the correct language for my problem. I am very new to XML and this is my first project.
I'm currently using XML and the TEI P5 guidelines to encode a number of public domain theatrical plays for an academic project. I must generate entities for every speaker and every line, as well as for the physical position of each line as it appears in a printed edition of these plays. The play I am working on now is over 5,000 lines long, and typing in the entities by hand for each line, scene, speaker, etc. has been exhausting and very inefficient.
I'm curious if there is a way I can automate this process. I'd like to automate the following:
- Insert <l>.........</l> around every line in a document (I paste all of the text from the original document into Oxygen and have approx. 5,000 lines of text)
- Within the text, if a word is capitalized without a period immediately before it, place that word on the next line.
- After X number of lines, insert <page number and page image>
I would appreciate any help the community could provide and am happy to provide additional information if necessary. I have limited scripting and programming experience, but if I'm pointed in the right direction I believe I can figure it out.
Thank you
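For anyone landing here with the same problem, the three steps listed above can be sketched in a few lines of Python run outside Oxygen. This is only a rough sketch under several assumptions that are not in the post: the pasted text is plain text with one verse line per row, every page holds the same number of lines, the page marker is the TEI <pb/> element with its n and facs attributes, and the image file naming (page001.jpg and so on) is made up for illustration. The function names are hypothetical as well.

```python
import re
from xml.sax.saxutils import escape

# Whitespace that follows a non-period character and precedes a capital letter.
# This is a literal reading of the second rule in the post.
BREAK_BEFORE_CAPITAL = re.compile(r"(?<![.\s])\s+(?=[A-Z])")


def split_before_capitals(line):
    """Break a line before each capitalized word not preceded by a period."""
    return BREAK_BEFORE_CAPITAL.split(line.strip())


def encode_lines(text, lines_per_page, first_page=1,
                 image_pattern="page{n:03d}.jpg"):
    """Wrap each line in <l>...</l> and add a <pb/> every lines_per_page lines.

    image_pattern is a hypothetical naming scheme for the page images.
    """
    out = []
    page = first_page
    count = 0
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank rows in the pasted text
        for piece in split_before_capitals(raw):
            if count % lines_per_page == 0:
                # <pb/> marks the start of a page, so it goes before the
                # first line and again after every lines_per_page lines.
                out.append('<pb n="{}" facs="{}"/>'.format(
                    page, image_pattern.format(n=page)))
                page += 1
            # escape() turns &, < and > into XML-safe references.
            out.append("<l>{}</l>".format(escape(piece)))
            count += 1
    return "\n".join(out)


if __name__ == "__main__":
    sample = "To be or not to be.\nthat is the question\nwhether tis nobler"
    print(encode_lines(sample, lines_per_page=2))
```

Note that the capital-letter rule, read literally, also splits before names in the middle of a sentence and before the pronoun "I", so it will probably need tightening (for example, matching only words in all capitals if that is how speakers are printed). Pages in a printed edition rarely have a fixed line count either, so the <pb/> positions should be checked against the page images.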