Scripting to semi-automate entity generation?
-
- Posts: 1
- Joined: Thu Jul 23, 2015 4:43 am
Scripting to semi-automate entity generation?
Hello,
I apologize in advance if I do not use the correct language for my problem. I am very new to XML and this is my first project.
I'm currently using XML and the TEI P5 guidelines to encode a number of public domain theatrical plays for an academic project. I must generate entities for every speaker, line, as well as physical position of each line as it corresponds to a physical publication of these plays. The current play that I am working on is over 5,000 lines long and physically typing in the entities for each line, scene, speaker, etc. has been exhausting and very inefficient.
I'm curious if there is a way I can automate this process. I'd like to automate the following:
-Insert <l>.........</l> for every line in a document (I paste all of the text from the original document into Oxygen and have approx. 5,000 lines of text)
-Within the text, if a word is capitalized without a period immediately before it, place that word on the next line.
-after X number of lines, insert <page number and page image>
I would appreciate any help the community could provide and am happy to provide additional information if necessary. I have limited scripting and programming experience, but if I'm pointed in the right direction I believe I can figure it out.
Thank you
-
- Posts: 2879
- Joined: Tue May 17, 2005 4:01 pm
Re: Scripting to semi-automate entity generation?
Hello,
If the source is text (as opposed to XML), you can't really automate the process, but you can make use of various Oxygen helpers to improve productivity.
jreifste wrote: I'm currently using XML and the TEI P5 guidelines to encode a number of public domain theatrical plays for an academic project. I must generate entities for every speaker, line, as well as physical position of each line as it corresponds to a physical publication of these plays. The current play that I am working on is over 5,000 lines long and physically typing in the entities for each line, scene, speaker, etc. has been exhausting and very inefficient.
I believe that by "entities" you are referring to XML tags, start tag (<tag>) and end tag (</tag>) of an XML element. Note that in XML the term "entities" usually refers to XML entities.
If you have certain XML structures that appear repeatedly, you can create code templates in Oxygen that insert (or surround existing text content with) that XML snippet.
Most of what you mentioned could be accomplished with the Find/Replace tool (Find > Find/Replace) and regular expressions, but unfortunately this can't be automated by Oxygen. Still, you could accomplish all this with a few manual steps, and afterwards just make corrections.
jreifste wrote: -Insert <l>.........</l> for every line in a document (I paste all of the text from the original document into Oxygen and have approx. 5,000 lines of text)
You could wrap every line within <l> tags with the Find/Replace tool (Find > Find/Replace) and regular expressions:
Find: ^.*?$
Replace with: <l>$0</l>
Options: Regular expression
Note that this doesn't check if lines are already wrapped in <l> tags, so only do this once.
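If running a small script outside Oxygen is an option, the same wrapping step could be sketched in Python. This is only an illustration, not an Oxygen feature: the function name wrap_lines is hypothetical, and the choices to skip blank lines, leave already-wrapped lines alone, and escape special characters are assumptions about what the transcription needs.

```python
from xml.sax.saxutils import escape


def wrap_lines(text: str) -> str:
    """Wrap every non-blank line of plain text in a TEI <l> element."""
    wrapped = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines are dropped, not wrapped
        if stripped.startswith("<l>") and stripped.endswith("</l>"):
            wrapped.append(stripped)  # already wrapped: leave untouched
            continue
        # escape() replaces &, < and > so the result stays well-formed XML
        wrapped.append(f"<l>{escape(stripped)}</l>")
    return "\n".join(wrapped)


sample = "To be, or not to be\n\nFish & chips"
print(wrap_lines(sample))
```

Unlike the plain Find/Replace, this version can be run more than once on the same text, because lines that already start with <l> and end with </l> are passed through unchanged.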
jreifste wrote: -Within the text, if a word is capitalized without a period immediately before it, place that word on the next line.
I have a similar solution with the Find/Replace tool (Find > Find/Replace) and regular expressions:
Find: (?<!\.)[A-Z].+?
Replace with: \n$0
Options:
Case sensitive
Regular expression
Note that this simply breaks the text line. If you also want it to split an existing <l> element, you can instead use Replace with: </l>\n<l>$0
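As a point of comparison, here is a minimal Python sketch of one possible reading of the request. It is a different pattern from the one above, not a translation of it: it assumes that "a period immediately before" means a period followed by whitespace, and it replaces that whitespace with a line break instead of inserting a break in front of the capital letter. The names BREAK_POINT and break_before_capitals are hypothetical.

```python
import re

# Whitespace that follows a character other than a period or whitespace,
# and that is itself followed by a capital letter. Capitals at the very
# start of a line have no preceding whitespace, so they are left alone.
BREAK_POINT = re.compile(r"(?<=[^.\s])[ \t]+(?=[A-Z])")


def break_before_capitals(text: str) -> str:
    """Move each mid-line capitalized word to the start of a new line."""
    return BREAK_POINT.sub("\n", text)


sample = "my lord. He is gone HAMLET Then farewell"
print(break_before_capitals(sample))
```

This will also break before the pronoun "I" and before proper nouns in mid-sentence, so the output still needs a manual pass afterwards.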
jreifste wrote: -after X number of lines, insert <page number and page image>
This assumes the lines are already wrapped in <l> tags:
Find: (<l>.*?</l>\n){X} (replace X with the actual number)
Replace with: $0<pb n="1" facs="page1.png"/>\n
Options: Regular expression
Note that the Find and Replace buttons count the X lines from your current position in the document, so make sure you start at the top. Alternatively, use 'Find All' or 'Replace All', which always start at the top.
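The replacement above inserts the same n="1" and facs="page1.png" values every time, so the page numbers have to be corrected by hand afterwards. A script could increment them instead. The following Python sketch assumes one <l> element per line, a fixed number of lines per page, and image files named page1.png, page2.png and so on; the function name insert_page_breaks and its parameters are hypothetical.

```python
def insert_page_breaks(lines, lines_per_page, first_page=1,
                       facs_pattern="page{n}.png"):
    """Return a new list with a <pb/> element after every N <l> lines."""
    out = []
    page = first_page
    count = 0
    for line in lines:
        out.append(line)
        if line.lstrip().startswith("<l>"):
            count += 1
            if count % lines_per_page == 0:
                # assumption: image file names follow facs_pattern
                facs = facs_pattern.format(n=page)
                out.append(f'<pb n="{page}" facs="{facs}"/>')
                page += 1
    return out


verse = ["<l>a</l>", "<l>b</l>", "<l>c</l>", "<l>d</l>", "<l>e</l>"]
print("\n".join(insert_page_breaks(verse, lines_per_page=2)))
```

Only lines that start with <l> are counted, so any other markup already in the file (speaker or stage direction elements, for example) does not shift the page boundaries.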
Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com