Batch processing of files

honyk
Posts: 176
Joined: Wed Apr 29, 2009 4:55 pm


Post by honyk »

Hello,

I'd like to open several files, perform the Format & Indent operation on each, and then save the modified files to a new location. What is the best way to do this? A custom Java application (is there a sample that does something similar?) or something else?

Jan
sorin_ristache
Posts: 4141
Joined: Fri Mar 28, 2003 2:12 pm

Re: Batch processing of files

Post by sorin_ristache »

Hello,

The Format and Indent action cannot be applied as a batch action. Instead, you can create an XSLT stylesheet that formats the XML documents the way you want, add the XML documents to one Oxygen project in the Project view, select them there, and apply the stylesheet to all of them with the Apply Transformation Scenario action, which is available on the popup menu of the Project view.
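As a rough illustration of the "format with a transformation" idea, here is a minimal Java sketch that re-indents XML with the JDK's built-in identity transformer instead of a hand-written stylesheet. This is a swapped-in technique, not Oxygen's Format and Indent: the class name IdentityIndentSketch is made up for the example, the indent-amount property is specific to the Xalan-derived transformer bundled with the JDK, and the output will generally differ from what Oxygen produces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of Oxygen: re-indents XML with a JAXP identity transform.
final class IdentityIndentSketch {

    static String indent(String xml) {
        try {
            // newTransformer() without a stylesheet gives an identity transform.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Implementation-specific property of the JDK's bundled transformer (assumption).
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not re-indent the document", e);
        }
    }
}
```

Calling IdentityIndentSketch.indent("<a><b>x</b></a>") returns the same elements spread over several lines. Mixed content is where such generic indenting is most likely to introduce unwanted whitespace, so results should be checked before committing them.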


Regards,
Sorin
honyk
Post by honyk »

Does 'cannot be applied' mean that it is not possible at all, or only not in the current version? I want to migrate from another tool to Oxygen. All files are stored in Subversion. I'd like to commit all formatting and indentation changes (Oxygen formats quite differently) in a single commit, so that they do not clutter the diffs when real changes are made later. This is why I am trying to reproduce Oxygen's behaviour exactly; an XSLT transformation can give different results. Maybe opening a file in Author mode, making a small edit that has no impact on the document content, and saving would be sufficient in my case. If this is possible, it would be acceptable for me.
sorin_ristache
Post by sorin_ristache »

Why don't you open every file in Text mode and apply the Format and Indent action from the Document toolbar or from the menu Document -> XML Document? The result is the same as opening the file in Author mode and saving it, with one difference: saving in Author mode also applies the white-space properties from the CSS stylesheet associated with the edited document.


Regards,
Sorin
honyk
Post by honyk »

Yes, I agree that saving the file in Text mode after Format & Indent would be enough, but doing that manually for every single file is not feasible, as there are thousands of such files. I am looking for some way to automate it.
sorin_ristache
Post by sorin_ristache »

Opening a file in Author mode and saving it is the same as Format and Indent, provided the CSS stylesheet associated with the file does not contain a white-space property. Oxygen opens a file automatically at startup if you specify the file path as a command-line parameter. To open the file in Author mode by default instead of Text mode, set Author mode for XML Editor in Preferences -> Editor -> Pages.


Regards,
Sorin
honyk
Post by honyk »

Thanks for the tip. Opening via a command-line parameter could help. What is still missing in your proposal is automation of the saving step :-)

In addition, Author mode needs an extra step before saving, because the Save icon is disabled after opening (no content change is detected).

PS: Back to my original question: I thought Format & Indent was an action similar to, e.g., the DocBook 'Insert graphic' action, with some accessible code behind it. I had planned to use that code somehow.
sorin_ristache
Post by sorin_ristache »

I think the saving step in Author mode could be automated with a custom extension action that starts a timer when the extension class is loaded by Author mode. The timer task inserts a space character with the method authorAccess.getDocumentController().insertText(), removes it again, and then saves the document with authorAccess.getEditorAccess().save().
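The insert, remove, save sequence can be illustrated without the Oxygen SDK. The sketch below is only a model of the idea: DocumentStandIn and InMemoryDocument are hypothetical stand-ins written for this example, not Oxygen classes, and their offsets and "modified" flag follow rules chosen here, not those of the Author API. In a real extension the calls would go through the authorAccess object named above.

```java
// Model of the "touch the document so that Save becomes available" trick.
final class TouchAndSaveSketch {

    /** Hypothetical stand-in for an editable document; NOT the Oxygen Author API. */
    interface DocumentStandIn {
        void insertText(int offset, String text);
        void delete(int startOffset, int endOffset); // end offset is exclusive in this stand-in
        boolean isModified();
        void save();
    }

    /** In-memory implementation that only tracks text, a modified flag and a save counter. */
    static final class InMemoryDocument implements DocumentStandIn {
        private final StringBuilder text;
        private boolean modified;
        private int saveCount;

        InMemoryDocument(String initial) { text = new StringBuilder(initial); }

        @Override public void insertText(int offset, String s) { text.insert(offset, s); modified = true; }
        @Override public void delete(int start, int end) { text.delete(start, end); modified = true; }
        @Override public boolean isModified() { return modified; }
        @Override public void save() { saveCount++; modified = false; }

        String text() { return text.toString(); }
        int saveCount() { return saveCount; }
    }

    /** Inserts one space, removes it again, then saves: the content ends up unchanged. */
    static void touchAndSave(DocumentStandIn doc) {
        doc.insertText(0, " ");
        doc.delete(0, 1);
        if (doc.isModified()) {
            doc.save();
        }
    }
}
```

The point of the no-op edit is only to make the editor consider the document changed, so that the subsequent save actually writes the file.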

The code of Format and Indent is not accessible from a customizable action. Only the user can run this action from the toolbar or from the menu.


Regards,
Sorin
honyk
Post by honyk »

Thanks for your suggestion. I am trying to code it, but I have no idea how to get my action launched right after a document is opened. Once my code takes over control, I can imagine the inserting, deleting and saving steps. The second unknown is closing the document at the end.
sorin_ristache
Post by sorin_ristache »

The extension actions are loaded when an XML document is opened in Author mode and the XML document is of the same document type as the one where you set your extension (Preferences -> Document Type Association). You can force that by setting Author mode instead of the default Text mode for XML Editor in Preferences -> Editor -> Pages -- XML Editor or you can force that at the document type level with the checkbox Initial Page in the dialog for editing a document type.

You cannot close the document from an extension. You have the following options:

- call System.exit() after the edit and save in your extension action, so that each XML document is modified and saved in a separate Oxygen instance

- allow the XML documents to be opened in the same Oxygen instance (if you call oxygen.exe with a file path as parameter, the specified file is opened in the existing instance of Oxygen) and close the Oxygen instance manually after every 200 or 300 XML documents opened, modified and saved. The next oxygen.exe command will then open a new Oxygen instance.
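The second option could be driven by a small launcher along these lines. This is a sketch under assumptions: the class OxygenLaunchSketch and its methods are invented for the example, the location of oxygen.exe has to be supplied by the caller, and the only behaviour relied on is the one described above (oxygen.exe followed by a file path opens that file).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher helper; not shipped with Oxygen.
final class OxygenLaunchSketch {

    /** Builds the command line "oxygen.exe <file>" for one document. */
    static List<String> command(Path oxygenExe, Path xmlFile) {
        return Arrays.asList(oxygenExe.toString(), xmlFile.toString());
    }

    /** Splits the work into batches, e.g. of 200 to 300 files per Oxygen instance. */
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            result.add(new ArrayList<>(items.subList(start, end)));
        }
        return result;
    }

    /** Starts Oxygen for one file; the caller decides when to close the instance. */
    static Process launch(Path oxygenExe, Path xmlFile) {
        try {
            return new ProcessBuilder(command(oxygenExe, xmlFile)).start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A driver would call launch once per file in a batch, then wait for the instance to be closed by hand before starting the next batch.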


Regards,
Sorin
honyk
Post by honyk »

OK, closing is quite clear now. I think this is a very common task and it should be improved in the future. But for now I am glad to have at least the current approach.

I understand what you mean by association/mode selection, but the only way I currently know to run my action is clicking on the toolbar/menu etc., not an automatic launch right after a document is opened, without any user action. This step is still unclear to me.
sorin_ristache
Post by sorin_ristache »

I explained that in a previous post:
sorin wrote: I think the saving step in Author mode could be automated with a custom extension action that starts a timer when the extension class is loaded by Author mode
That means you start the timer in a static section (a static initializer block) of the Java class of your Author extension.
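For readers unfamiliar with the term, the following plain-Java sketch shows only the language mechanics of a static section that starts a timer. It does not use the Oxygen API: StaticTimerSketch, FIRED and awaitFired are names invented for this example, and how the timer task would obtain the AuthorAccess object in a real extension is not covered here.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Plain-Java illustration of a static initializer block that schedules a timer task.
final class StaticTimerSketch {

    // Lets a caller observe that the timer task has run.
    static final CountDownLatch FIRED = new CountDownLatch(1);

    // Static initializer block: the JVM runs it once, when the class is initialized,
    // i.e. the first time the class is instantiated or one of its static members is used.
    static {
        Timer timer = new Timer("batch-reformat", true); // daemon thread, does not block JVM exit
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // Placeholder for the automatic work (edit and save in the real extension).
                FIRED.countDown();
            }
        }, 10); // delay in milliseconds
    }

    /** Waits until the timer task has run, or until the timeout expires. */
    static boolean awaitFired(long timeoutMillis) {
        try {
            return FIRED.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

No method of the class has to be called explicitly to start the timer; the first use of the class is enough, which is what makes the approach independent of a toolbar or menu click.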


Regards,
Sorin
honyk
Post by honyk »

Thanks for your patience, but I still don't understand it well. To tell the truth, I am not an experienced Java programmer, so more detailed instructions are welcome. Below is the code I think you mean, but I am not sure, as it raises other questions; see the comments in the code.


public class BatchReformatting implements AuthorOperation {

    Timer timer = new Timer();
    TimerTask task = new TimerTask() {
        public void run() {
            // doOperation(); what parameters to pass ?
        }
    };
    // timer.schedule(task, 10); - cannot be placed here

    public void doOperation(AuthorAccess authorAccess, ArgumentsMap args)
            throws IllegalArgumentException, AuthorOperationException {
        ...
sorin_ristache
Post by sorin_ristache »

Hello,

To run the timer in the Author extension automatically, the timer has to be created and started in a static Java code section. In any case, we will also add the Format and Indent action to the Project view, so that you can select a set of files in the Project view tree and apply Format and Indent to them; the action will use the options from Preferences -> Editor -> Format.


Regards,
Sorin
honyk
Post by honyk »

What exactly does 'a static Java code section' mean? A static method at the beginning, like below?


public class BatchReformatting implements AuthorOperation {

    public static void main(AuthorAccess authorAccess, ArgumentsMap args) {

        Timer timer = new Timer();
        TimerTask task = new TimerTask() {
            public void run() {
                // doOperation(authorAccess, args); - non-static method cannot be referenced from a static context
            }
        };
        timer.schedule(task, 10);
    }

    public void doOperation(AuthorAccess authorAccess, ArgumentsMap args)
            throws IllegalArgumentException, AuthorOperationException {
There is a static/non-static conflict that I can't resolve.

Your proposal of new functionality seems promising, but only if the Format & Indent behaviour corresponded to the definition of white-space handling in the CSS file. My original idea is abandoned because in mixed content like DocBook there is a real risk of unintended spaces in some specific content. Ideally there would be an option to choose which behaviour to use (keep the CSS settings or not).

And to tell the truth, I don't understand well how the preserve-space settings in Preferences/Editor/Format/XML are handled when a document is saved in Author mode: are they combined with the CSS, ignored, or preferred?
sorin_ristache
Post by sorin_ristache »

honyk wrote: What exactly does 'a static Java code section' mean? A static method at the beginning, like below?
No, not as in your example.
honyk wrote: Your proposal of new functionality seems promising, but only if the Format & Indent behaviour corresponded to the definition of white-space handling in the CSS file.
We will add the Format and Indent action to the Project view so that it can be applied to a batch of files in one step. This action will not look at any CSS stylesheet, just like the Format and Indent action that is available now on the editor toolbar and on the Document menu. It will just read the file from disk and apply the current values of the format preferences that are set in Preferences -> Editor -> Format and Preferences -> Editor -> Format -> XML.
honyk wrote: My original idea is abandoned because in mixed content like DocBook there is a real risk of unintended spaces in some specific content. Ideally there would be an option to choose which behaviour to use (keep the CSS settings or not).
That is no longer a problem. We fixed the problem of spaces being inserted between the child elements of an element with mixed content. The fix will go into the next version of Oxygen. As I said, no CSS is applied by the Format and Indent action.
honyk wrote: And to tell the truth, I don't understand well how the preserve-space settings in Preferences/Editor/Format/XML are handled when a document is saved in Author mode: are they combined with the CSS, ignored, or preferred?
There is no priority between the two options for preserving whitespace. When the document is saved in Author mode, whitespace is preserved in an element if that element is listed in Preserve Space Elements or is required to preserve whitespace by the CSS. If neither option requires it, the whitespace is not preserved.


Regards,
Sorin
honyk
Post by honyk »

Thanks for the answers, but instead of your 'no' I would be grateful for the correct code :-) It seems to me that it is something secret, not intended for public use. If so, are there other ways to discuss it than via this public forum? FYI: we have purchased Oxygen with a maintenance pack.

Many thanks for the clarification of white-space handling, and for the info that there is a bug in the wrapping of mixed content! I had met some issues and mistakenly thought they were caused by the CSS or similar. Now I really understand the difference between Format & Indent and saving in Author mode. In that case the proposed functionality would be helpful. Unfortunately, we can't wait, as we need it ASAP :-(
sorin_ristache
Post by sorin_ristache »

honyk wrote: Thanks for the answers, but instead of your 'no' I would be grateful for the correct code :-) It seems to me that it is something secret, not intended for public use. If so, are there other ways to discuss it than via this public forum? FYI: we have purchased Oxygen with a maintenance pack.
I am sorry, we do not provide full Java development services for custom Author actions. We can provide guidelines and help you with the Author API or other Oxygen features, but you need a good understanding of general Java topics such as static code, and the relationship between static code and class loading, before you start implementing the guidelines outlined above in your own Java code. In any case, the Author extension I suggested is a kludgy workaround that could help you until we implement the requested batch action and add it to the Project view. The Author API was designed for Author extensions invoked by the user, not for automatic actions (hence the kludgy workaround).


Regards,
Sorin
honyk
Post by honyk »

OK, I understand. I've read through the discussion again and I am quite unsure whether the timer functionality you suggest is really what I am looking for. In one post the timer is mentioned in connection with the saving operation. That step is quite clear; there is no problem creating a working timer for this operation. Later the timer is mentioned as a way to perform a certain action automatically right after a document is opened. And this is what I have been asking about all along. Are we really on the same wavelength? If so, please, would you confirm that this is working somewhere, or is it only an untested suggestion?

My other question is how to integrate such (still virtual) auto-launching code into the Document Type Association settings. To be selectable as a custom action, the class must implement AuthorOperation.

Now I have two classes, both implementing AuthorOperation: one with the core code (MC), and a second, supporting one (SC) with a static method and a timer (and with an empty doOperation method), which loads the core class dynamically. If the SC class is defined in the actions dialog of the Document Type Association settings, nothing happens after a document is opened (by default in Author mode). Is a definition in the action list really enough for Oxygen to consider the code behind it as something to load immediately after opening the document?

I plan to hire an experienced Java programmer for this task, but I'd like to know whether this would be a waste of time (and money).
honyk
Post by honyk »

Solved! The heart of the final code is the method authorAccess.getEditorAccess().reloadContent(Reader).
It replaces the content of the open document, so there is no need to open each file in an extra editor and close it after saving. Because the name of the document stays the same the whole time, the file that has just been saved must be renamed so that its name matches its content. This is done with the Java method renameTo, as there is no way to do it directly via the Oxygen API.
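The final code itself was not posted, so the following is only a guess at the shape of such a loop, written so that it runs without the Oxygen SDK. EditorStandIn and FakeEditor are hypothetical stand-ins created for this sketch; only the method names reloadContent(Reader), save() and renameTo come from the thread. The scratch file (the document that stays open) and the target directory are assumptions of the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch of "reload content into one open document, save, rename the saved file".
final class ReloadAndRenameSketch {

    /** Hypothetical stand-in for the editor; NOT the Oxygen API. */
    interface EditorStandIn {
        void reloadContent(Reader reader) throws IOException;
        void save() throws IOException;
    }

    /** Fake editor that "saves" whatever was last loaded into one fixed scratch file. */
    static final class FakeEditor implements EditorStandIn {
        private final Path scratch;
        private String content = "";

        FakeEditor(Path scratch) { this.scratch = scratch; }

        @Override public void reloadContent(Reader reader) throws IOException {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
            content = sb.toString();
        }

        @Override public void save() throws IOException {
            Files.write(scratch, content.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Pushes every source file through the one open document, then renames the saved file. */
    static void processAll(EditorStandIn editor, Path scratch, List<Path> sources, Path targetDir) {
        try {
            Files.createDirectories(targetDir);
            for (Path source : sources) {
                try (Reader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
                    editor.reloadContent(reader);
                }
                editor.save(); // always writes to the scratch file, whatever the content
                File target = targetDir.resolve(source.getFileName()).toFile();
                if (!scratch.toFile().renameTo(target)) {
                    throw new IOException("Could not rename " + scratch + " to " + target);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that renameTo reports failure through its boolean return value, not an exception, so the result has to be checked explicitly.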
sorin_ristache
Post by sorin_ristache »

Hello,
honyk wrote: If so, please, would you confirm that this is working somewhere, or is it only an untested suggestion?
It was not tested because, as I said, the Author API was designed for interactive actions, not automatic ones.
honyk wrote: Is a definition in the action list really enough for Oxygen to consider the code behind it as something to load immediately after opening the document?
You also have to add the action to the Author menu or to the Author toolbar when you configure the document type.


Regards,
Sorin