Search and copy XML files into a new directory

crult
Posts: 20
Joined: Thu Jan 21, 2010 10:21 pm

Search and copy XML files into a new directory

Post by crult » Thu Jan 21, 2010 10:33 pm

Hello,

I'm using version 10.3 of Oxygen on Windows XP. I have the following problem: I have some folders with many XML files inside, and I want to find only the files that contain a specific annotation (for example <Secteur>SCI</Secteur>). The files are newspaper articles in XML format. The value SCI indicates a science article; other articles have, for example, <Secteur>SPO</Secteur> for sports. I want to find only the science articles and copy them into a new directory automatically. Is there any solution? I used the Find option and got results, but I can't copy each file manually (there are 600 results). Thanks for your response.
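For comparison, here is a minimal sketch of the same job done outside Oxygen, in Python (a swapped-in tool, not something proposed in this thread). It mimics what the Find dialog does, a plain text search for the literal string <Secteur>SCI</Secteur> in each *.xml file of one folder, and then copies every matching file unchanged. It assumes the tag is always written exactly that way (no extra spaces or attributes); the function name copy_matching_files and the folder names are illustrative.

```python
import shutil
from pathlib import Path


def copy_matching_files(src_dir, dest_dir, needle="<Secteur>SCI</Secteur>"):
    """Copy each *.xml file in src_dir whose raw bytes contain `needle`.

    Like a plain-text Find, this does not parse the XML: it only matches
    the tag written exactly this way (no extra spaces or attributes).
    Returns the sorted names of the copied files.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # The tag is pure ASCII, so its bytes are the same in ISO-8859-15 and UTF-8.
    needle_bytes = needle.encode("ascii")
    copied = []
    for path in sorted(src.glob("*.xml")):
        if needle_bytes in path.read_bytes():
            # copy2 copies the file byte for byte and keeps its timestamps
            shutil.copy2(path, dest / path.name)
            copied.append(path.name)
    return copied
```

For example, copy_matching_files("01", "SCI") would copy the matching articles from a folder named 01 into a folder named SCI (both relative to the current directory). A text search is cheap but brittle; parsing each file is safer if the tag formatting varies.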

adrian
Posts: 2659
Joined: Tue May 17, 2005 4:01 pm

Re: Search and copy XML files into a new directory

Post by adrian » Fri Jan 22, 2010 12:09 pm

Hi,

The 'Find/Replace in Files' feature in Oxygen, as the name implies, only does find and replace. The only copying it does is backing up the original files (with a custom file extension) during a replace, and I don't think that would be very useful here.
So the answer is no, there isn't a way to do this automatically.

Regards,
Adrian
Adrian Buza
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

Re: Search and copy XML files into a new directory

Post by crult » Fri Jan 22, 2010 1:57 pm

What if I make a minor change to the target XML files, ask for backups with the .xml extension, and then find them? What's the backup directory? If that isn't the solution, is it possible with another piece of software or tool (or XSLT)?

Thank you very much.

Re: Search and copy XML files into a new directory

Post by adrian » Fri Jan 22, 2010 3:57 pm

They are not backed up to a different directory. They are placed in the same directory as the modified file, with the custom extension appended to the name (the default is bak).

You could probably do this with ANT, but you will have to do a little research.
Here's a starting point:
http://mail-archives.apache.org/mod_mbo ... hoo.com%3E

Regards,
Adrian

Re: Search and copy XML files into a new directory

Post by crult » Fri Jan 22, 2010 9:10 pm

Thank you very much for the instructions, I have to try this!

Re: Search and copy XML files into a new directory

Post by crult » Thu Jan 28, 2010 9:28 pm

Hi,

Someone suggested something like this:

''If you have XSLT 2.0, you can store the locations of all files in an extra file, then for each location in that file, check this category, and then produce a new document that is the copy-of the document. AFAIK, Oxygen's XSLT processor is XSLT 1.0. Does the editor have SAXON? SAXON is an XSLT 2.0 processor, so it could do the job (assuming that using the editor every time you want to trigger SAXON is OK for you).''
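The idea quoted above (keep the locations of all files in an extra file, check each file's category, and copy the matches) can also be sketched without XSLT. Below is a minimal Python sketch, assuming a hypothetical manifest that lists one file path per line (relative paths are resolved against the current directory). Unlike a text search, it parses each file and reads Article[1]/Secteur[1] with the standard xml.etree.ElementTree module, so whitespace inside the tag does not matter. The function name copy_from_manifest is illustrative.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def copy_from_manifest(manifest, dest_dir, category="SCI"):
    """Read file locations from `manifest` (one path per line, a
    hypothetical format) and copy each file whose first
    Article/Secteur equals `category`. Returns the copied locations."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for line in Path(manifest).read_text(encoding="utf-8").splitlines():
        location = line.strip()
        if not location:
            continue  # skip blank lines in the manifest
        root = ET.parse(location).getroot()  # the root element is <Document>
        secteur = root.find("Article[1]/Secteur[1]")
        if secteur is not None and (secteur.text or "").strip() == category:
            # copy the original file unchanged into the destination folder
            shutil.copy2(location, dest / Path(location).name)
            copied.append(location)
    return copied
```

Files with the same name in different source folders would overwrite each other in the destination, so that case needs extra handling if it can occur.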

Can you give me some information? Thank you very much!

Re: Search and copy XML files into a new directory

Post by adrian » Fri Jan 29, 2010 12:07 pm

Oxygen does use Saxon 6.5.5 for XSLT 1.0 and Saxon 9.2 (in Oxygen 11.1) for XSLT 2.0. But you do need a bit of XSLT knowledge to write a stylesheet that does this.

Regards,
Adrian

Re: Search and copy XML files into a new directory

Post by crult » Sun Jan 31, 2010 6:31 pm

That's what I need, I think! Thank you very much for the support. If I have any questions, I'll post them. Thanks!

Re: Search and copy XML files into a new directory

Post by crult » Sun Jan 31, 2010 7:39 pm

So let me describe the situation exactly.

All these folders are in the same directory (the folder ''XSL''):

1) The folder containing the XML files is named ''01''.
2) The folder where I want to put only the files containing <Secteur>SCI</Secteur> is named ''SCI''.
The XPath to the category is /Document/Article[1]/Secteur[1]. The XML documents look like this:

<?xml version='1.0' encoding='ISO-8859-15'?>
<Document xyurl='xyl://20040101N0001.xml'>
<DocId>20040101N0001</DocId>
<Article>
<Page Lien='repository/2004/01/01/pages/04010120.pdf'>20</Page>
<Date Annee='2004' Mois='01' Jour='01'/>
<Publication>LeMonde</Publication>
<Secteur>SCI</Secteur> <!-- Here is the description of the category -->
<Taille>34</Taille>
<Corps>
<Titraille>
<Tetiere>AUJOURD&apos;HUI VOYAGES</Tetiere>
<Titres>
<Surtitre>« Hermione », la frégate de Rochefort</Surtitre>
<Titre>
<P>A bord, la vie était rude</P>
</Titre>
<SousTitre/>
</Titres>
</Titraille>
<Chapo/>
<Origine/>
<Texte>
<P>Sur l&apos; Hermione, les affûts de canon étaient peints en rouge pour faciliter le nettoyage du sang des hommes après la bataille. La « frégate de douze » était armée de 26 canons de douze (les boulets pèsent 6 kg) et 6 canons de six (boulets de 3 kg). Elle était beaucoup plus légère, rapide et maniable qu&apos;un vaisseau taillé pour le combat avec 118 canons. A bord, l&apos;eau est rationnée à trois pintes par homme et par jour. Les vers et les charançons infestent les biscuits de mer. L&apos;absence de fruits et légumes frais rend le scorbut ravageur. La fièvre typhoïde, la petite vérole et la gangrène sont des maladies fréquentes. L&apos;hygiène est absente, le sommeil mauvais. Deux matelots alternent dans un hamac, souvent trempé, à l&apos;entrepont, espace confiné où vivent aussi les moutons embarqués vivants. Le capitaine prend soin de sa chair à canon comme d&apos;un cheptel : il lui faut assez d&apos;hommes vivants pour livrer combat. A cette époque, le service dans la marine est obligatoire - un an sur trois - dans les provinces maritimes du royaume. </P>
<P/>
</Texte>
<SignaturePubliee/>
<Note/>
<Images/>
</Corps>
</Article>
<Indexation>
<TagAdmin1/>
<TagAdmin2/>
<TagAdmin3/>
<TitreComplementaire>2 articles - description de la vie des matelots à bord de l&apos;"Hermione"</TitreComplementaire>
<Commentaire>Q0101/675650;</Commentaire>
<Categories>
<Categorie>DESCRIPTION</Categorie>
<Categorie>ENCADRE</Categorie>
<Categorie>ENSEMBLE</Categorie>
</Categories>
<Lien/>
<Oeuvre>
<TitresOeuvre/>
<GenresOeuvre/>
<AuteursOeuvre/>
</Oeuvre>
<SignaturesIndexees>
<SignatureIndexee/>
</SignaturesIndexees>
</Indexation>
<Etat Statut='EXPORTE'>
<Documentaliste>DAR</Documentaliste>
<MisesAJour>
31-12-2003
</MisesAJour>
</Etat>
<Historique>
<France/>
<Etranger/>
<Personnes/>
</Historique>
</Document>
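To check that the XPath /Document/Article[1]/Secteur[1] picks out the category in documents shaped like the one above, here is a small Python sketch using the standard xml.etree.ElementTree module. Its limited XPath subset supports positional predicates such as [1], but paths are evaluated relative to the root element, so the leading /Document step becomes a check on the root tag. The function name secteur_of is illustrative, and the sample is a trimmed copy of the document above.

```python
import xml.etree.ElementTree as ET


def secteur_of(xml_bytes):
    """Return the text of /Document/Article[1]/Secteur[1], or None.

    Passing bytes lets the parser honour the encoding declared in the
    XML prolog (ISO-8859-15 in these articles).
    """
    root = ET.fromstring(xml_bytes)
    if root.tag != "Document":
        return None
    secteur = root.find("Article[1]/Secteur[1]")
    if secteur is None or secteur.text is None:
        return None
    return secteur.text.strip()


sample = """<?xml version='1.0' encoding='ISO-8859-15'?>
<Document xyurl='xyl://20040101N0001.xml'>
<DocId>20040101N0001</DocId>
<Article>
<Publication>LeMonde</Publication>
<Secteur>SCI</Secteur>
<Taille>34</Taille>
</Article>
</Document>""".encode("iso-8859-15")

print(secteur_of(sample))  # prints SCI
```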





3) I created a new XSL stylesheet. Something like this:


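<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">

    <!-- Write the copies with the same encoding as the source articles -->
    <xsl:output method="xml" encoding="ISO-8859-15"/>

    <!-- Match the document node: SCI is the text value of Secteur,
         not an element name, so it is tested as a value below -->
    <xsl:template match="/">
        <xsl:if test="/Document/Article[1]/Secteur[1] = 'SCI'">
            <!-- Same file name, in a SCI folder relative to the output location -->
            <xsl:result-document href="SCI/{tokenize(document-uri(/), '/')[last()]}">
                <!-- Copy the whole input document -->
                <xsl:copy-of select="/"/>
            </xsl:result-document>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>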



Now I want XSLT to search the folder ''01'', find only the files containing <Secteur>SCI</Secteur>, and copy them (without any changes) to the folder ''SCI''. Can you help me with the XSLT stylesheet?
Another question: how can I apply my scenario to the whole folder ''01''? I can't run it on each XML file individually.
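Outside Oxygen, the whole folder can be processed in one pass. Below is a minimal Python sketch (again a swapped-in tool, not the XSLT route discussed in the thread), assuming the layout described above: the folders ''01'' and ''SCI'' inside the ''XSL'' directory, and the category at /Document/Article[1]/Secteur[1]. It parses each file, skips files that are not well-formed instead of stopping, and copies matches byte for byte, so the original encoding and entity references such as &apos; are preserved. The function name extract_category and its return format are illustrative.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_category(base_dir, src_name="01", dest_name="SCI", category="SCI"):
    """Copy every *.xml file in base_dir/src_name whose
    /Document/Article[1]/Secteur[1] equals `category` into
    base_dir/dest_name. Returns (copied, skipped) lists of file names."""
    src = Path(base_dir) / src_name
    dest = Path(base_dir) / dest_name
    dest.mkdir(parents=True, exist_ok=True)
    copied, skipped = [], []
    for path in sorted(src.glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            skipped.append(path.name)  # not well-formed: record it and keep going
            continue
        # ElementTree paths are relative to the root, so check the root tag separately
        secteur = root.find("Article[1]/Secteur[1]") if root.tag == "Document" else None
        if secteur is not None and (secteur.text or "").strip() == category:
            # copy2 copies the original bytes unchanged (plus timestamps)
            shutil.copy2(path, dest / path.name)
            copied.append(path.name)
    return copied, skipped
```

Run from the parent of ''XSL'', extract_category("XSL") would fill XSL/SCI; passing category="SPO" and dest_name="SPO" would extract the sports articles the same way.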


Thank you very much!
