Easiest way to extract same variables from many XML files?

Questions about XML that are not covered by the other forums should go here.
XMLnoob
Posts: 2
Joined: Wed May 11, 2022 10:41 am

Easiest way to extract same variables from many XML files?

Post by XMLnoob »

Hi everyone,

I have over 30,000 XML files that are in the same format in a local folder with different names. I want to obtain the same 2 variables from each one and place into an excel spreadsheet.

I feel like this shouldn't be a hard task but I'm struggling to figure out how to do it. I haven't worked with XML or databases/queries/sql before but know a bit of programming.

I tried to import the XML files into excel using developer tools but it gave me a "switch from current encoding to specified encoding not supported".
As far as I can understand, this is the encoding:
<?xml version="1.0" encoding="utf-16"?>

The first variable I want to collect is email:

Code: Select all

<applicant>
<person/>
<contact>
<email>
<type>1</type>
<address>Example@Email.com</address>
</email>
</contact>
The second variable is test score from score class = '5'.

Code: Select all

<testscores>
<testscore>
<score class="3">70</score>
</testscore>
<testscore>
<score class="5">92</score>
</testscore>
</testscores>
Is this something that can be done relatively easily and if so, any advice on how to do it for a beginner?

Thank you in advance
Radu
Posts: 9059
Joined: Fri Jul 09, 2004 5:18 pm

Re: Easiest way to extract same variables from many XML files?

Post by Radu »

Hi,

About this error:

{quote}switch from current encoding to specified encoding not supported{quote}

I do not know anything about how the Excel import works, if you want to change the encoding for all files from UTF-16 to UTF-8 for example, you can add in the Oxygen Project view a reference to the folder containing the files, then right click the folder "Find/Replace in Files" and replace "<?xml version="1.0" encoding="utf-16"?>" with "<?xml version="1.0" encoding="utf-8"?>". I do not know if the import will work on UTF-8 files but it could be tested..

If you want some XML specific technology to solver this, Oxygen has the possibility to create an XSLT stylesheet and apply it over a sequence of documents in a folder. But you would need to learn XSLT: https://blog.oxygenxml.com/xslt_training.html
The XSLT could produce an HTML document and if you open the HTML document in a web browser, you could copy its contents and paste them in Excel to populate the table.

Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
XMLnoob
Posts: 2
Joined: Wed May 11, 2022 10:41 am

Re: Easiest way to extract same variables from many XML files?

Post by XMLnoob »

Thank you for the answer - changing the UTF to UTF-8 worked to allow import into excel!

I'll see about learning XSLT.
Post Reply