Extracting all values, all text and all nodes with only wildcards
-
- Posts: 3
- Joined: Tue Sep 17, 2019 10:27 pm
Post by Brian_donovan »
Hi, a question: I have a 400 MB XML file and I need to extract all values, all text and all nodes from it, block by block, as efficiently as possible. Here is an example of my XML tags:
Code:
<Big report>
<block something1="A" something2="B" something3="C" something4="D" something5="E"/>
<inner block>F</inner block>
<inner block>G</inner block>
<inner block>"H"</inner block>
<inner block>
<inner inner block something1="I" something2="J" something3="K"/>
</inner block>
</block>
<block something1="L" something2="M" something3="N" something4="O" something5="P"/>
<inner block>"Q"</inner block>
<inner block>
<inner inner block something1="R" something2="S" something3="T"/>
</inner block>
</block>
<something else>
</something else>
<Big report>

What I need is the nodes, values and text from every block. IMPORTANT: I do not know how many blocks, how many "something" attributes or how many inner blocks there are, and I do not know their names either, so everything needs to be extracted using only wildcards. Here is the code I have so far (obviously it is not perfect); please help:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="Big report/block">
      <xsl:value-of select="*/@*[3]"/>
      <xsl:value-of select="child::*[.]"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

The result I am looking for is the following:
Code:
ABCDEFGHIJK
LMNOPQRST

The "child::*[.]" expression works nicely, but "*/@*[3]" does not: I do not know how to use a wildcard instead of the 3, and I cannot repeat the code from 1 to 100; there must be a better way. I have also tried "//*", but I just can't make it work right. Any help will be appreciated, thank you all.
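As a minimal XSLT 1.0 sketch of the wildcard-only approach (an illustration, not a definitive solution): the positional predicate can simply be dropped, because "@*" on its own selects every attribute, and "/*/*" selects every child of the root element whatever its name. This assumes a well-formed input document; the sample as posted is not (element names contain spaces and the tags do not match), so no XSLT processor would parse it. Within one xsl:for-each the union is processed in document order, so an element's attributes come before the text of its descendants; the relative order of attributes on the same element is implementation-dependent in XPath 1.0, although processors usually keep the source order.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Plain-text output -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- /*/* = every child of the root element, whatever its name -->
    <xsl:for-each select="/*/*">
      <!-- @* without [3] selects every attribute; the union with the
           non-blank descendant text nodes is processed in document order -->
      <xsl:for-each select="descendant-or-self::*/@* | descendant::text()[normalize-space()]">
        <xsl:value-of select="normalize-space()"/>
      </xsl:for-each>
      <!-- One line per top-level block -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that text such as "H" is output together with its quotation marks, since they are part of the element content.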
-
- Posts: 9431
- Joined: Fri Jul 09, 2004 5:18 pm
Re: Extracting all values, all text and all nodes with only wildcards
Hi Brian,
Ideally, when you post sample XML fragments, they should be well-formed; that makes it easier for somebody to construct an example based on them.
An XSLT stylesheet that lists all attribute values and all text nodes could look something like this:
Code:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
  <!-- Plain-text output; ignore whitespace-only text nodes used for indentation -->
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- For every element, process its attributes first, then its child nodes -->
  <xsl:template match="*">
    <xsl:apply-templates select="@* | node()"/>
  </xsl:template>
  <!-- Output each attribute value; text nodes are output by the built-in rule -->
  <xsl:template match="@*">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
For generic XSLT questions there is also an XSLT users list, which may be a good place to ask XSLT-related questions:
https://www.mulberrytech.com/xsl/xsl-list/
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
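Following the remark about well-formedness, one possible well-formed version of the sample is sketched below. It rests on two assumptions that are not in the original post: spaces in element names are replaced by underscores (Big_report, inner_block, inner_inner_block and something_else are hypothetical names), and each block's attributes sit on an opening block tag that wraps its inner blocks, as the expected output ABCDEFGHIJK suggests.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical well-formed rewrite of the posted sample:
     spaces in element names replaced by underscores, and each block
     wraps its inner blocks instead of being self-closing -->
<Big_report>
  <block something1="A" something2="B" something3="C" something4="D" something5="E">
    <inner_block>F</inner_block>
    <inner_block>G</inner_block>
    <inner_block>"H"</inner_block>
    <inner_block>
      <inner_inner_block something1="I" something2="J" something3="K"/>
    </inner_block>
  </block>
  <block something1="L" something2="M" something3="N" something4="O" something5="P">
    <inner_block>"Q"</inner_block>
    <inner_block>
      <inner_inner_block something1="R" something2="S" something3="T"/>
    </inner_block>
  </block>
  <something_else>
  </something_else>
</Big_report>
```

Saved as a file, this can be used to try out the stylesheets posted in this thread.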
-
- Posts: 102
- Joined: Tue Aug 19, 2014 12:04 pm
Re: Extracting all values, all text and all nodes with only wildcards
Post by Martin Honnen »
The built-in default processing (see https://www.w3.org/TR/xslt-30/#built-in ... -only-copy for XSLT 3; the declarative way it is specified there with "xsl:mode" is backwards compatible with XSLT 1 and 2) copies all attribute values and text node values, so there is not much you need to do beyond relying on it and making sure the attributes are processed. If the result of each "block" should form one line (as long as the data doesn't contain line breaks), then
Code:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="block | block//*">
    <xsl:apply-templates select="@* | node()"/>
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>
suffices.
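To make the reliance on the built-in processing concrete, here is what the built-in template rules amount to when written out by hand, following the list in the XSLT 1.0 specification (section 5.8, shown without the per-mode variants). The rule for elements only applies templates to child nodes, not to attributes, which is why the template above explicitly selects "@* | node()".

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Built-in rule for the root node and for elements:
       process the children only (attributes are NOT selected here) -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Built-in rule for text nodes and attributes: output the value -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Built-in rule for comments and processing instructions: output nothing -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

In XSLT 3.0 the same behaviour is what xsl:mode on-no-match="text-only-copy" declares, and it is the default for the unnamed mode.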
To do it efficiently (in terms of low memory consumption for a huge document), if you use oXygen, where you can set up the latest Saxon 9.9 or 9.8 EE for streaming, use
Code:
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:mode streamable="yes"/>
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="block | block//*">
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates/>
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>