[XSL-LIST Mailing List Archive Home] [By Thread] [By Date]

Re: [xsl] Using memory addressing to retrieve a value vice using a software string library to retrieve a value

Subject: Re: [xsl] Using memory addressing to retrieve a value vice using a software string library to retrieve a value
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 20 Nov 2015 16:38:55 -0000

In addition to the replies by Dr. Kay and David Carlisle:

It depends on the concrete case and the concrete XML document and the
string itself.

1. Imagine a string starting with a space and then having the wanted
substring (that is not too-long). This may be faster than scanning
sequentially through thousands of nodes -- as in general an XPath
engine may use a linked-list and not a vector.

2. On the other side, imagine an XPath engine that has the sequence of
nodes in a vector (self-expandable array structure). Then the wanted
node can be obtained in O(1). Imagine at the same time that you have a
sting long many Megs, snd the first space happens only at the end of
the string, and the wanted substring is also very long. In this case
searching through the sting using the specified expression can be
orders of magnitude slower.

A general remark about finding the first/all  occurrence of a given
string as a substring of another given string, or whether a string1
contains a string2.

I am aware of at least two very efficient algorithms:

  1. Using *suffix-trees*  -- all possible suffixes of the given
string are organized as a trie.  The search (same as the contains()
function) is very fast.

  2. Using the hash of the search-string and scanning the given string
and sequentially computing the hash of every next substring with the
same length as the search string. Because computing the hash of the
moving "window" can be done incrementally, this algorithm is O(N)  --
much better than the naC/ve O(N*M) algorithm. More about this -- in the
book of Steven Skiena "The Algorithm Design Manual",


To summarize:  All depends.

On Fri, Nov 20, 2015 at 4:13 AM, Costello, Roger L. costello@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi Folks,
> I want to retrieve "west".
> Which of these is faster?
> -------------------
> Approach #1
> -------------------
> The <edge> element contains text:
>         <edge>garden west door</edge>
> "west" is retrieved using this XPath:
>         substring-before(substring-after(., ' '), ' ')
> Note: assume that <edge> is the context node.
> -------------------
> Approach #2
> -------------------
> The <edge> element contains markup:
>         <edge>
>             <garden/>
>             <west/>
>             <door/>
>         </edge>
> "west" is retrieved using this XPath:
>         *[2]/name()
> I believe that Approach #2 is much, much faster. Do you agree?
> As I see it, Approach #2 is simply a memory addressing task (which computers
do very well) to obtain <west/> and then output the symbols at that memory
address. Approach #1, on the other hand, requires the use of software string
libraries, which, I imagine, results in hundreds or thousands of machine
instructions. In fact, I would imagine that Approach #2 is hundreds or
thousands of times faster than Approach #1. Do you agree?
> /Roger

Dimitre Novatchev
Truly great madness cannot be achieved without significant intelligence.
To invent, you need a good imagination and a pile of junk
Never fight an inanimate object
To avoid situations in which you might make mistakes may be the
biggest mistake of all
Quality means doing it right when no one is looking.
You've achieved success in your field when you don't know whether what
you're doing is work or play
To achieve the impossible dream, try going to sleep.
Facts do not cease to exist because they are ignored.
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
Sanity is madness put to good use.
I finally figured out the only reason to be alive is to enjoy it.

Current Thread