


Subject: Re: [xsl] A regular expression for the content of any processing-instruction
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 23 Feb 2012 13:54:07 +0000

On 23/02/2012 13:43, Costello, Roger L. wrote:
Hi Folks,

I created a regex for the content of any PI.

You'll find an XPath regex for that in the sources of htmlparse.xsl. The XML 1.1 spec says:

[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
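For illustration, productions [16] and [17] might be transcribed into a Python regular expression along these lines. This is a sketch, not the regex from htmlparse.xsl: Python's re module has no \i or \c, so NAME_START and NAME_CHAR spell out the XML 1.0 Fifth Edition (equivalently XML 1.1) name classes by hand, the Char range restriction on the content is not enforced, and parse_pi is a hypothetical helper.

```python
import re

# Python's re has no \i or \c, so the name classes are spelled out by hand.
# These follow XML 1.0 Fifth Edition productions [4] NameStartChar and
# [4a] NameChar (XML 1.1 uses the same ranges); older XML 1.0 editions differ.
NAME_START = (
    ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
NAME_CHAR = NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
# [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
PI_RE = re.compile(
    r"<\?"
    # [17]: the target may not be exactly "xml", in any mix of case
    + "(?![Xx][Mm][Ll](?![" + NAME_CHAR + "]))"
    # PITarget: one name-start character, then name characters (\i\c*)
    + "([" + NAME_START + "][" + NAME_CHAR + "]*)"
    # optional S (space, tab, CR, LF), then content that never contains "?>"
    + r"(?:[ \t\r\n]+((?:[^?]|\?(?!>))*))?"
    + r"\?>"
)


def parse_pi(s):
    """Return (target, content) if all of s matches production [16], else None.

    Assumption: the Char range check on the content is not applied.
    """
    m = PI_RE.fullmatch(s)
    if m is None:
        return None
    return m.group(1), m.group(2) or ""
```

With these assumptions, parse_pi('<?N1?>') returns ('N1', ''), while '<?1N?>' and '<?xml version="1.0"?>' are rejected. '<?xml-stylesheet href="a.xsl"?>' is accepted, because [17] excludes only the exact target xml.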



Is my regex correct?


Here is the structure of the content of any PI:

1. Zero or more whitespace characters. This is expressed as: \s*


No white space allowed here


2. One or more XML name characters. This is expressed as: \c+

It has to start with a name-start character: you can have <?N1 but not <?1N, so \i\c*, except that it can't be XmL (that is, xml in any mix of case). But the meaning of \i and \c changes with XML 1.1 or XML 1.0 5th edition, so whether or not \i and \c mean what you want depends on what you want and on what the system you are using means by them.



3. Zero or more whitespace characters. This is expressed as: \s*
no

4. The equals sign. This is expressed as: =
no

5. Zero or more whitespace characters. This is expressed as: \s*


6. Either a single- or double-quote character. This is expressed as:
["']
no

7. One or more characters (any kind of character). This is expressed as: .+
no

Note: the period allows any character. That's not correct. What is correct?

8. Either a single- or double-quote character. This is expressed as:
["']
no

9. Repeat (1) - (8) one or more times. This is expressed as: ( ... )+
no

10. Zero or more whitespace characters. This is expressed as: \s*
no

Here is the resulting regex:


(\s*\c+\s*=\s*["'].+["'])+\s*
no

Do you agree?

No. You are giving attribute syntax, but PIs don't have attributes: their content is everything up to the next occurrence of ?>.
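Because the content is just everything up to the next ?>, extracting it needs no attribute-style structure at all. In a search (rather than a full match), a lazy .*? stops at the first ?>. A minimal Python sketch follows; pi_contents is a hypothetical helper, and the target is deliberately simplified to any run of characters other than whitespace and ?, an assumption that skips the \i\c* check.

```python
import re

# Target simplified (assumption): any run of non-whitespace characters except "?".
# Note that \s here is broader than XML's S (it also matches other Unicode spaces).
# The lazy ".*?" makes the content stop at the FIRST following "?>";
# re.DOTALL lets the content span line breaks.
PI_SCAN = re.compile(r"<\?([^\s?]+)(?:\s+(.*?))?\?>", re.DOTALL)


def pi_contents(text):
    """Yield (target, content) for each processing instruction found in text."""
    for m in PI_SCAN.finditer(text):
        yield m.group(1), m.group(2) or ""


doc = '<?xml-stylesheet type="text/xsl" href="s.xsl"?><p/><?php if ($a > 1) { echo "?"; } ?>'
for target, content in pi_contents(doc):
    print(target, repr(content))
# xml-stylesheet 'type="text/xsl" href="s.xsl"'
# php 'if ($a > 1) { echo "?"; } '
```

The php instruction is a legitimate PI whose content has no attribute structure whatsoever, which is why a pattern built from name="value" pairs cannot describe PI content in general.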

/Roger



David





