Re: [xsl] Formatting string
Subject: Re: [xsl] Formatting string
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 16 May 2007 15:36:34 +0200
Abel Braaksma wrote:
Jesper Tverskov wrote: It is impossible to come up with a REGEX that can handle any combination of upper case and lower case. What about PaulMcCartney or JFK? If Pascal notation (XxxxXxxxx) or a similarly strict pattern is not used, a REGEX solution is only possible if we know all input strings from the start.
All provided solutions work with any combination of upper case and lower case. Which of the examples did you try?
PaulMcCartney would become Paul Mc Cartney with any of them.
Perhaps I misunderstood what you are implying (should Mc Cartney be written McCartney? I didn't know). But if you mean that you want a list of exceptions that should not be split into words, then you are right: you'll need that list. We know little from the OP; we are only guessing here. For instance: is the string in one field, or is it part of a larger string? Should consecutive capitals be ignored or not? Are there exceptions? Can a string contain non-Latin characters or punctuation? Some cases to consider:
1. O'Reilly >>>> O'Reilly
2. McDonald's >>>> McDonald's
3. Paul McCartney >>>> Paul McCartney
4. J.K.Rowling >>>> J.K. Rowling (?)
5. JKRowling >>>> J K Rowling (?)
6. JFK >>>> JFK
7. BankOfUSA >>>> Bank Of USA
Cases 1, 5 and 6 go well with my last regex, using "\p{Lu}+".
For the rest, I think you need an exceptions list, which you can place as alternates at the start of the regex (which may yield funny results when the OP's text is from a larger corpus).
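To illustrate the "exceptions as alternates at the start of the regex" idea, here is a minimal sketch in Python rather than XSLT. It is not the regex from earlier in this thread. The exceptions list, the function names `build_pattern` and `split_words`, and the generic word rule are all assumptions made for illustration. Python's `re` module has no `\p{Lu}`, so the ASCII class `[A-Z]` stands in for it here.

```python
import re

# Hypothetical exceptions list; the real one depends on the OP's data.
EXCEPTIONS = ["McCartney", "McDonald's", "O'Reilly"]


def build_pattern(exceptions):
    """Build one regex with the exceptions as the leading alternates."""
    # Exceptions come first so they win over the generic word rule;
    # longest first so a longer exception is not shadowed by a shorter one.
    alternates = [re.escape(e) for e in sorted(exceptions, key=len, reverse=True)]
    # [A-Z] is an ASCII stand-in for \p{Lu} (assumption, see lead-in).
    alternates.append(r"[A-Z]+[^A-Z\s]*")  # run of capitals plus following non-capitals
    alternates.append(r"[^A-Z\s]+")        # leading text that contains no capitals
    return re.compile("|".join(alternates))


def split_words(text, exceptions=EXCEPTIONS):
    """Tokenize with the combined regex and rejoin the tokens with single spaces."""
    return " ".join(build_pattern(exceptions).findall(text))
```

With this list, `split_words("PaulMcCartney")` gives "Paul McCartney" and `split_words("BankOfUSA")` gives "Bank Of USA"; with an empty list, PaulMcCartney falls back to "Paul Mc Cartney". Because an exception matches wherever it occurs in the input, unrelated text in a larger corpus that happens to contain one of the listed strings is kept together as well.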
But all I'm doing is guessing on the requirements. Perhaps Babu will enlighten us? ;)
Cheers -- Abel Braaksma