Re: [xsl] Re: [xquery-talk] backticks in regex - tales of the unexpected part II


Subject: Re: [xsl] Re: [xquery-talk] backticks in regex - tales of the unexpected part II
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Tue, 8 Apr 2014 20:01:20 +0200

There is an uncanny legacy of hackers (and typists) abusing certain
special characters. There is quoting `like this', which abuses the
grave as an opening quote and the apostrophe as a matching closing
quote. Read Donald E. Knuth (and others) on the subtleties of
typesetting in English and other languages, where French, German and
English each roll their own w.r.t. quotes. If you have a dataset that
consistently misuses some character, fix it with a simple tool, but
don't blame correctly behaving software.
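
A minimal sketch of such a "simple tool", in Python. The function name
is made up for illustration, and it assumes that a backtick between two
letters or digits is always a mistyped apostrophe, which you would need
to check against your own data. Backticks anywhere else are left alone.

```python
import re

def fix_backtick_apostrophes(text: str) -> str:
    """Replace a backtick with an apostrophe when it sits between word characters."""
    # Python's \w does not match the backtick, so the lookarounds only
    # succeed when real letters, digits or underscores are on both sides.
    return re.sub(r"(?<=\w)`(?=\w)", "'", text)

print(fix_backtick_apostrophes("Aisha`s Song"))
```

Doing the cleanup as a separate pass keeps the later regex matching
free of special cases for the misused character.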

-W

On 8 April 2014 19:28, Ihe Onwuka <ihe.onwuka@xxxxxxxxx> wrote:
> it and every other backtick in the dataset I am dealing with is a
> mistyped quotation mark.
>
> Exhibit 1
>
> Aisha`s Song but is supposed to be referring to
> http://www.imdb.com/title/tt1950067/
>
> On Tue, Apr 8, 2014 at 6:20 PM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>>
>> On 7 Apr 2014, at 17:07, Ihe Onwuka <ihe.onwuka@xxxxxxxxx> wrote:
>>
>>> backticks match the \w regex class which does seem at odds with the
>>> definition of that class.
>>
>>
>> You might call it a backtick, and misuse it as a kind of quotation mark, but its proper Unicode name and intended semantics are "grave accent", and the \w category includes all non-spacing diacriticals.
>>
>> Michael Kay
>> Saxonica
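
For anyone who wants to check this without an XPath processor, here is a
rough sketch in Python. It assumes the XML Schema regex definition of \w
as "any character except those in the Unicode categories P (punctuation),
Z (separators) and C (other)"; the helper name is invented for this
example. Under that definition U+0060 qualifies because its general
category is Sk (modifier symbol), while the ASCII apostrophe is
punctuation and does not.

```python
import unicodedata

def is_xsd_word_char(ch: str) -> bool:
    """Approximate XSD \\w: every character outside categories P, Z and C."""
    # category() returns a two-letter code such as "Ll", "Sk" or "Po";
    # the first letter is the major class.
    return unicodedata.category(ch)[0] not in "PZC"

for ch in ["`", "'", "a", " ", "_"]:
    print(repr(ch), unicodedata.name(ch), unicodedata.category(ch), is_xsd_word_char(ch))
```

Note that Python's own `re` module defines \w differently (alphanumerics
plus underscore), so this helper mirrors the schema definition instead of
calling `re`.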

