
Re: [xsl] Breaking up Program Code Examples

Subject: Re: [xsl] Breaking up Program Code Examples
From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 18 Sep 2012 16:54:17 -0400

At 2012-09-18 19:33 +0000, Craig Sampson wrote:
We are having issues with copying and pasting programming code examples in our documentation.
The documentation has coding examples and our users want to copy these lines of code and paste
them into their applications to run them. The newlines in the code, which we output in
as-is mode, were being lost

Lost? Do you mean that when you copy from the PDF in Adobe Reader, there are no line feeds when you paste the content?

so Antenna House suggested that we place <fo:block> elements
around each line of code. Outputting each line in its own block will hopefully eliminate the

What more can you say about the "problem"?

We are using the tokenize() function to split the content at each newline character. This works fine
for breaking up our code examples into individual lines, but we have two problems.

1) We lose indents in the code when pasted.

That, I think, is an Adobe problem. I doubt it can be fixed, because it appears to be a function of how Adobe copies text.

2) We allow a few embedded elements, like <userSuppliedValue>, within our <code> element,
and these are lost by the tokenize() call. We still need these to surface so we can react
when they are present.

Which, as you discovered, means you cannot act on the string value of the content: the string value is all that tokenize() is given, so the child elements are already gone by the time it runs.
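Because tokenize() only ever sees a string, one way around problem 2 is to split the nodes rather than the string value. The following is a minimal sketch, assuming an XSLT 2.0 processor: a first pass copies the <code> content into a temporary tree in which each newline is replaced by an empty <nl/> marker element, and xsl:for-each-group then gathers the nodes between markers into one fo:block per line, so <userSuppliedValue> still reaches its own template. The mode name mark-newlines and the <nl/> marker are hypothetical names chosen for illustration, and the "code" attribute set from the stylesheet above is left out so the sketch stands alone.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch (XSLT 2.0): one fo:block per line of <code>, keeping child
     elements. "mark-newlines" and <nl/> are hypothetical names. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="2.0">

  <xsl:template match="code">
    <fo:block font-family="monospace" white-space="pre">
      <!-- Pass 1: build a temporary tree (a document node) in which every
           newline character has become an empty <nl/> marker element -->
      <xsl:variable name="marked">
        <xsl:apply-templates select="node()" mode="mark-newlines"/>
      </xsl:variable>
      <!-- Pass 2: each group is one line, ending with (and including) a marker -->
      <xsl:for-each-group select="$marked/node()" group-ending-with="nl">
        <xsl:variable name="line" select="current-group()[not(self::nl)]"/>
        <fo:block>
          <!-- Normal templates apply here, so userSuppliedValue is still seen -->
          <xsl:apply-templates select="$line"/>
          <!-- An empty fo:block has no height; keep blank lines visible -->
          <xsl:if test="empty($line)">&#xA0;</xsl:if>
        </fo:block>
      </xsl:for-each-group>
    </fo:block>
  </xsl:template>

  <!-- Text: emit the pieces between newlines, plus a marker per newline -->
  <xsl:template match="text()" mode="mark-newlines">
    <xsl:analyze-string select="." regex="\n">
      <xsl:matching-substring><nl/></xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Elements: copied unchanged (assumes they never contain a newline) -->
  <xsl:template match="*" mode="mark-newlines">
    <xsl:copy-of select="."/>
  </xsl:template>

  <xsl:template match="userSuppliedValue">
    <fo:inline font-style="italic"><xsl:apply-templates/></fo:inline>
  </xsl:template>

</xsl:stylesheet>
```

Building $marked as a temporary tree rather than as a bare sequence (with an as attribute) is deliberate: in XSLT 2.0 the grouping pattern nl only matches marker elements that have a parent. Note that this addresses only the lost elements; it does nothing for the copy-and-paste behaviour of the PDF reader.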

Here is our tokenize() code:

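<fo:block xsl:use-attribute-sets="code" space-before="6pt" space-after="0pt"
          white-space="pre" white-space-treatment="preserve" linefeed-treatment="preserve">
  <!-- tokenize() works on the string value of ".", so child elements
       such as <userSuppliedValue> are lost at this point -->
  <xsl:for-each select="tokenize(., '\n')">
    <fo:block><xsl:sequence select="."/></fo:block>
  </xsl:for-each>
</fo:block>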

Here is an example of the XML:
<codeBlock eid="n0b7p8f53ujhsbn17928o22t9lyb">
<code eid="p01xqdrm89zmnon115yg9qm064mb">begin;

The <userSuppliedValue> element italicizes the code it contains, but
since tokenize() discards the element, what we get is:


I started with what I assume you had before you made the changes above; my example stylesheet is below. It is formatted in the PDF as follows, with the "2" in italics:


But I get problems in Adobe Reader on Windows, because copying the text doesn't copy the indentation spaces (you can actually see that in the background colour of the selection).

I get worse problems in MacOS Preview: copying the text appears to drop the newline after "begin;", yet the newline before "end;" is copied.

I even tried replacing the spaces with NBSP (U+00A0), knowing it isn't what I want in the copy buffer, and *still* the copied content had no indentation, so both pieces of software apparently treat NBSP as not applicable to the clipboard.
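For anyone wanting to repeat the NBSP experiment, here is a minimal sketch of one way to do the substitution as a separate pre-pass over the source XML (XSLT 1.0 or later): an identity copy passes everything through, and every ordinary space in text inside <code> becomes U+00A0. This is an illustrative reconstruction, not necessarily the exact change made above, and as reported it did not bring the indentation back in either reader.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Identity copy: everything not matched below passes through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text anywhere inside <code> (including inside <userSuppliedValue>):
       swap each ordinary space for a no-break space, U+00A0 -->
  <xsl:template match="code//text()">
    <xsl:value-of select="translate(., ' ', '&#xA0;')"/>
  </xsl:template>

</xsl:stylesheet>
```

The code//text() template has a higher default priority (0.5) than the identity template's node() (-0.5), so it wins for text inside <code>.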

Then just to try to understand what was going on, I hacked the source file to introduce spaces *in* the line:

<?xml version="1.0" encoding="UTF-8"?>
<codeBlock eid="n0b7p8f53ujhsbn17928o22t9lyb">
<code eid="p01xqdrm89zmnon115yg9qm064mb">begin;
  x=y   +<userSuppliedValue>2</userSuppliedValue>;

... and the result again looks right, but when I copy the text to the clipboard, even the three spaces after the "y" collapse to one. The text selection highlight shows nothing over the three space characters.

I think that tells me that the PDF interpretation of any space character simply moves the drawing point on the page. No space characters are rendered to the page, so there are no space characters to copy to the clipboard, except for a single (manufactured) space character between text areas on the same line.

Thanks for any suggestions on how to get this working.

As I understand how Adobe works, I think you are out of luck. I'm pretty sure I discovered this 14 years ago when making my training material and books. So, in the classroom and when I publish my PDF books, I give the customer a subdirectory of text files of the examples from the book. That even saves them the cut-and-paste step, because the files are standalone.
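The text-file workaround can be generated from the same XML source that feeds the PDF. A minimal sketch, assuming an XSLT 2.0 processor (for xsl:result-document); the examples/ directory and the use of each <code> element's eid attribute as the file name are hypothetical choices, and eid values are assumed to be unique, since writing twice to the same URI is an error.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch (XSLT 2.0): write every code example to its own text file.
     "examples/" and naming by @eid are hypothetical choices. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//code">
      <!-- File name from @eid, falling back to the example's position -->
      <xsl:result-document method="text" encoding="UTF-8"
          href="examples/{(@eid, concat('example-', position()))[1]}.txt">
        <!-- The string value keeps every newline and space, and includes
             the text of child elements such as <userSuppliedValue> -->
        <xsl:value-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the files are written from the string value, the indentation and line breaks survive exactly as authored, which is precisely what the PDF copy buffer loses.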

I doubt any formatter will "fix" this, as I think the PDF reader interpretation of the spaces is where the problem lies.

I hope this is helpful, even if it doesn't solve your problem.

. . . . . . . . . Ken

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

<xsl:template match="codeBlock">
<root font-family="Times" font-size="20pt">

    <simple-page-master master-name="frame"
                        page-height="210mm" page-width="297mm"
                        margin-top="1cm" margin-bottom="1cm"
                        margin-left="1cm" margin-right="1cm">
      <region-body region-name="frame-body"/>

  <page-sequence master-reference="frame">
    <flow flow-name="frame-body">

<xsl:template match="code">
  <block white-space="pre" font-family="monospace">

<xsl:template match="userSuppliedValue">
  <inline font-style="italic">


Public XSLT, XSL-FO, UBL and code list classes in Europe -- Oct 2012
Contact us for world-wide XML consulting and instructor-led training
Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm
Crane Softwrights Ltd.            http://www.CraneSoftwrights.com/s/
G. Ken Holman                   mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers:    http://www.CraneSoftwrights.com/legal
