Oxygen looks in wrong dir for DTD, entities

Posted: Thu Nov 03, 2005 6:27 pm
by Paul Vinkenoog
I have XML master files with relative paths to the DTD and to external entities. When I want to validate such a document I get a FileNotFoundException because the DTD "D:\Program Files\docs\docbookx\docbookx.dtd" can't be found. Evidently, Oxygen takes its program dir, not the file location, as the base for the relative path.

If I point to the web location of the DTD, the above problem is circumvented but then I get FileNotFoundExceptions on files that are included by external entities. Here again, it looks for those files under the Oxygen program dir.

Oddly enough, if I place the cursor on such an external entity reference and select "Open file at cursor" from the context menu, the file is found and opened immediately.

I know my source files and stylesheets setup as such is OK; it's been built up over several years and regularly used to generate output.

How can I solve this? I want to debug some templates in connection to a certain set of source files, but at the moment that's impossible. I've also searched for an option to tell Oxygen to use a certain base dir for a project, but couldn't find one.

(On a side note: why does Oxygen complain about not finding the DTD when checking well-formedness only?)

Posted: Thu Nov 03, 2005 9:50 pm
by george
Dear Paul,

Please make sure the file is saved on disk. oXygen uses the current working directory as the base only if the file is new and not yet saved; otherwise the file location should be used as the base for resolving relative references.
If you still have problems, please send a cut-down sample that we can use to reproduce the problem to support at oxygenxml dot com.

A conformant XML processor must read the DTD to check that a document is well-formed if a DTD is specified. DTDs can also contain entity definitions, so they are needed to determine whether a document is well-formed.

Best Regards,
George

Posted: Fri Nov 04, 2005 2:11 am
by R2O
FWIW, I see the same problem. It's not relative paths to the DTD, but a path from the root.

That is, if your XML file includes:

<!DOCTYPE sourcedocument SYSTEM "/dir1/dir2/mydtd.dtd">

oXygen always looks in c:\dir1\dir2\mydtd.dtd for the file, even if the XML file is located on a different drive.

Posted: Fri Nov 04, 2005 8:14 am
by george
Hi,

/dir1/dir2/mydtd.dtd does not specify a relative location inside the current directory. You should either remove the leading / or add a . (dot) in front of it, that is, either:

dir1/dir2/mydtd.dtd

or

./dir1/dir2/mydtd.dtd
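
To illustrate the difference between these spellings, here is a minimal Python sketch that uses the standard library's urllib.parse.urljoin as a stand-in for generic URI reference resolution. It is not oXygen's actual resolver, and the base URI is a hypothetical document location on drive D: chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical location of the XML document (an assumption for illustration).
base = "file:///D:/myfiles1/myfiles2/myfile.xml"

def resolve(system_id: str) -> str:
    """Resolve a system identifier against the document's base URI."""
    return urljoin(base, system_id)

relative = resolve("dir1/dir2/mydtd.dtd")    # relative to the document's folder
dotted = resolve("./dir1/dir2/mydtd.dtd")    # same thing, written with a dot
rooted = resolve("/dir1/dir2/mydtd.dtd")     # leading slash replaces the whole path

print(relative)
print(dotted)
print(rooted)
```

In this sketch the first two forms stay inside the document's folder, while the form with a leading slash replaces the entire path, including the "D:" segment. The resolved URI then names no drive at all, so which drive is searched is left to the platform or the tool.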

Best Regards,
George

Posted: Fri Nov 04, 2005 4:42 pm
by R2O
Sorry for the confusion, but my situation is like the first poster. I don't want a relative path. I want an absolute path from the root. The drive letter and the location of my source XML file can change. So, imagine this directory structure:

d:/dir1/dir2/mydtd.dtd
d:/myfiles1/myfiles2/myfile.xml

Now, if myfile.xml includes the following declaration:

<!DOCTYPE sourcedocument SYSTEM "/dir1/dir2/mydtd.dtd">

oXygen looks in c:\dir1\dir2\mydtd.dtd rather than d:\dir1\dir2\mydtd.dtd. This sort of declaration works in other XML editors, so I know it can work. Also, doing something like ../../dir1/dir2/mydtd.dtd is not an option because the source XML files could reside in numerous directories. Each one would need a different DTD declaration, which is not an ideal solution.

Posted: Fri Nov 04, 2005 5:47 pm
by sorin_ristache
Hello,
R2O wrote:The drive letter and the location of my source XML file can change.

...

Now, if myfile.xml includes the following declaration:

<!DOCTYPE sourcedocument SYSTEM "/dir1/dir2/mydtd.dtd">

oXygen looks in c:\dir1\dir2\mydtd.dtd rather than d:\dir1\dir2\mydtd.dtd.

...

Each one would need a different DTD declaration, which is not an ideal solution.

In that case you should make the XML document portable: specify a URI as the system ID of the DTD, something like:

Code: Select all

<!DOCTYPE sourcedocument SYSTEM "file:/C:/dir1/dir2/mydtd.dtd">
and resolve it with an XML catalog like:

Code: Select all

<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<system systemId="file:/C:/dir1/dir2/mydtd.dtd" uri="dir3/mydtd.dtd"/>
</catalog>
Please note that you need to specify a full path as the system ID in the XML document; otherwise a conformant XML parser first resolves the path given as the system ID relative to the current document location. With the above XML catalog, the location of the DTD is independent of the location of the document that references it, because the reference is resolved through the catalog. I placed the above catalog in D:\temp on my machine and, in a folder on the C: drive, a document containing the declaration:

Code: Select all

<!DOCTYPE sourcedocument SYSTEM "file:/C:/dir1/dir2/mydtd.dtd">
and I placed the mydtd.dtd file in D:\temp\dir3. The path dir3/mydtd.dtd is resolved relative to the location of the catalog file. You can specify a full path in the catalog for the DTD file instead of the relative path dir3/mydtd.dtd if you need that.
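
The lookup described above can be sketched in a few lines of Python. This is only a rough model of a catalog "system" entry, not the resolver oXygen ships with: it matches system IDs verbatim (real resolvers normalize them first), and the catalog file name catalog.xml is an assumption, since only the folder D:\temp is given above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

CATALOG_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

# The catalog from the post above, inlined so the sketch is self-contained.
CATALOG_XML = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="file:/C:/dir1/dir2/mydtd.dtd" uri="dir3/mydtd.dtd"/>
</catalog>"""

# Assumed location of the catalog file; the name catalog.xml is hypothetical.
CATALOG_URI = "file:///D:/temp/catalog.xml"

def lookup_system(catalog_xml, catalog_uri, system_id):
    """Return the mapped URI for system_id, or None if no entry matches."""
    root = ET.fromstring(catalog_xml)
    for entry in root.iter(CATALOG_NS + "system"):
        if entry.get("systemId") == system_id:
            # A relative uri attribute is resolved against the catalog's own location.
            return urljoin(catalog_uri, entry.get("uri"))
    return None

mapped = lookup_system(CATALOG_XML, CATALOG_URI, "file:/C:/dir1/dir2/mydtd.dtd")
unmapped = lookup_system(CATALOG_XML, CATALOG_URI, "file:/C:/other.dtd")
print(mapped)
print(unmapped)
```

The point of the sketch is that the document's own location never enters the lookup: only the system ID string and the catalog's location matter.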

Make sure you restart <oXygen/> after you add the catalog file to the list of XML catalogs from Options - Preferences - XML - XML catalog.

Regards,
Sorin

Posted: Fri Nov 04, 2005 8:30 pm
by R2O
I still wasn't clear. Catalogs won't help me because the drive letter can change. I may have a virtual drive for different views of my source. d: may provide one view and c: another. Both drives contain the DTD in the path specified (/dir1/dir2/mydtd.dtd), but these may be different versions of the DTD. Therefore, I don't want to include the drive letter in my DTD specification. I expect oXygen to figure out the drive letter based on the location of the source XML file. Instead, oXygen always looks in c: because that's where the oXygen executable is located. Does that make sense?

Again, this set-up works for other XML editors.

Posted: Fri Nov 04, 2005 8:36 pm
by R2O
Furthermore, like the first poster, I have entities that I want to reference from the root of a drive without being explicit about the drive letter. This is just like the DTD situation.

Posted: Mon Nov 07, 2005 4:34 pm
by Paul Vinkenoog
First, thanks to everybody who replied.
george wrote:Please make sure the file is saved on disk. oXygen uses the current working directory as the base only if the file is new and not yet saved; otherwise the file location should be used as the base for resolving relative references.
The files are all on disk, they've existed for some time.
I've investigated the problem some more and found that:
  • Currently the filenames in the entities are prefixed with "file:", e.g. "file:firebirddocs/nullguide.xml".
    Oxygen tries to find them relative to its program dir, not to the master file dir. This fails, of course.

    If I prefix with "file://" (as I should in a URL), Oxygen complains that it can't find the *network path* //firebirddocs/nullguide.xml

    If I leave out the protocol specifier and just write "firebirddocs/nullguide.xml", the file is found.
    However, this is not an option because I can't build my target docs like this: Saxon will (justifiably) complain about the missing protocol.
Using absolute paths is not an option either, because the entire setup is part of a CVS repository at SourceForge. Project members install this stuff on all kinds of drives and directories, on both Windows and Linux systems.

Like I said before: the above problems occur when a file must be included e.g. for validation or when tracing/debugging. If I select the entity reference in the master file and choose "Open file at cursor" from the context menu, the file is found and opened.
george wrote:A conformant XML processor must read the DTD to check that a document is well-formed if a DTD is specified. DTDs can also contain entity definitions, so they are needed to determine whether a document is well-formed.
Thanks for clarifying that.

Posted: Mon Nov 07, 2005 6:16 pm
by george
Hi Paul
Paul Vinkenoog wrote:Currently the filenames in the entities are prefixed with "file:" e.g. "file:firebirddocs/nullguide.xml"
Oxygen tries to find them relative to its program dir, not to the master file dir. This fails of course.
If you want to specify a relative location, remove the protocol. Once you add a protocol, it is no longer a relative location: imagine you move your files to an HTTP server (or access them from the same location but through HTTP).
Paul Vinkenoog wrote:If I prefix with "file://" (as I should in a URL), Oxygen complains that it can't find the *network path* //firebirddocs/nullguide.xml
If you look at RFC 1738 (http://www.ietf.org/rfc/rfc1738.txt) you will see that a host name is expected after file://.
Paul Vinkenoog wrote:If I leave out the protocol specifier and just write "firebirddocs/nullguide.xml", the file is found.
That is the correct specification of a relative location.
Paul Vinkenoog wrote:However, this is not an option because I can't build my target docs like this: Saxon will (justifiably) complain about the missing protocol.
That is probably because you do not invoke it properly. I think you do not set the system ID for your source document when you invoke Saxon.
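
As a small illustration of the three spellings discussed above, the following Python sketch splits each one into scheme, host and path using the standard library's urllib.parse.urlsplit. It only shows how a generic URL parser reads the strings; it makes no claim about the parser oXygen or Saxon uses internally.

```python
from urllib.parse import urlsplit

def describe(ref: str):
    """Return (scheme, host, path) as a generic URL parser sees them."""
    parts = urlsplit(ref)
    return (parts.scheme, parts.netloc, parts.path)

with_scheme = describe("file:firebirddocs/nullguide.xml")
with_slashes = describe("file://firebirddocs/nullguide.xml")
plain = describe("firebirddocs/nullguide.xml")

for label, value in (("file:", with_scheme),
                     ("file://", with_slashes),
                     ("no prefix", plain)):
    print(label, value)
```

With "file://" the first path segment is read as the host, which matches the "network path" complaint; only the form without any prefix is a plain relative path.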

Best Regards,
George

Posted: Tue Nov 08, 2005 6:48 pm
by Paul Vinkenoog
Hello George,
george wrote:If you want to specify a relative location then remove the protocol, if you add a protocol that is not a relative location anymore
That's right, although one RFC mentions that "some older parsers" accept a protocol here if it's the same as the base protocol. Maybe this applies to Saxon 6.x's resolver too.
Further, Oxygen accepts "file:firebirddocs/nullguide.xml" as a relative path as well. But it takes it relative to the program dir, that's what's causing problems for me.
paul wrote:However, this (i.e. leaving out protocol) is not an option because I can't build my target docs like this: Saxon will (justifiably) complain about the missing protocol.
george wrote:That is probably because you do not invoke it properly. I think you do not set the system ID for your source document when you invoke Saxon.
I realise that this is not a Saxon support forum, but what do you mean by setting the system ID? Saxon can be invoked with source filename + stylesheet filename. In my case it's done with the Ant style task, like this:

Code: Select all

<style basedir="${docs.dir}"
       style="${style.dir}/fo.xsl"
       in="${docs.dir}/${setname}.xml"
       out="${fo.dir}/${docname}.fo">
  <param name="rootid" expression="${rootid}"/>
</style>
Both stylesheet (via xsl:include) and document (through entities) include other files, and the builds always work fine. Except if you remove the "file:" prefixes from the relative paths in the entities, that is. (Removing it from the DTD location poses no problems.) We use Saxon 6.5.2, but I just tested and obtained the same results with 6.5.3 and 6.5.4.

Posted: Thu Nov 10, 2005 4:16 pm
by george
Hi Paul,

Try to invoke Saxon from command line on that transformation and see if you have problems. If yes, please post a cut down sample to allow us to reproduce the problem.

Best Regards,
George

Posted: Thu Nov 10, 2005 6:47 pm
by Paul Vinkenoog
Hello George,
george wrote:Try to invoke Saxon from command line on that transformation and see if you have problems.
No! Saxon happily accepts a mix of "file:"- and non-prefixed relative filenames in external entities, and parses everything. So the fact that it chokes on non-prefixed filenames in my usual setup must have something to do with the way Ant passes parameters to Saxon (more specifically the base URL and/or current directory). So it's "Go to the Ants" for me now to find wisdom - I probably should have known ;-)

Now, on to Oxygen's behaviour.

You were right about protocols not being allowed in relative URLs. However, several RFCs (e.g. 2396) mention that some parsers do accept this, provided the protocol is the same as in the base URL. This is due to a loophole in RFC 1630. Evidently, Saxon's parser (Aelfred with minor additions) is of that forgiving type.

I think Oxygen should either be strict and treat a URL like "file:child.xml" as absolute (in which case it usually won't resolve); or be forgiving like some others, discard the protocol prefix and resolve the URL correctly. Right now it does neither: it accepts it as a relative URL but uses the program dir as the base. At least when validating or debugging. When opening such a file in the editor (with "Open file at cursor") it resolves the location correctly.
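
The two consistent behaviours described above can be sketched as follows. This is a hedged Python illustration, not the code of any of the parsers mentioned: resolve_strict and resolve_lenient are hypothetical helper names, and the base URI is an assumed location for mother.xml.

```python
from urllib.parse import urljoin, urlsplit

# Assumed location of the master file; the folder name is hypothetical.
BASE = "file:///D:/work/mother.xml"

def resolve_strict(base: str, ref: str) -> str:
    """Strict reading: a reference that carries a scheme is already absolute."""
    if urlsplit(ref).scheme:
        return ref
    return urljoin(base, ref)

def resolve_lenient(base: str, ref: str) -> str:
    """Forgiving reading: drop a scheme equal to the base's, then resolve as relative."""
    parts = urlsplit(ref)
    if parts.scheme and parts.scheme == urlsplit(base).scheme and not parts.netloc:
        ref = ref[len(parts.scheme) + 1:]  # strip "file:" and keep the rest
    return urljoin(base, ref)

strict_result = resolve_strict(BASE, "file:child.xml")
lenient_result = resolve_lenient(BASE, "file:child.xml")
plain_result = resolve_strict(BASE, "child.xml")

print(strict_result)
print(lenient_result)
print(plain_result)
```

Either choice is self-consistent: the strict variant leaves "file:child.xml" untouched (so it usually will not be found), while the forgiving variant lands next to the master file. Neither one produces a path under the program directory.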

Here is a very simple setup to observe that behaviour. First file mother.xml:

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
<!ENTITY mychild SYSTEM "file:child.xml">
]>

<article id="the-article">
<title>Mama's article</title>
<para>Here comes my child:</para>
&mychild;
</article>
Second file, in same directory, child.xml:

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>

<section id="child-section">
<title>Child's section</title>
<para>Content of child's section</para>
</section>
Validating and stepping will fail with (in my case) " FileNotFoundException-D:\Program Files\Oxygen 6.2\child.xml"

But opening child.xml via the context menu when the cursor is on "&mychild;" works.

Greetings, and thanks for your help and suggestions,
Paul