webhelp outputs include content that should be filtered out
Oxygen general issues.
-
- Posts: 8
- Joined: Wed Jan 22, 2014 5:30 pm
webhelp outputs include content that should be filtered out
Hi,
The WebHelp output includes content that should have been filtered out. The TOC has no links to this content, but the search engine detects and indexes the files, which makes them accessible to customers through search.
Content inclusion is conditional: distribution=Internal_OBSOLETE
Is there a way to not generate any output for those topics?
Thanks
Damien
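For readers following along, a condition like the one above is usually expressed as an exclude rule in the DITAVAL file. The sketch below assumes that `distribution` is a profiling attribute available in the project's DITA setup (for example, a specialized attribute). The thread does not show the actual file, so this is illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative DITAVAL sketch; the real file is not shown in the thread. -->
<val>
  <!-- Drop every element or topicref profiled with distribution="Internal_OBSOLETE" -->
  <prop att="distribution" val="Internal_OBSOLETE" action="exclude"/>
</val>
```

With a rule like this, a map reference such as `<topicref href="obsolete_topic.dita" distribution="Internal_OBSOLETE"/>` (file name hypothetical) would be expected to drop out during filtering, provided the DITAVAL file itself is read successfully.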
-
- Posts: 222
- Joined: Tue Jul 01, 2014 11:48 am
Re: webhelp outputs include content that should be filtered out
Post by bogdan_cercelaru »
Hello,
If you are publishing WebHelp into a folder that contains other HTML files, all of those files will be indexed (that is, files with the same extension as the one set through the "args.outext" parameter) and presented in the search results. The indexer runs on all HTML files in the output folder, regardless of whether a topic is linked in the map.
To clean the output directory before WebHelp is generated, you can use the "clean.output" parameter and set it to "yes".
Regards,
Bogdan
Bogdan Cercelaru
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
-
- Posts: 8
- Joined: Wed Jan 22, 2014 5:30 pm
Re: webhelp outputs include content that should be filtered out
This is not the problem.
The problem is that some DITA content that should not be part of the output (because it is excluded by the value set in the DITAVAL file) is present in the generated WebHelp. It is then indexed and appears in the search results.
Damien
-
- Posts: 9436
- Joined: Fri Jul 09, 2004 5:18 pm
Re: webhelp outputs include content that should be filtered out
Hi Damien,
If you manually remove the output and temporary files folders before publishing, do you still get the extra HTML files for the topics that should be excluded?
Maybe you excluded the topics in the DITA Map, but other topics in the publication still link to them. In that case you should also profile those links.
If you run the "Validate and check for completeness" action from the DITA Maps Manager, you can configure it to use a DITAVAL filter. It might report that certain topics are not referenced in the DITA Map but are linked from other topics in the DITA Map.
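As a sketch of what profiling a link could look like, the fragment below wraps a cross-reference in a phrase that carries the same condition as the excluded topic, so the link is filtered out together with its target. The `distribution` attribute on `<ph>` and the file name `obsolete_topic.dita` are assumptions for illustration and do not come from the poster's project.

```xml
<p>Use the current installation procedure.
  <!-- The phrase carries the same condition as the excluded topic, -->
  <!-- so the filter removes the link along with its target. -->
  <ph distribution="Internal_OBSOLETE">The previous procedure is described in
    <xref href="obsolete_topic.dita"/>.</ph>
</p>
```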
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
-
- Posts: 40
- Joined: Wed Jan 29, 2014 4:30 pm
Re: webhelp outputs include content that should be filtered out
Hi Radu,
I can confirm that:
- the book is valid when running the Oxygen completeness check with the DITAVAL
- the profiled content is excluded from the PDF
- the profiled content is excluded from the TOC and next/previous page navigation
So the problem is that the topic, although it should be excluded, is still converted to HTML and then indexed.
Pascale
-
- Posts: 9436
- Joined: Fri Jul 09, 2004 5:18 pm
Re: webhelp outputs include content that should be filtered out
Hi Pascale,
Do you have the same problem when converting to XHTML?
You could try an experiment: take one of the topics that should no longer be reachable with the DITAVAL filter applied and make it not well-formed (remove the root start tag, for example).
Then configure Oxygen in the Preferences->DITA page to always show the console output, and publish to WebHelp. Afterwards, look in the DITA OT console view to see in which part of the processing the DITA OT tries to read the topic. That might give us some indication about the problem.
Also, if you can put together a sample DITA project to reproduce the problem and attach it to an email, we could try to investigate this on our side.
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com
-
- Posts: 40
- Joined: Wed Jan 29, 2014 4:30 pm
Re: webhelp outputs include content that should be filtered out
Hi Radu,
We found the cause of the problem: we are using DITA OT 2.2.2 (not 1.8), and this version seems to have severe problems with XML catalogs.
On many occasions, and in particular when executing the [gen-list] and [debug-filter] targets, the OT issues messages like:
Code:
Failed to read DITAVAL file: ...\ditaval.dtd (The system cannot find the file specified)
[DOTJ037W][WARN] The XML schema and DTD validation function of the parser is turned off. Please make sure ....
Using Xerces grammar pool for DTD and schema caching.
Our workaround is to copy the DTD to where it is needed, by adding to the plugin:
Code:
<feature extension="depend.preprocess.pre" value="copy-ditaval-dtd"/>
and by defining a new Ant target in the buildxxx.xml file:
Code:
<target name="copy-ditaval-dtd" description="Copy DITAVAL DTD">
  <!-- Resolve the folder that contains the DITAVAL file passed to the build -->
  <dirname property="ditaval.dir" file="${dita.input.valfile}" />
  <!-- Copy the DITA 1.2 DITAVAL DTD next to the DITAVAL file so the parser can find it -->
  <copy file="${basedir}/plugins/org.oasis-open.dita.v1_2/dtd/ditaval/dtd/ditaval.dtd" todir="${ditaval.dir}" />
</target>
With that, the filtering occurs as expected: the excluded content is no longer present, and the search behaves correctly.
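To show how the two fragments above might fit together, here is a minimal plugin.xml sketch. The plugin id `com.example.ditaval-dtd` and the build file name `build_copy_ditaval_dtd.xml` are hypothetical. `dita.conductor.target.relative` is the DITA-OT extension point commonly used to pull a plugin's Ant file into the main build; it is assumed here rather than taken from the original plugin.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical plugin descriptor; id and file name are placeholders. -->
<plugin id="com.example.ditaval-dtd">
  <!-- Import the Ant file that defines the copy-ditaval-dtd target -->
  <feature extension="dita.conductor.target.relative" file="build_copy_ditaval_dtd.xml"/>
  <!-- Run that target before the preprocess pipeline starts -->
  <feature extension="depend.preprocess.pre" value="copy-ditaval-dtd"/>
</plugin>
```

In a layout like this, the Ant file would wrap the target in a `<project>` element, and the plugin would need to be integrated into the toolkit before the target is picked up.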
Kind regards,
Pascale
-
- Posts: 9436
- Joined: Fri Jul 09, 2004 5:18 pm
Re: webhelp outputs include content that should be filtered out
Hi Pascale,
It's good you found the problem. I'm assuming the referenced DITAVAL DTD is in the proper location (relative to the DITAVAL file) somewhere in the sources folder?
But the DITA OT probably does not copy it to the temporary files folder, and that is where the parsing problem arises. Whenever an XML document with an associated DTD is parsed, that DTD needs to be found and resolved; otherwise the parsing fails.
I guess as a workaround you could remove that DOCTYPE declaration entirely. Oxygen will still validate the DITAVAL when it's opened.
If you want you can also try to register a bug on the DITA OT issues list.
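For the DOCTYPE workaround mentioned above, a DITAVAL file without the declaration might look like the sketch below. The removed declaration is shown only in a comment, and its exact form is an assumption inferred from the error message, not something quoted in the thread. The `distribution` rule is likewise illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Removed (assumed form): <!DOCTYPE val SYSTEM "ditaval.dtd"> -->
<val>
  <prop att="distribution" val="Internal_OBSOLETE" action="exclude"/>
</val>
```

Without a DOCTYPE, the parser has no external DTD to resolve, so the file can be read even when the DTD is not present next to it.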
Regards,
Radu
Radu Coravu
<oXygen/> XML Editor
http://www.oxygenxml.com