
Scoring.properties help

Posted: Tue Nov 17, 2020 1:07 am
by qualler
I've been making some updates to the scoring.properties file. I've read the help topic, but I have a scenario I'm not sure about.

In the output I have a title like this: Job Card PRE A001 - Third-Party Review Workstation Compatibility Checks

When I search exactly for this title, it is not the first result. It's actually the 4th result. The engineering team finds this odd. In fact, the first result is actually another file, which contains a link to the PRE A001 topic.

So I thought I'd add the entry a = 10 to the scoring.properties file and see if that does the trick (title is set to 30). But it didn't work.

Two things:
  • It seems the - in the title (between A001 and Third-Party) changes the results. Although it's not documented (that I could find) it seems - might mean "exclude". Could I have clarification on that point?
  • The topic that is first in the results has a ton of instances of "pre" and "job card" and I'm guessing this is influencing the ratings. Does each instance add to the 'weight'?
Given the information I've included, can someone suggest what I could do to get the correct topic to the top of the list?

Definitely appreciate any help.

Justin

Re: Scoring.properties help

Posted: Tue Nov 17, 2020 4:52 pm
by radu_pisoi
Hi,

The search results are sorted by a relevance score computed for each document. The score depends on the context in which the search terms appear: for example, a word found in a title scores higher than a word found in unformatted text. The score also depends on the number of occurrences of the search term in the document, so a topic with many occurrences of the search term can be ranked ahead of a document where the search term appears in the title.
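
To illustrate the point about occurrence counts, here is a minimal Python sketch. It is a toy model only and not the actual WebHelp scoring algorithm: the `score` function, the `WEIGHTS` values, and the two sample documents are all hypothetical, and it simply assumes that each occurrence adds the weight of the context it appears in.

```python
# Toy relevance model. NOT the actual WebHelp scoring algorithm.
# Weights are hypothetical, loosely modeled on scoring.properties keys.
WEIGHTS = {"title": 30, "a": 10, "body": 1}


def score(doc, term):
    """Sum (context weight * occurrences of term) over every context in doc."""
    term = term.lower()
    total = 0
    for context, text in doc.items():
        occurrences = text.lower().split().count(term)
        total += WEIGHTS.get(context, 1) * occurrences
    return total


# Hypothetical documents: one has the term in its title only,
# the other links to it and repeats the term many times in body text.
target = {"title": "Job Card PRE A001", "body": "Run the checks."}
linking = {"title": "Overview", "a": "PRE A001", "body": " ".join(["pre"] * 40)}

print(score(target, "pre"))   # 30 (one title match)
print(score(linking, "pre"))  # 50 (one link match + 40 body matches)
```

Under these assumed weights, 40 body occurrences plus one link occurrence (50) outrank a single title match (30), which is the kind of ordering described above.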

To investigate further, can you send us a sample DITA map containing the topic you are reporting, along with the topics that are displayed ahead of it? Please send the sample DITA map to support@oxygenxml.com. Also, please specify the WebHelp version.

Re: Scoring.properties help

Posted: Fri Nov 20, 2020 11:25 pm
by qualler
Thanks Radu, I exported the files and sent them over to the support address. Curious if you have any thoughts on the matter and appreciate your help.

Justin

Re: Scoring.properties help

Posted: Wed Jan 06, 2021 5:52 pm
by qualler
Hi Radu, have you had a chance to look at this? I have another issue with the search: the word "multilingual" is in keywords/keyword, yet when I search specifically for that word, it's not found. I'll resend my email with the earlier example. Thanks

Re: Scoring.properties help

Posted: Wed Jul 13, 2022 7:20 pm
by Dan02
Hi Radu,
I have a similar issue with the scoring.properties file using keywords.
To illustrate:
- I have set a placeholder keyword of "dogs" in topic 1:
<prolog>
  <metadata>
    <keywords>
      <keyword>dogs</keyword>
    </keywords>
  </metadata>
</prolog>
- In topic 2, I have entered two instances of "dogs" in the body of the topic, within <p> tags.
<conbody>
  <p>dogs</p>
  <p>dogs fighting cats</p>
</conbody>
- To prioritize topic 2 over topic 1, I have set the following in the scoring.properties file:
meta_keywords = -100
However, when I execute a search, topic 1 is always listed first in the results.
So in the end what I am trying to do is increase the priority of the meta_keywords, but the example above is a quick way to show that setting a value on meta_keywords has no impact on the results. So even in cases when I set meta_keywords to a positive value (= 100), it does not prioritize the keyword matches over content that matches in body text.
Any insight you can share would be greatly appreciated.
Thanks!
Dan
***
Using Oxygen 24.1. Full scoring.properties file is below.
And to be clear, I have been able to influence results in other instances using for example a=-30, so the issue is not related to my install.

# HTML title
title=20

# Headings
h1 = 10
h2 = 9
h3 = 8
h4 = 7
h5 = 6
h6 = 5

# Meta keyword and indexterm
meta_keywords = -100
meta_indexterms = 15

# Short description
meta_description = 9
# p.shortdesc=7

b = 5
strong = 5
em = 3
i=3
u=3

# Ignored elements
div.ignore=ignored
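
As a quick sanity check on which entries are actually active in a file like the one above, here is a minimal Python sketch. It is a simplified reader that assumes only plain key=value lines and # comments (not the full Java properties syntax); `load_scoring` is a hypothetical helper, not part of Oxygen or WebHelp.

```python
def load_scoring(text):
    """Return active key/value pairs, skipping blank lines and # comments."""
    weights = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # commented-out entries such as "# p.shortdesc=7" are ignored
        key, _, value = line.partition("=")
        weights[key.strip()] = value.strip()
    return weights


# A shortened copy of the file above, used as sample input.
sample = """
# HTML title
title=20
meta_keywords = -100
# p.shortdesc=7
div.ignore=ignored
"""

weights = load_scoring(sample)

# Keep only entries whose value is an integer weight.
numeric = {k: int(v) for k, v in weights.items() if v.lstrip("-").isdigit()}

print(weights)
print(numeric)
```

This only confirms what the file contains, for example that meta_keywords is set and that the commented-out p.shortdesc line is inactive. It says nothing about how the search engine applies those values.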