fuzzy search

galanohan

Hi Guys,

Our tech support team asked me whether our offline WebHelp supports fuzzy search. Is there a parameter that enables fuzzy search in the WebHelp search bar? Or is it possible to add a script (JavaScript, for example) to enhance the search in offline WebHelp?

For example (this one was provided by GPT-4), with the following JS script:

// Get the input element and search results container
// Note: searchResults and renderResults are not defined in this snippet
const input = document.getElementById('search-input');
const resultsContainer = document.getElementById('search-results');

// Listen for input changes on the search input
input.addEventListener('input', () => {
  // Get the search query from the input value
  const query = input.value.toLowerCase();

  // Filter the search results using a fuzzy search algorithm
  const filteredResults = searchResults.filter(result => {
    // Convert the result title and description to lowercase
    const title = result.title.toLowerCase();
    const description = result.description.toLowerCase();

    // Check if the query is a substring of either the title or description
    if (title.includes(query) || description.includes(query)) {
      return true;
    }

    // If the query is not a substring, use a fuzzy search algorithm to match on similar characters
    const titleScore = fuzzyMatch(title, query);
    const descriptionScore = fuzzyMatch(description, query);

    // If the score is above a certain threshold, include the result in the filtered results
    if (titleScore > 0.5 || descriptionScore > 0.5) {
      return true;
    }

    // Otherwise, exclude the result from the filtered results
    return false;
  });

  // Render the filtered results in the search results container
  renderResults(filteredResults);
});

// Fuzzy match function using the Levenshtein distance algorithm
function fuzzyMatch(str1, str2) {
  const matrix = [];

  // Initialize the first column: distance from each prefix of str1 to the empty string
  for (let i = 0; i <= str1.length; i++) {
    matrix[i] = [i];
  }

  // Initialize the first row: distance from the empty string to each prefix of str2
  for (let j = 0; j <= str2.length; j++) {
    matrix[0][j] = j;
  }

  // Calculate Levenshtein distance for each character pair
  for (let i = 1; i <= str1.length; i++) {
    for (let j = 1; j <= str2.length; j++) {
      if (str1.charAt(i - 1) === str2.charAt(j - 1)) {
        // Characters match: no extra cost
        matrix[i][j] = matrix[i - 1][j - 1];
      } else {
        matrix[i][j] = Math.min(
          matrix[i - 1][j - 1] + 1, // substitution
          matrix[i][j - 1] + 1,     // insertion
          matrix[i - 1][j] + 1      // deletion
        );
      }
    }
  }

  // Calculate similarity score based on Levenshtein distance
  const distance = matrix[str1.length][str2.length];
  const maxLength = Math.max(str1.length, str2.length);

  // Two empty strings are identical; this also avoids dividing by zero
  if (maxLength === 0) {
    return 1;
  }

  const score = 1 - distance / maxLength;

  return score;
}
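
As a quick sanity check on the scoring, here is a minimal, self-contained sketch of the same normalized Levenshtein score. The function name `similarity` and the sample strings are assumptions made for illustration; none of this is part of WebHelp.

```javascript
// Hypothetical helper: normalized Levenshtein similarity in the range [0, 1].
function similarity(a, b) {
  const rows = a.length + 1;
  const cols = b.length + 1;

  // dist[i][j] = edit distance between the first i chars of a and the first j chars of b
  const dist = Array.from({ length: rows }, (_, i) => {
    const row = new Array(cols).fill(0);
    row[0] = i;
    return row;
  });
  for (let j = 0; j < cols; j++) {
    dist[0][j] = j;
  }

  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dist[i][j] = Math.min(
        dist[i - 1][j] + 1,        // deletion
        dist[i][j - 1] + 1,        // insertion
        dist[i - 1][j - 1] + cost  // match or substitution
      );
    }
  }

  const maxLength = Math.max(a.length, b.length);
  return maxLength === 0 ? 1 : 1 - dist[a.length][b.length] / maxLength;
}

console.log(similarity('explorer', 'explorer')); // 1
console.log(similarity('explorer', 'exploer'));  // 0.875
console.log(similarity('explorer', 'expl'));     // 0.5
```

Note that a score of exactly 0.5 does not pass a `> 0.5` threshold, so a prefix such as "expl" would only be accepted through the substring check. Scoring a short query against a whole title or description will usually give even lower values.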


"This script listens for changes to an HTML input element with ID search-input, and filters an array of search results using a fuzzy search algorithm based on Levenshtein distance. The filtered results are then rendered in an HTML container with ID search-results. You can modify this code to fit your specific use case."

With the preceding script, I created a JS file named fuzzy.js under /js to hold it. Then I created an HTML file named js.html under /fragments to load fuzzy.js:
<!DOCTYPE html>
<html>
<script src="${oxygen-webhelp-template-dir}/js/fuzzy.js" defer="defer"></script>
</html>

And in the opt file, I added the following to include the preceding files for webhelp generation:

<html-fragments>
  <fragment file="fragments/js.html" placeholder="webhelp.fragment.after.body"/>
</html-fragments>
<resources>
  <css file="styles.css"/>
  <fileset>
    <include name="resources/**/*"/>
    <exclude name="resources/**/*.svn"/>
    <exclude name="resources/**/*.git"/>
    <include name="js/**"/>
  </fileset>
</resources>

Then I ran a quick test on a sample map whose topics contain many occurrences of "explorer" in the body. The target pages were those containing "explorer", and I searched with the keywords "expl" and "exp". Both returned the pages I wanted, though ranked differently.
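
Since "expl" and "exp" are plain substrings of "explorer", that test alone may not show whether the fuzzy branch ever ran. Below is a minimal sketch of a check that tells the two cases apart. The helpers `levenshtein` and `classifyMatch`, the word-by-word comparison, and the sample text are all assumptions for illustration; the 0.5 threshold mirrors the script above.

```javascript
// Hypothetical helper: Levenshtein distance using two rolling rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: reports which branch would accept the query.
// Comparing against each word (not the whole text) is an assumption of this sketch.
function classifyMatch(text, query, threshold = 0.5) {
  const haystack = text.toLowerCase();
  const needle = query.toLowerCase();

  if (haystack.includes(needle)) {
    return 'substring';
  }

  const words = haystack.split(/\W+/).filter(Boolean);
  const fuzzy = words.some(word => {
    const maxLength = Math.max(word.length, needle.length);
    return maxLength > 0 && 1 - levenshtein(word, needle) / maxLength > threshold;
  });
  return fuzzy ? 'fuzzy' : 'none';
}

console.log(classifyMatch('Open the project explorer', 'expl'));    // substring
console.log(classifyMatch('Open the project explorer', 'exploer')); // fuzzy
console.log(classifyMatch('Open the project explorer', 'zzz'));     // none
```

Searching for a deliberate misspelling such as "exploer" is one way to see whether fuzzy matching is really in effect, because a substring check cannot match it.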

Did I do it the right way?

Regards,
Galano