Index each letter as a separate word using GroupDocs.Search for .NET


Are you looking for a full-text search API that works across many document formats? If so, GroupDocs.Search for .NET should meet your requirements. The API creates an index and then performs instant searches across thousands of documents.

For those already working with the API, version 19.10 brings new features and improvements. Some classes have been renamed to improve code readability, and the API architecture has been optimized for better performance. The changes are minor, so migration should not be difficult.
After upgrading to v19.10, replace the namespace usage across the entire project from GroupDocs.Search to GroupDocs.Search.Legacy to resolve build issues in existing code.

Let's go through the code changes:
Old code sample:
https://gist.github.com/1e84ff71f71f21dc4274015c4f916a0c

New code snippet:
https://gist.github.com/4919cfb96e5041b92507e8b2935a047b

You can observe the minor changes (e.g. SearchParameters has been renamed to SearchOptions).

Improvements

  • Highlight search results in short fragments
  • Enhance document metadata indexing with new formats

New Features

  • Index each letter as a separate word
  • Implemented ability to remove paths from index

How to highlight search results in short fragments?
This improvement lets you highlight search results in separate short fragments of the text rather than in the whole document. The example below shows how to generate short HTML snippets with the found terms highlighted:
https://gist.github.com/297af296115ebb29b5a8ded0b6ec9cac
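
To make the idea of short highlighted fragments concrete, here is a minimal plain C# sketch that does not use GroupDocs.Search at all. The class name FragmentHighlighter, the method Highlight and the wordsAround parameter are hypothetical names chosen for illustration; the actual API calls are in the gist above.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Illustrative only: a hand-rolled fragment highlighter, not GroupDocs.Search code.
public static class FragmentHighlighter
{
    // Returns one HTML snippet per occurrence of `term`, keeping up to
    // `wordsAround` words of context on each side of the occurrence.
    // Matching is whole-word and case-insensitive; punctuation attached
    // to a word (e.g. "fox,") prevents a match in this simple sketch.
    public static List<string> Highlight(string text, string term, int wordsAround)
    {
        string[] words = text.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        var fragments = new List<string>();
        for (int i = 0; i < words.Length; i++)
        {
            if (!string.Equals(words[i], term, StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            int start = Math.Max(0, i - wordsAround);
            int end = Math.Min(words.Length - 1, i + wordsAround);

            var parts = new List<string>();
            for (int j = start; j <= end; j++)
            {
                // Encode every word so the snippet is safe to embed in HTML.
                string encoded = WebUtility.HtmlEncode(words[j]);
                bool isMatch = string.Equals(words[j], term, StringComparison.OrdinalIgnoreCase);
                parts.Add(isMatch ? "<b>" + encoded + "</b>" : encoded);
            }
            fragments.Add(string.Join(" ", parts));
        }
        return fragments;
    }
}
```

For example, FragmentHighlighter.Highlight("the quick brown fox jumps over the lazy dog", "fox", 2) returns a single fragment: quick brown <b>fox</b> jumps over.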

How to enhance document metadata indexing with new formats?
This improvement adds support for new document formats. Most of these are files whose main content is not textual, so only their metadata is indexed:

  • MP3 – MPEG-2 Audio Layer III;
  • WAV – Waveform Audio File Format;
  • BMP – Bitmap Picture;
  • GIF – Graphics Interchange Format File;
  • JP2 – JPEG 2000 Core Image File.

For the complete list, visit this article.

How to index each letter as a separate word?
This feature is designed for hieroglyphic languages. It lets you index each character in the text as a separate word, whether or not separators are present.
https://gist.github.com/8fe7c13d4e392bf9af43d3255ef20749
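
As a rough illustration of what "each character as a separate word" means for the indexer, the following plain C# sketch splits text into one token per character. It is not the GroupDocs.Search implementation; CharacterTokenizer and Tokenize are hypothetical names, and the sketch assumes that whitespace is the only thing to skip.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Illustrative only: a per-character tokenizer, not GroupDocs.Search code.
public static class CharacterTokenizer
{
    // Emits one token per text element (user-perceived character), so
    // surrogate pairs and combining sequences stay together.
    // Whitespace is dropped; everything else, including punctuation,
    // becomes its own token in this simple sketch.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(text);
        while (elements.MoveNext())
        {
            string element = elements.GetTextElement();
            if (!string.IsNullOrWhiteSpace(element))
            {
                tokens.Add(element);
            }
        }
        return tokens;
    }
}
```

With this tokenizer, "全文検索" and "全文 検索" both yield the same four tokens (全, 文, 検, 索), which is why a query for a single character can match whether or not the text contains separators.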

Ability to remove paths from index
When indexed paths are removed from an index, the index is updated and all removed documents and folders become inaccessible for search.
https://gist.github.com/b70500ee090ed206f971b7c9f7d15125
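
The effect of removing a path can be sketched with a toy in-memory inverted index. This is only a conceptual model under stated assumptions, not how GroupDocs.Search stores its index: TinyIndex and its members are hypothetical, paths are assumed to use forward slashes, and a folder is identified purely by its path prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: a toy inverted index, not GroupDocs.Search code.
public sealed class TinyIndex
{
    // Maps each term to the set of document paths that contain it.
    private readonly Dictionary<string, HashSet<string>> postings =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    public void Add(string path, string content)
    {
        foreach (string word in content.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            HashSet<string> paths;
            if (!postings.TryGetValue(word, out paths))
            {
                paths = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
                postings[word] = paths;
            }
            paths.Add(path);
        }
    }

    // Removes a single document, or every document under a folder path.
    public void Remove(string path)
    {
        string folderPrefix = path.TrimEnd('/') + "/";
        foreach (string term in postings.Keys.ToList())
        {
            postings[term].RemoveWhere(p =>
                p.Equals(path, StringComparison.OrdinalIgnoreCase) ||
                p.StartsWith(folderPrefix, StringComparison.OrdinalIgnoreCase));

            // Drop terms that no longer point at any document.
            if (postings[term].Count == 0)
            {
                postings.Remove(term);
            }
        }
    }

    // Returns the matching document paths in ordinal order.
    public List<string> Search(string term)
    {
        HashSet<string> paths;
        return postings.TryGetValue(term, out paths)
            ? paths.OrderBy(p => p, StringComparer.Ordinal).ToList()
            : new List<string>();
    }
}
```

After adding docs/a.txt, docs/sub/b.txt and other/c.txt and then calling Remove("docs"), a search only returns other/c.txt: the removed folder and everything under it is no longer reachable through the index.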

We recommend downloading the latest version and sharing your experience. If you run into any issues, you can post on the forum.



The post Index each letter as a separate word using GroupDocs.Search for .NET appeared first on Document Manipulation APIs Blog – groupdocs.com.