Class ResourceSearchTextExtractorImpl

java.lang.Object
ch.tocco.nice2.dms.impl.entitylistener.ResourceSearchTextExtractorImpl
All Implemented Interfaces:
ResourceSearchTextExtractor

@Component public class ResourceSearchTextExtractorImpl extends Object implements ResourceSearchTextExtractor
Extracts the text content of a binary with Apache Tika. The service can be configured through application properties:
nice2.dms.fullTextIndex.ignoreFileExtensions: a comma-separated blacklist of file extensions for which the extraction process is skipped.
nice2.dms.fullTextIndex.maxContentSizeInMb: if the content is larger than this threshold, an empty string is returned.
nice2.dms.fullTextIndex.maxFileSizeInMb: if the file is larger than this threshold, the content is not extracted and an empty string is returned.
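The interplay of the three properties can be illustrated with a minimal sketch. It is not the actual implementation: the class ExtractionGuardSketch and its methods shouldSkip and limitContent are hypothetical names, and it assumes that extensions are listed without a leading dot and matched case-insensitively, that one MB is 1024 * 1024 bytes, and that content size is measured as the UTF-8 byte length of the extracted text. The real class may differ on any of these points.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the configured guards; not the code of ResourceSearchTextExtractorImpl.
class ExtractionGuardSketch {
    // Assumption: 1 MB = 1024 * 1024 bytes.
    private static final double BYTES_PER_MB = 1024.0 * 1024.0;

    private final Set<String> ignoredExtensions;
    private final double maxContentSizeInMb;
    private final double maxFileSizeInMb;

    ExtractionGuardSketch(String ignoreFileExtensions, double maxContentSizeInMb, double maxFileSizeInMb) {
        // Split the comma-separated property value, trim each entry and lower-case it.
        this.ignoredExtensions = Arrays.stream(ignoreFileExtensions.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        this.maxContentSizeInMb = maxContentSizeInMb;
        this.maxFileSizeInMb = maxFileSizeInMb;
    }

    // Returns true if extraction should not be attempted at all:
    // either the extension is blacklisted or the file exceeds the file size threshold.
    boolean shouldSkip(String fileName, long fileSizeInBytes) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (ignoredExtensions.contains(extension)) {
            return true;
        }
        return fileSizeInBytes / BYTES_PER_MB > maxFileSizeInMb;
    }

    // Returns the extracted text unchanged, or an empty string if it exceeds the content threshold.
    // Assumption: content size is the UTF-8 byte length of the text.
    String limitContent(String extractedText) {
        double sizeInMb = extractedText.getBytes(StandardCharsets.UTF_8).length / BYTES_PER_MB;
        return sizeInMb > maxContentSizeInMb ? "" : extractedText;
    }
}
```

In this sketch a caller would check shouldSkip before handing the binary to the extractor and pass the extractor's result through limitContent afterwards, so that every skipped case yields an empty string as described above.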
  • Constructor Details

    • ResourceSearchTextExtractorImpl

      public ResourceSearchTextExtractorImpl(org.slf4j.Logger log)
  • Method Details

    • extractContent

      public String extractContent(Binary binaryValue) throws IOException
      Specified by:
      extractContent in interface ResourceSearchTextExtractor
      Throws:
      IOException
    • setIgnoreFileExtensions

      @Value("${nice2.dms.fullTextIndex.ignoreFileExtensions}") public void setIgnoreFileExtensions(String ignoreFileExtensions)
    • setMaxContentSizeInMb

      @Value("${nice2.dms.fullTextIndex.maxContentSizeInMb}") public void setMaxContentSizeInMb(double maxContentSizeInMb)
    • setMaxFileSizeInMb

      @Value("${nice2.dms.fullTextIndex.maxFileSizeInMb}") public void setMaxFileSizeInMb(double maxFileSizeInMb)