Document Filters


Document Filters is an SDK that gives software developers the tools they need to embed deep inspection capabilities into their applications. Its deep inspection technology identifies, extracts and renders all the text and metadata in a file, including hidden content. This means that your applications can extract and process content from hundreds of file formats without needing the application that created each file. Document Filters is deployed across a variety of industries, supporting some of the world's largest eDiscovery, security, DLP, AI, ML and analytics enterprises.


  • Transform all your data into valuable insights: A unique ‘deep inspection’ capability lets you identify, extract, analyze and transform all of the text and metadata in files of 600+ formats, including content that is normally hidden (e.g., tracked changes, comments, notes, annotations and embedded web links).
  • Expertly convert content into high-fidelity renditions: Create renditions in your preferred output format, including HTML5, PDF, multi-page TIFF, PNG, SVG or PostScript, in 15 lines of code or fewer.
  • Easily view and manipulate files in a high-quality format: View, render and even manipulate (annotate, redact and mark up) the extracted content in high definition with near pixel-perfect fidelity.
  • Identify the true nature of your content: Intelligent file identification accurately recognizes the format of the source content for filtering, rather than relying solely on the filename extension.
  • Choose the language that works for you: C/C++, Java, C#, VB.NET and Python are supported out of the box, and the library can be called from any language that supports C bindings.
  • Fast and easy deployment on virtually any platform: Natively supported on 29 platforms (including Windows, macOS, Linux, FreeBSD, Solaris, HP-UX and AIX), Document Filters does not require a bloated runtime, which means faster performance and easier deployment.

Document Filters works behind the scenes to identify, extract, transform and output content into usable data and file formats.


Document Filters is the ideal OEM technology for processing unstructured content outside of native applications. This powerful toolkit is a key catalyst driving content mining and intelligence gathering across a wide range of business applications, including:

  • eDiscovery
  • Capture
  • Compliance and security
  • Data loss prevention
  • Text/data analytics
  • Enterprise content management
  • Document/email archival
  • Enterprise search
  • Artificial intelligence
  • Intelligent document processing


As an embeddable set of components, Document Filters serves as the 'intelligence' inside solutions from many global ISVs and SaaS vendors, driving content gathering and mining within their software applications.


The images above show a file in its native application alongside the high-definition image that Document Filters generates from it through its file conversion and rendering process.