Profiles

The default set of profiles suits most purposes and allows PDFs of various kinds to be compared with each other. For special use cases, i-net PDFC lets you define and manage custom profiles that are tailored exactly to your documents.

Managing Profiles

i-net PDFC is shipped with a set of immutable default profiles. To create a custom profile, select the default profile that suits your requirements best and open it by clicking on Show >

Since default profiles can't be modified, you have to create a copy of the profile. To do so, click on one of the Duplicate labels at the top or bottom of the component.

A name has to be provided for the new profile. It cannot be changed later.

Import/Export of profiles

Each profile can be stored to an external file by clicking on Export at the bottom of the profile panel. The export files are portable and can be used for any type of i-net PDFC installation - GUI, API and Server.

To import a profile, create a custom profile and select it as the active one. The Import label will then be available at the bottom of the profile configuration page. Click on this label to select a file to import. Alternatively, drag and drop a profile XML file into the panel to load the settings.

The imported profile will replace all settings of the current profile.

Profile settings

A profile contains the settings for the comparison mode, the element types to be compared and the filters to be used. Each filter or comparison type may have additional options to fine-tune the feature.

Comparison mode

This option has the biggest impact on the comparison. Here are the differences between the default comparison mode and the “strict mode”:

Default mode: Allows for parts of the document to be matched even when one part is located further down the document due to an inserted paragraph or element. Places an emphasis on the continuous flow of content being the same, as opposed to look/location on each individual page.

Strict mode: Each part of the document must be lined up in the same position in both documents in order to be seen as matching. This means that if a paragraph is inserted in the one document, all content underneath this paragraph will be seen as different to the other document since it will have moved. Places an emphasis on both location of elements AND content.
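The difference between the two modes can be illustrated with a small Python sketch. It is not the algorithm used by i-net PDFC. It only contrasts a position-by-position comparison with a flow-based alignment, using Python's standard difflib module as a stand-in. The function names and the sample documents are assumptions made for this illustration.

```python
from difflib import SequenceMatcher


def strict_compare(a, b):
    """Position-by-position comparison: element i must equal element i.

    Returns the indices at which the two sequences differ.
    """
    diffs = []
    for i in range(max(len(a), len(b))):
        left = a[i] if i < len(a) else None
        right = b[i] if i < len(b) else None
        if left != right:
            diffs.append(i)
    return diffs


def flow_compare(a, b):
    """Flow-based comparison: align both sequences first.

    Returns only the inserted, deleted or replaced ranges.
    """
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


doc1 = ["Intro", "Body A", "Body B", "Summary"]
doc2 = ["Intro", "New paragraph", "Body A", "Body B", "Summary"]

print(strict_compare(doc1, doc2))  # every position after the insertion differs
print(flow_compare(doc1, doc2))    # only the inserted paragraph is reported
```

In this sketch, a single inserted paragraph makes every following position differ in the position-based comparison, while the flow-based comparison reports just the insertion.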

Filter and Optimization

i-net PDFC offers various specialized optimizations for comparing content of specific kinds. You can turn these optimizations on and off at will.

The Header & Footer optimization looks for similar page header and footer areas, which are then excluded from the comparison to avoid unnecessary distinctions being made between print dates, for example. This feature has an auto detection mode and a manual mode in case the headers and footers are too complex to be detected correctly.

If you select manual mode, you must specify how big the header and footer areas are.
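As a rough sketch of what manual mode implies, the following Python snippet drops all elements that reach into a header or footer band of a given size. The element representation (text, top, bottom), the coordinate system with y growing downwards and the function name are assumptions for illustration. They do not describe the internal data model of i-net PDFC.

```python
def strip_header_footer(elements, page_height, header_height, footer_height):
    """Keep only elements that lie fully between the header and footer bands.

    Each element is a hypothetical (text, top, bottom) tuple, with y
    measured from the top of the page.
    """
    body_top = header_height
    body_bottom = page_height - footer_height
    return [e for e in elements if e[1] >= body_top and e[2] <= body_bottom]


page = [
    ("Printed 2017-01-01", 10, 25),   # inside the header band
    ("Chapter 1", 100, 120),          # body content
    ("Page 3", 800, 815),             # inside the footer band
]
kept = strip_header_footer(page, page_height=842, header_height=50, footer_height=50)
print(kept)
```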

Multi Columns

This option ensures that content in columns is only compared with content in the same column in the other document. It should only be used if strictly required, since it uses a heuristic approach to identify potential columns.

Invisible Elements

The Invisible Elements filter should be enabled if you get any differences referring to elements not being visible in the printed PDF document. It removes all transparent or white drawing elements and all images with a zero width or height.
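The rule described above can be written down as a small predicate. This is a minimal sketch under assumptions: the Element class, its fields and the treatment of pure white as invisible on a white page are hypothetical and only mirror the wording of this section.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str                  # "drawing", "image" or "text" (hypothetical model)
    width: float
    height: float
    color: tuple = (0, 0, 0)   # RGB, 0..255
    alpha: float = 1.0         # 0.0 means fully transparent


def is_invisible(e):
    """Return True for elements this sketch treats as invisible."""
    if e.kind == "drawing":
        return e.alpha == 0.0 or e.color == (255, 255, 255)
    if e.kind == "image":
        return e.width == 0 or e.height == 0
    return False


elements = [
    Element("drawing", 100, 1, color=(255, 255, 255)),  # white line
    Element("drawing", 100, 1, alpha=0.0),              # transparent line
    Element("image", 0, 50),                            # zero-width image
    Element("image", 40, 50),                           # regular image
    Element("text", 80, 12),                            # text is never filtered here
]
visible = [e for e in elements if not is_invisible(e)]
print(len(visible))
```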

Text by pattern

With the Text by pattern filter it is possible to use one or more regular expressions to exclude text from comparison. With a click on “Add text filter” you can open the list of specified regular expression filters.
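The effect of such a filter can be sketched in Python with the standard re module. The patterns and sample lines below are made up for illustration, and the sketch assumes that only the matched text is removed. The regular expression dialect used by i-net PDFC may also differ from Python's.

```python
import re


def filter_text(lines, patterns):
    """Remove every match of the given patterns before comparing."""
    compiled = [re.compile(p) for p in patterns]
    cleaned = []
    for line in lines:
        for rx in compiled:
            line = rx.sub("", line)
        cleaned.append(line.strip())
    return cleaned


# Hypothetical filters: ISO dates and "Page x of y" counters
patterns = [r"\d{4}-\d{2}-\d{2}", r"Page \d+ of \d+"]

a = filter_text(["Invoice printed 2017-03-01", "Page 1 of 4"], patterns)
b = filter_text(["Invoice printed 2017-04-15", "Page 1 of 5"], patterns)
print(a == b)  # the remaining text is identical
```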

Compared Types

Certain element types may be irrelevant for your comparison, so you can turn them off in this section.

Text

Text and words are the primary content elements of most documents. Enable this option to compare text elements.

The tolerance percentage (displayed as a tooltip) is the percentage of a line height that lines can be apart for them to still be recognized as identical between the documents.
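A minimal sketch of this rule, assuming that the vertical offset between two lines is compared against a fraction of the line height and that the boundary value still counts as identical. The function and parameter names are hypothetical.

```python
def same_line(y_a, y_b, line_height, tolerance_percent):
    """Treat two lines as identical in position if their vertical offset
    is at most tolerance_percent of the line height."""
    allowed = line_height * tolerance_percent / 100.0
    return abs(y_a - y_b) <= allowed


# With a line height of 12 and a tolerance of 30%, up to 3.6 units are allowed.
print(same_line(100, 103, line_height=12, tolerance_percent=30))
print(same_line(100, 105, line_height=12, tolerance_percent=30))
```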

In strict mode, you can additionally set the actual distance in pixels that words and lines may be apart from each other and still be declared identical.

There are some additional options to fine-tune the text and word comparison:

  • Case sensitive defines whether text is compared case-sensitively
  • Simplify special character normalizes typographical differences like special types of hyphens or ligatures to basic characters
  • Solve OCR misinterpretations normalizes typical text recognition problems like an 'rn' that was recognized as an 'm'
  • Exclude rotated text causes the comparer to ignore all rotated text elements, which is useful for watermarks, for example
  • Fontcolor ignores changes of the font color
  • Fontstyle ignores changes of the font style, for example “bold” or “italic”
  • Fontsize ignores changes of the font size
  • Fontfamily ignores changes of the font family
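The first three options are text normalizations, which the following Python sketch illustrates. It is an approximation under assumptions: Unicode NFKC normalization plus a small hyphen table stands in for the special character handling, and a single 'rn' to 'm' replacement stands in for the OCR handling. The actual rules applied by i-net PDFC are not documented here.

```python
import unicodedata

# Hypothetical table mapping a few typographic hyphens and dashes to "-"
HYPHEN_MAP = str.maketrans({
    "\u2010": "-",  # hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def normalize(text, case_sensitive=True, simplify_special=False, solve_ocr=False):
    """Normalize text before comparison according to the chosen options."""
    if simplify_special:
        # NFKC expands ligatures, for example U+FB01 becomes "fi"
        text = unicodedata.normalize("NFKC", text)
        text = text.translate(HYPHEN_MAP)
    if solve_ocr:
        # Treat "rn" and "m" as the same glyph sequence
        text = text.replace("rn", "m")
    if not case_sensitive:
        text = text.casefold()
    return text


print(normalize("\ufb01rst\u2013class", simplify_special=True))
print(normalize("modern", solve_ocr=True) == normalize("modem", solve_ocr=True))
print(normalize("PDF", case_sensitive=False) == normalize("pdf", case_sensitive=False))
```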

Lines and Shapes

Lines and shapes can be compared as well. This compares each and every line in the document for differences. It is recommended to leave this option off unless necessary, since small movements and extra space can cause lines to be placed at different positions, leading to a multitude of detected differences. You can additionally decide whether line styles (such as dashed vs. dotted lines) are to be compared, and define the tolerance level for differences in line size (length/thickness).

The tolerance levels here are measured in pixels - e.g. a tolerance of “20 pixels” for the size would cause a line which is 50 pixels wide to be seen as identical to a line which is 30 pixels wide, but as different to a line which is 29 pixels wide.
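The example above amounts to a simple absolute difference check, sketched here in Python. The function name is an assumption. The numbers are the ones from the example.

```python
def within_tolerance(size_a, size_b, tolerance):
    """Two sizes count as identical if they differ by at most the tolerance."""
    return abs(size_a - size_b) <= tolerance


print(within_tolerance(50, 30, 20))  # difference of 20: identical
print(within_tolerance(50, 29, 20))  # difference of 21: different
```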

Images

Images are compared by their similarity in colors and size. The comparison uses an area average algorithm to tolerate slight quality changes or aliasing artifacts.

The tolerance levels are percentages relative to the image in the first PDF - a size tolerance of 100%, for example, means that the same image could be up to twice as large or as small in the second PDF and still be seen as identical. 0% would mean the image would have to be the exact same size.
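One possible reading of the size tolerance is a relative difference check against the image in the first PDF, sketched below. This is an interpretation, not the documented formula: in particular, how the lower bound ("as small") is computed is an assumption, and the function name is hypothetical.

```python
def size_matches(size_first, size_second, tolerance_percent):
    """Compare one dimension of an image against the first PDF's image.

    Assumes a symmetric band around the size in the first PDF.
    """
    allowed = size_first * tolerance_percent / 100.0
    return abs(size_second - size_first) <= allowed


print(size_matches(200, 400, 100))  # twice as large is still accepted at 100%
print(size_matches(200, 201, 0))    # 0% requires the exact same size
```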

Color tolerances are percentages of the entire color spectrum - a color tolerance of 100% causes the image comparison to be reduced to simply making sure an image - any image - exists at the same position.
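The combination of area averaging and a color tolerance can be sketched as follows. This is a simplified illustration on grayscale images given as lists of rows with equal dimensions. The block size, the per-block comparison and the function names are assumptions and not the implementation used by i-net PDFC.

```python
def area_average(pixels, block):
    """Downscale a grayscale image by averaging block x block areas."""
    rows, cols = len(pixels), len(pixels[0])
    out = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            cells = [
                pixels[y][x]
                for y in range(r, min(r + block, rows))
                for x in range(c, min(c + block, cols))
            ]
            out_row.append(sum(cells) / len(cells))
        out.append(out_row)
    return out


def colors_match(img_a, img_b, tolerance_percent, block=2):
    """Compare averaged areas; the tolerance is a share of the 0..255 range."""
    limit = 255 * tolerance_percent / 100.0
    a = area_average(img_a, block)
    b = area_average(img_b, block)
    return all(
        abs(pa - pb) <= limit
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


sharp = [[0, 255], [255, 0]]      # checkerboard, average 127.5
soft = [[120, 130], [130, 120]]   # blurred version, average 125.0
black = [[0, 0], [0, 0]]
white = [[255, 255], [255, 255]]

print(colors_match(sharp, soft, 1))      # averaging hides the aliasing difference
print(colors_match(black, white, 100))   # 100% accepts any image
print(colors_match(black, white, 50))
```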

 

© Copyright 1996 - 2017, i-net software; All Rights Reserved.