i-net PDF Content Comparer v3.0

1. Introduction

i-net PDFC helps you compare two PDF files (or PDF files that are contained in two folders) for content differences. In addition to simply tracking changes in your documents, this can be useful for comparing the output of two different PDF generator programs, e.g. OpenOffice and Adobe PDF Writer, when migrating from one to the other, or comparing the output of different versions of one program, e.g. two different versions of i-net Clear Reports.

i-net PDF Content Comparer, as the name states, does not do a simplistic pixel-based or structural comparison but an element-based check for actual content differences in the documents.

The following elements are compared and any differences are logged:

  • Text differences (letters or words missing)
  • Line/Arc/Box differences (lines or boxes missing or with different styles)
  • Image differences (images missing or pixel values differ)
  • Margin differences (page margins different)

These differences each have a configurable tolerance value so that minor differences can be ignored if necessary. For example, it might not matter to you whether a line or box is misplaced by a couple of pixels, as long as the line is there and looks as intended.

2. Using i-net PDFC

To use i-net PDFC on Windows, execute pdfc-<version>-setup.exe to install the application; installation packages are available for other platforms as well. i-net PDF Content Comparer can be used in various ways. You can run i-net PDFC as a graphical tool, drag in the PDFs you wish to compare, click the green button, and view the comparison.

If you want to use i-net PDFC in scripts, you can also execute it on the command line, e.g. from a .bat or .sh file. Alternatively, you can use the API to write your own Java program that runs as a standalone application, as a JUnit test, or within a continuous integration system such as Jenkins. See the section Programming (API) for more information about the API and the Java code samples.

2.1 System Requirements

i-net PDFC is platform-independent. The command line version runs on any platform that supports Java 7 or higher (e.g. Windows, Linux, Unix, Solaris, Mac OS X 10.4+, …).

2.2 Command Line

To compare PDF files and output the results as images and a difference log, you can execute i-net PDFC on the command line. To start it manually, use the start files included in the i-net PDFC package: runPDFC.bat for Windows and runPDFC.sh for Unix/Linux.

Usage:

runPDFC [-c <config file>] [-i] [-f] [-s] [-x] [-o] [-e] [-p] [-r <path>] [<Folder1> <Folder2> | <File1> <File2>]
runPDFC -activate <key>
Parameter Description
-c Specifies the configuration file (config.xml) for i-net PDFC. If none is specified, the default “config.xml” will be used.
-i Creates difference images in <working directory>/differences for any differences found (recommended for a graphical comparison)
-f Creates diff images for any differences found for the first file (the same as -i if combined with -s)
-s Creates diff images for any differences found for the second file (the same as -i if combined with -f)
-x Creates XOR difference images, may be combined with -i. If this parameter is specified, the difference image contains a third image in which the images of both sides are laid over one another, so that small differences can be seen more clearly.
-o Creates images for each page of each version (only needed for debugging purposes)
-r <path> Sets the root folder for difference images and the comparison report. If this parameter is specified, i-net PDFC creates a sub-directory for each pair of PDF files beneath the root directory; otherwise it uses the working directory. In this sub-directory it stores the page images and difference images in the directory “differences”.
-e Creates a comparison report in <working directory>/differences that provides an overview of the differences found
-activate <key> Activates an activation key or manually enters a license key
-p Creates a comparison-PDF visually showing the differences between the compared PDF files
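The -x option (like the CREATE_XORIMAGES property) produces a negated XOR overlay of both page images. To illustrate why such an overlay makes small differences easy to spot, the following Java sketch assumes that "negated XOR" means a bitwise XOR of the RGB values followed by inversion: identical pixels become white, differing pixels stand out in color. The class and method names are hypothetical and not part of the i-net PDFC API.

```java
import java.awt.image.BufferedImage;

/** Hypothetical illustration of a negated XOR overlay; not i-net PDFC's own code. */
final class XorOverlay {

    /** Negated XOR of two 24-bit RGB values: equal pixels yield white (0xFFFFFF). */
    static int negatedXor(int rgb1, int rgb2) {
        return ~(rgb1 ^ rgb2) & 0xFFFFFF;
    }

    /** Builds the overlay for two page images of the same size. */
    static BufferedImage overlay(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have the same size");
        }
        BufferedImage result = new BufferedImage(a.getWidth(), a.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                // The alpha byte returned by getRGB is masked away by negatedXor.
                result.setRGB(x, y, negatedXor(a.getRGB(x, y), b.getRGB(x, y)));
            }
        }
        return result;
    }
}
```

Identical regions of the two pages come out white, so only the differing pixels remain visible in the overlay.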

Note that when comparing two folders, the PDF files must have the same names in each folder.
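Because folder comparisons match files by name, it can help to check in advance which files will actually be paired. The Java sketch below works on two plain lists of file names so that it stays self-contained: names must match exactly, while the .pdf suffix is recognized case-insensitively (compare the v3.0.235 fix for upper-case suffixes). This is an assumption-based illustration with a hypothetical class name, not the matching code used by i-net PDFC.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: finds PDF files that exist under the same name in both folders. */
final class PdfPairs {

    /** Recognizes the .pdf suffix regardless of case (e.g. REPORT.PDF). */
    static boolean isPdf(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    /** Returns the sorted names of PDF files present in both listings. */
    static List<String> pair(List<String> folder1, List<String> folder2) {
        Set<String> second = new HashSet<>(folder2);
        List<String> pairs = new ArrayList<>();
        for (String name : folder1) {
            if (isPdf(name) && second.contains(name)) {
                pairs.add(name);
            }
        }
        Collections.sort(pairs);
        return pairs;
    }
}
```

Files that exist in only one folder, or that are not PDFs, are simply left out of the pairing.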

It is also possible to configure the parameter values using the file config.xml (see Configuration).

Example usage:

Assuming you installed i-net PDFC into the directory “C:/PDFC” and the sub-directories “folder1” and “folder2” contain PDF files with the same names, you can start the comparison with the following command:

  • runPDFC.bat folder1 folder2
    This compares all PDF files in the folder “folder1” with the PDF files of the same name in the folder “folder2” and prints any differences found between them to the console.
  • runPDFC.bat -i folder1 folder2
    In addition to the console output, this creates PNG images in the subfolder “<working directory>/differences” in which the detected differences are marked in color.
  • runPDFC.bat -i -o folder1 folder2
    This additionally creates PNG images with the original page content for each page of the compared PDF files in the subfolder “<working directory>/differences/<pdf-file-name>”.
  • runPDFC.bat -i -o folder1 folder2 > pdfc.log
    This additionally writes the console output to a file. With the configuration property LOG_LEVEL you can disable logging or control the amount of log output.

You can do the same with single PDF files:

  • runPDFC.bat -i D:/my-PDF-files/file1.pdf D:/my-PDF-files/file2.pdf > pdfc.log
    This compares both PDF files and creates the difference images in the folder “<working directory>/differences”.
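If you prefer to drive the command line from a Java program rather than from a .bat or .sh file, a sketch like the following assembles the documented arguments and hands them to ProcessBuilder. The install directory is a placeholder, the scripts are launched through "cmd /c" and "sh" so that no executable bit is assumed, and the exit code is simply returned because its meaning is not documented here. The class name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher for the runPDFC start scripts; uses only documented flags. */
final class PdfcLauncher {

    /** Builds e.g. [cmd, /c, C:/PDFC/runPDFC.bat, -i, -o, folder1, folder2]. */
    static List<String> command(String installDir, boolean windows,
                                List<String> flags, String first, String second) {
        List<String> cmd = new ArrayList<>();
        if (windows) {
            cmd.add("cmd");
            cmd.add("/c");
            cmd.add(installDir + "/runPDFC.bat");
        } else {
            cmd.add("sh");
            cmd.add(installDir + "/runPDFC.sh");
        }
        cmd.addAll(flags);
        cmd.add(first);
        cmd.add(second);
        return cmd;
    }

    /** Runs the comparison; stdout and stderr go to pdfc.log, similar to "> pdfc.log". */
    static int run(List<String> cmd, File workingDir) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd)
                .directory(workingDir)
                .redirectErrorStream(true)
                .redirectOutput(new File(workingDir, "pdfc.log"));
        return pb.start().waitFor();
    }
}
```

For example, PdfcLauncher.command("C:/PDFC", true, Arrays.asList("-i", "-o"), "folder1", "folder2") corresponds to the third example above.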

2.3 PDFC as Graphical Tool

For a quick and easy visual comparison of PDF files, start PDFC as a graphical tool. It offers a more intuitive and interactive visualization of the differences and can be configured completely within the application.

The PDFC graphical tool uses the configuration files “.pdfc” and “.pdfcgui” located in the current user's home directory. There is no need to edit these files except for rarely needed fine-tuning.

The i-net PDFC graphical tool has an additional help/manual which can be accessed from within the application.

How to start this tool depends on the operating system you're using.

Windows

Run i-net PDFC.exe, which is installed by the downloaded pdfc-<version>-setup.exe.

Mac OS

Download pdfc-<version>.dmg and install i-net PDFC from it. This creates an i-net PDFC app on your system.

Linux

Install the i-net PDFC .deb or .rpm file. This will create an application launcher in the 'Office' category which runs i-net PDFC.

Command line parameters

The GUI executable accepts up to two command line parameters, which preset the first and (optionally) the second PDF for the comparison. The comparison process is not started automatically.

i-net PDFC.exe <File1> <File2>

2.4 Programming (API)

2.4.1 Java API

With the provided Java API you can call i-net PDFC from a Java program to compare multiple PDF files programmatically. You can find more information in the API documentation and in the Java code samples directory: “java/samples”.

2.4.2 .NET API

With the provided .NET (C#) API you can call i-net PDFC from a .NET program to compare multiple PDF files programmatically. You can find more information in the API documentation and in the C# code samples directory: “dotnet/samples”.

2.5 Example: Run PDFC within Jenkins

You can use i-net PDFC within a Jenkins job to compare PDF files automatically.

The i-net PDFC installation package contains a sample Ant script and a Java sample that executes i-net PDFC as a JUnit test. This sample can be found in the directory: '<install-dir>/java/samples/junit'.

To integrate the i-net PDFC sample with Jenkins, the following requirements must be met:

  1. Jenkins must be installed and configured so that Ant scripts can be executed.
  2. The CompareTwoFoldersAsUnitTest.java file in the directory '<install-dir>/java/samples/junit' must be compiled and packaged into a Java archive (.jar).
  3. The Ant script from the sample requires a directory containing all i-net PDFC JARs, a JUnit JAR, and the previously created sample JAR.

2.5.1 Jenkins / Hudson Integration sample

To integrate the provided sample into Jenkins / Hudson, follow these steps:

  • Create a free-style software job in Jenkins / Hudson
  • Under 'Advanced Project Options', select 'Use custom workspace' and enter the directory containing the Ant build script
  • Select 'Invoke Ant' in the 'Build' section of the job
  • Open the 'Advanced…' options of the Ant build and add the following three properties:
    • source_dir=<path to the source pdf files>
    • reference_dir=<path to the reference pdf files>
    • libraries_dir=<path to the directory containing all required libraries>
  • Under 'Post-build Actions', select 'Publish JUnit test result report' and enter 'junit-reports/*.xml'
  • Save the configuration and run the new job

3. Configuration

The behavior and precision of i-net PDFC can be specified using configuration properties, which are stored in the file config.xml.

After installation, the installation folder contains the file config.xml with default values. You can change the defaults by editing this file. If the file does not exist, i-net PDFC uses the default values.

Note: The graphical user interface uses the file .pdfc in the current user's home directory instead of the config.xml file in the installation directory. You can edit this file in the same way for special fine-tuning, but in most cases this is not necessary.
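The REGEXP example later in this manual writes a setting as an <entry key="…"> element, which looks like the standard Java XML properties format. Assuming config.xml really uses that format (an assumption, not confirmed by this manual), java.util.Properties can create and read such a file, as in the sketch below. The property names and defaults come from this manual; the class name is hypothetical, and the sketch uses an in-memory stream where a real program would write to config.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

/** Sketch, assuming config.xml uses the java.util.Properties XML format. */
final class ConfigSketch {

    /** Serializes a few documented properties as <entry key="...">value</entry> elements. */
    static byte[] writeConfig() {
        Properties config = new Properties();
        config.setProperty("CREATE_DIFFIMAGES", "true");
        config.setProperty("LOG_LEVEL", "WARN");
        config.setProperty("CONTINUOUS_COMPARE", "true");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            config.storeToXML(out, "i-net PDFC settings", "UTF-8");
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    /** Reads LOG_LEVEL back, falling back to the documented default INFO. */
    static String logLevel(byte[] xml) {
        Properties config = new Properties();
        try {
            config.loadFromXML(new ByteArrayInputStream(xml));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return config.getProperty("LOG_LEVEL", "INFO");
    }
}
```

To write an actual file, pass a FileOutputStream for config.xml to storeToXML instead of the in-memory stream.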

3.1 Logging and Results

With the following properties it is possible to configure the output of i-net PDFC and the logging.

Property Name Description
CREATE_DIFFIMAGES Specifies whether a PNG image with the marked differences is created for each pair of pages that contains differences. Possible values are false, first, second and true, which create the difference image for none, the first, the second or both files, respectively. The default value is: false.
CREATE_ORIGIMAGES Specifies if a PNG image with the original content will be created for each compared page. The default value is: false.
CREATE_XORIMAGES Creates a (negated) XOR image for any pair of pages with differences. The image is stored as a PNG in the differences directory of the current comparison. If CREATE_DIFFIMAGES is enabled as well, the XOR image is drawn onto the image created by CREATE_DIFFIMAGES, between the two actual page images. The default value is: false.
IMAGE_SCALE_FACTOR Defines a scale factor for the generated images (original and difference images). The default value is: 1, i.e. no scaling.
LOG_FILE Specifies the file where logged information is to be stored. If a file is specified, the logging is written to the file, otherwise the logging is written to the console. The default is empty, i.e. logging to the console.
LOG_LEVEL Specifies the logging level. Available values: OFF, ERROR, WARN, INFO, ALL. The default value is: INFO.
OFF switches the output off completely.
ERROR logs error messages.
WARN contains all messages of the ERROR level and additionally reports irregularities during execution.
INFO contains all messages of the WARN level and additionally describes settings and environment attributes.
ALL displays the maximum amount of information during the PDFC execution, including any debug information.
MAX_ERRORS_PER_FILE Specifies the maximum number of errors that can occur before the comparison will be canceled for the current PDF file. The default value is: 100.
GENERATE_REPORT Specifies whether a result report should be generated for the comparison. The report will have the name PDFC_Result_<first file name>.pdf and will be saved in the difference image directory. The default value is: false.
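The LOG_LEVEL values are cumulative: each level contains all messages of the levels before it. The Java enum below models that rule so you can reason about which messages a given setting will show. It is a hypothetical illustration of the description above, not a type from the i-net PDFC API.

```java
/** Hypothetical model of the cumulative LOG_LEVEL semantics; not an i-net PDFC type. */
enum PdfcLogLevel {
    OFF, ERROR, WARN, INFO, ALL;

    /** True if this configured level would emit a message of the given level. */
    boolean emits(PdfcLogLevel message) {
        if (this == OFF || message == OFF) {
            return false; // OFF switches the output off completely
        }
        // Declaration order encodes "contains all messages of the previous level".
        return message.ordinal() <= this.ordinal();
    }
}
```

With the default INFO, for example, error and warning messages are still shown, while debug output requires ALL.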

3.2 Comparison modes

Since i-net PDFC 2.0, the standard comparison mode does not compare page by page individually but rather treats the documents as continuous entities. It is however possible to enable a “strict mode” which checks each element on each individual page for its counterpart in the other document.

The right mode depends on the files you are comparing and their origin. If you want to check whether a different PDF export or PDF printer delivers absolutely equal results, use the strict mode. If you simply have two versions of the same document and want to find the changes, prefer the default continuous, non-strict mode.

Property Name Description
CONTINUOUS_COMPARE This flag enables the continuous compare mode. If set to false, the content is compared page by page. The default value is: true.

3.3 Strict mode configuration

The strict mode provides several modules, which define what is compared, and so-called normalizers, which make it possible to ignore unimportant differences.

3.3.1 Modules

Modules are different ways of comparing PDF files. Each module examines the differences between similar elements in the two files; if a difference is greater than the tolerance level, it is logged. The tolerance levels and the modules themselves can be configured in the file config.xml. If a module is not listed in the configuration entry MODULES, its comparison is not executed.

Property Name Description
MODULES Specifies a comma-separated list of modules that will be executed for each page.

The following modules are available:

MODULE_PAGEPROPERTIES

This module compares page properties (page number, rotation, width, height and aspect ratio).

Property Name Description
TOLERANCE_PAGE_LEFTCORNER Specifies the maximum number of pixels that the left or top margin of a page can differ (the upper left corner of all elements) before it is viewed as a difference. The default value is: 3.
TOLERANCE_PAGE_RATIO Specifies the tolerance for the aspect ratio of the PDF page. The default value is: 0.1.
TOLERANCE_PAGE_SIZE Specifies the maximum number of pixels that the width or height of a page can differ before it is viewed as a difference. The default value is: 2.

MODULE_IMAGE

This module compares the position, size and content of images.

Property Name Description
TOLERANCE_IMAGE_DISTANCE Specifies the maximum number of pixels that the position of an image can differ before it is viewed as a difference. The default value is: 3.
TOLERANCE_IMAGE_PIXEL_VALUE Specifies the highest allowed discrepancy of pixel values, as a ratio (Double), before it is viewed as a difference. The range of this property is [0,1]. The default value is: 0.05.
TOLERANCE_IMAGE_SIZE Specifies the maximum difference in percent that the area taken by an image may differ before it is viewed as a difference. The default value is: 0.1.
USE_PIXEL_MEDIUM_VALUE This property of the MODULE_IMAGE specifies whether i-net PDFC should compare the medium values instead of single-pixel values. The default value is: true.

MODULE_LINES

This module compares the positions and properties of shapes.

Property Name Description
TOLERANCE_BOX_ROUND_EDGES Specifies the maximum number of pixels that a control point of a quadratic Bézier curve may differ in total before it is viewed as a difference. The default value is: 3.
TOLERANCE_LINE_POSITION Specifies the maximum number of pixels that the position of a line or curve can differ per axis before it is viewed as a difference. The default value is: 3.
TOLERANCE_LINE_SIZE Specifies the maximum number of pixels that the length of a line can differ in total before it is viewed as a difference. The default value is: 2.
TOLERANCE_LINE_STYLE Specifies whether a different line dash pattern (the dashes and gaps used to stroke paths) is viewed as a difference. The default value is: false.
TOLERANCE_LINE_THICKNESS Specifies the maximum difference in stroke thickness of two lines or curves (measured in pt) before it is viewed as a difference. The default value is: 1.
TOLERANCE_UNDERLINE_LENGTH Specifies the maximum difference in percent by which the length of underlines may differ before it is viewed as a difference. The default value is: 0.1.
TOLERANCE_COLOR Defines the maximum color difference per RGB or HSB channel for all paints. For HSB channels the value is the allowed absolute difference; for RGB channels (0 to 255) it is multiplied by 255. This value is used by the text module as well. The default value is: 0.01, i.e. 1%.
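Read literally, the TOLERANCE_COLOR description means that each 8-bit RGB channel may differ by at most TOLERANCE_COLOR × 255. With the default of 0.01 that is 0.01 × 255 = 2.55, so a channel difference of 2 is tolerated and a difference of 3 is reported. The Java sketch below encodes this reading for packed RGB values; whether i-net PDFC uses ≤ or < at the boundary is an assumption, and the class name is hypothetical.

```java
/** Hypothetical check of the TOLERANCE_COLOR rule for RGB as read from the manual. */
final class ColorTolerance {

    /** True if every 8-bit channel of the two packed RGB colors differs by at most tolerance * 255. */
    static boolean withinRgb(int rgb1, int rgb2, double tolerance) {
        double maxDelta = tolerance * 255; // 0.01 -> 2.55
        for (int shift = 0; shift <= 16; shift += 8) { // blue, green, red
            int c1 = (rgb1 >> shift) & 0xFF;
            int c2 = (rgb2 >> shift) & 0xFF;
            if (Math.abs(c1 - c2) > maxDelta) {
                return false;
            }
        }
        return true;
    }
}
```

For HSB, the same comparison would use the float channel values directly, without the factor 255.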

MODULE_TEXT_WORDORDER

This module splits the PDF texts into words and compares these words. It can identify inserted or removed words within the complete text flow of a page. Since only the textual content is compared, the absolute location of each word is ignored by default. To verify the location of each word as well, use the configuration property TOLERANCE_TEXT_LOCATION.

Property Name Description
TOLERANCE_TEXT_LOCATION Defines the allowed offset for matched words in pixels. Any value smaller than zero disables this feature. Set this value to 0 if you want to verify that the text content of a pair of PDF files is absolutely identical, even in its location. The default value is -1, which means 'disabled'.
TEXT_ALIGN_RATIO This value is the maximum allowed y-jitter for the text line identification, relative to the text height of the respective line. It can be used to compensate for rounding errors of different PDF generators. The default value is 0.15.
COMPARE_TEXT_STYLES If set to 'true', the text styles of all matched words are checked as well. This compares the font name, size, style and color. The default value is 'true'.
TOLERANCE_TEXT_SIZE This property defines the tolerated difference in the text size as a ratio. It is only relevant if COMPARE_TEXT_STYLES is set to true. The default value is 0.05.
TOLERANCE_COLOR Defines the maximum color difference per RGB or HSB channel for all paints. For HSB channels the value is the allowed absolute difference; for RGB channels (0 to 255) it is multiplied by 255. This value is used by the line module as well. The default value is: 0.01, i.e. 1%.
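TEXT_ALIGN_RATIO and TOLERANCE_TEXT_LOCATION both describe simple geometric checks. The Java sketch below shows one plausible reading: two words are on the same line if their vertical offset is at most ratio × text height (0.15 × 10 = 1.5 pixels for 10-pixel text), and a matched word passes the location check if it moved at most the tolerance along each axis, unless the tolerance is negative (disabled). The per-axis interpretation and the class name are assumptions, not i-net PDFC internals.

```java
/** Hypothetical sketch of the text line and text location checks described above. */
final class TextLines {

    /** True if two words count as the same text line under TEXT_ALIGN_RATIO. */
    static boolean sameLine(double y1, double y2, double textHeight, double ratio) {
        return Math.abs(y1 - y2) <= ratio * textHeight;
    }

    /** TOLERANCE_TEXT_LOCATION check; values below zero (default -1) disable it. */
    static boolean locationMatches(double dx, double dy, double tolerance) {
        if (tolerance < 0) {
            return true; // feature disabled: only the text content is compared
        }
        return Math.abs(dx) <= tolerance && Math.abs(dy) <= tolerance;
    }
}
```

With a tolerance of 0, any movement of a matched word is reported, which corresponds to the "absolutely identical, even in its location" setting.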

3.3.2 Normalizers

Normalizers modify the content of the PDF files before the comparison starts. This simplifies the comparison by reducing it to the elements that are important.

Property Name Description
NORMALIZERS Specifies a comma-separated list of normalizers that will be executed before and after each page.

The following normalizers are available:

NORMALIZER_CLIP

This normalizer removes the elements lying outside of clip regions, i.e. elements that are not visible.

NORMALIZER_MARGIN

This normalizer changes the coordinates of elements to take into account the differences in margin values.

CHART_REMOVAL

This normalizer attempts to detect charts and removes them. This can be useful when comparing reports generated by Crystal Reports and i-net Clear Reports, since charts look entirely different in these two products.

Property Name Description
CHART_DENSITY_THRESHOLD (value must be a Double) density-of-shapes threshold for detecting a chart: ((number of shapes)³ / area size).
CHART_REMOVAL_MARGIN (value must be a Double) percent of shape height to use as margin for removing PDF elements above and below detected charts.
CHART_REMOVAL_MOD Specifies the chart detector mode. Available values: CHART_REMOVAL_ALWAYS and CHART_REMOVAL_AUTO (default).
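CHART_DENSITY_THRESHOLD is compared against a density of (number of shapes)³ / area size. As a worked example, 20 shapes in an area of 100 × 100 = 10,000 give 20³ / 10,000 = 8,000 / 10,000 = 0.8, while 10 shapes in the same area give only 1,000 / 10,000 = 0.1. The Java sketch below computes this measure; the area unit (assumed to be square pixels) and the use of ≥ for the threshold test are assumptions, and the class name is hypothetical.

```java
/** Hypothetical worked example of the chart density measure (shapes^3 / area). */
final class ChartDensity {

    /** (number of shapes)^3 / area size, computed in double precision. */
    static double density(int shapes, double area) {
        double n = shapes;
        return n * n * n / area;
    }

    /** Assumed detection rule: a region counts as a chart if its density reaches the threshold. */
    static boolean looksLikeChart(int shapes, double area, double threshold) {
        return density(shapes, area) >= threshold;
    }
}
```

The cubic term makes the measure grow quickly with the number of shapes, so dense clusters of bars or grid lines stand out against ordinary text decoration.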

HEADER_FOOTER

This filter can be used to exclude the page header and footer from the comparison. Unlike in continuous comparison mode, this normalizer can only be used with fixed sizes for headers and footers (as opposed to automatic detection which is possible in the default non-strict mode).

Property Name Description
FIXED_HEADER_SIZE Specifies the size of the header in pixels. The default is -1 (inactive).
FIXED_FOOTER_SIZE Specifies the size of the footer in pixels. The default is -1 (inactive).

INVISIBLE_ELEMENTS

If you get any differences that refer to elements which are not visible in the printed PDF document, please enable this normalizer. It removes all transparent or white drawing elements and all images with a zero width or height. This normalizer has an additional optional setting:

Property Name Description
INVISIBLEELEMENTS_HIDE_ROTATION If the INVISIBLEELEMENTS filter is active, this property instructs the filter to hide rotated text as well. The intention is that rotated text is in most cases only used for printing marks or watermarking, which should not be compared as it is not part of the actual content. The default value is false.

REGEXP

This filter allows you to define certain strings or regular expression patterns. Any textual content in the compared files that is at least partially matched by one of the strings or patterns is completely ignored in the comparison.

The patterns for this filter are defined in the property FILTER_PATTERNS. This property defines one pattern/string per line. Each line has the format:

FILTER_PATTERNS <pattern or string>|(regexp|text)|(active|inactive)

The default is an active, plain text pattern. In that case you can leave out the keywords text and active. Since the property defines one pattern per line, any line breaks to be matched have to be replaced by [[CR]]!

To use this filter, add it to the property “CONTINUOUS_FILTERS”:

<entry key="CONTINUOUS_FILTERS">REGEXP</entry>
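The FILTER_PATTERNS line format can be illustrated with a small parser. The Java sketch below reads one line of the form <pattern or string>|(regexp|text)|(active|inactive), applies the documented defaults (text, active), turns [[CR]] back into a line break, and ignores a text if it is at least partially matched. Trailing keywords are stripped from the right so that the pattern itself may contain '|'. This is one plausible reading of the format, not the parser used by i-net PDFC; the class name is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical parser for one FILTER_PATTERNS line; not i-net PDFC's implementation. */
final class FilterPattern {

    final String pattern;
    final boolean regexp;
    final boolean active;
    private final Pattern compiled;

    private FilterPattern(String pattern, boolean regexp, boolean active) {
        this.pattern = pattern;
        this.regexp = regexp;
        this.active = active;
        this.compiled = regexp ? Pattern.compile(pattern) : null;
    }

    /** Parses "<pattern or string>|(regexp|text)|(active|inactive)"; defaults: text, active. */
    static FilterPattern parse(String line) {
        String body = line;
        boolean active = true;
        boolean regexp = false;
        String last = lastToken(body);
        if ("active".equals(last) || "inactive".equals(last)) {
            active = "active".equals(last);
            body = body.substring(0, body.lastIndexOf('|'));
            last = lastToken(body);
        }
        if ("regexp".equals(last) || "text".equals(last)) {
            regexp = "regexp".equals(last);
            body = body.substring(0, body.lastIndexOf('|'));
        }
        // Line breaks to be matched are written as [[CR]] in the property value.
        return new FilterPattern(body.replace("[[CR]]", "\n"), regexp, active);
    }

    private static String lastToken(String s) {
        int bar = s.lastIndexOf('|');
        return bar < 0 ? null : s.substring(bar + 1).trim();
    }

    /** True if the text is at least partially matched and would therefore be ignored. */
    boolean ignores(String text) {
        if (!active) {
            return false;
        }
        return regexp ? compiled.matcher(text).find() : text.contains(pattern);
    }
}
```

For example, the line Printed on \d{4}-\d{2}-\d{2}|regexp would ignore any text containing a print date in ISO format.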

3.4 Continuous (non-strict) mode configuration

The continuous mode compares the PDF documents as if they were a stream of objects (words, images, lines etc.). To choose which of these elements you are interested in, configure the property CONTINUOUS_COMPARE_TYPES. For complex layouts that contain page headers and footers or multiple columns, the use of additional filters is advised. These filters can be enabled with the CONTINUOUS_FILTERS property.

3.4.1 Filters

Filters are an optional feature of the continuous mode. They help to remove redundant elements from the comparison and to overcome the issue that PDFs may not contain any information about the original text layout. Please note that these filters may not be correct in every case: finding the original layout of a document depends heavily on its content, and the chance of correctly detecting a header rises with the number of pages available. It is therefore recommended to use the PDFC GUI when activating filters, since the GUI allows you to review the result of each filter.

Filters can be activated by adding them to the CONTINUOUS_FILTERS property:

Property Name Description
CONTINUOUS_FILTERS Specifies a comma-separated list of filters that will be executed before the actual comparison.

The following filters are available:

HEADERFOOTER

This filter can be used to exclude the page header and footer from the comparison. It can be set to automatic mode or fixed header and footer sizes.

Property Name Description
FIXED_HEADER_SIZE Specifies the size of the header in pixels. Set to -1 to auto-detect the header. The default is -1.
FIXED_FOOTER_SIZE Specifies the size of the footer in pixels. Set to -1 to auto-detect the footer. The default is -1.

MULTICOLUMN

The multi column filter does not exclude content from the comparison but rather restructures the object stream. This is required whenever the print order of the document does not represent the actual layout. For example, in a two column layout the text in the PDF file may be stored from top left to bottom right, but this is not how a reader sees the document: a reader first reads the complete left column and then the right one. The multi column filter helps to transform the content of a file from print order to read order.
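The difference between print order and read order can be shown with a simplified two column case. The Java sketch below takes words in print order and returns them in read order: first the left column (x below a given split position), then the right column, each from top to bottom. It assumes that the column boundary is already known and that y grows downwards; the real filter has to detect the columns itself, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of reordering a two column page from print order to read order. */
final class ReadOrder {

    /** A word and its position; y grows downwards as in page images. */
    static final class Word {
        final String text;
        final double x;
        final double y;

        Word(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Left column (x < columnSplit) first, then the right column, each top to bottom. */
    static List<String> twoColumns(List<Word> printOrder, final double columnSplit) {
        List<Word> sorted = new ArrayList<>(printOrder);
        Collections.sort(sorted, new Comparator<Word>() {
            @Override
            public int compare(Word a, Word b) {
                int byColumn = Integer.compare(column(a, columnSplit), column(b, columnSplit));
                if (byColumn != 0) {
                    return byColumn;
                }
                int byLine = Double.compare(a.y, b.y);
                return byLine != 0 ? byLine : Double.compare(a.x, b.x);
            }
        });
        List<String> readOrder = new ArrayList<>();
        for (Word w : sorted) {
            readOrder.add(w.text);
        }
        return readOrder;
    }

    private static int column(Word w, double columnSplit) {
        return w.x < columnSplit ? 0 : 1;
    }
}
```

A page stored row by row as L1, R1, L2, R2 thus becomes L1, L2, R1, R2, which is the sequence a reader would follow.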

INVISIBLEELEMENTS

If you get any differences that refer to elements which are not visible in the printed PDF document, please enable this filter. It removes all transparent or white drawing elements and all images with a zero width or height. This filter has an additional optional setting:

Property Name Description
INVISIBLEELEMENTS_HIDE_ROTATION If the INVISIBLEELEMENTS filter is active, this property instructs the filter to hide rotated text as well. The intention is that rotated text is in most cases only used for printing marks or watermarking, which should not be compared as it is not part of the actual content. The default value is false.

REGEXP

This filter allows you to define certain strings or regular expression patterns. Any textual content in the compared files that is at least partially matched by one of the strings or patterns is completely ignored in the comparison.

The patterns for this filter are defined in the property FILTER_PATTERNS. This property defines one pattern/string per line. Each line has the format:

FILTER_PATTERNS <pattern or string>|(regexp|text)|(active|inactive)

The default is an active, plain text pattern. In that case you can leave out the keywords text and active. Since the property defines one pattern per line, any line breaks to be matched have to be replaced by [[CR]]!

3.4.2 Compared types

The continuous compare mode distinguishes between three types of content: text words, lines / shapes and images. Each of these types can be excluded from the comparison.

Compared types can be included or excluded by CONTINUOUS_COMPARE_TYPES property:

Property Name Description
CONTINUOUS_COMPARE_TYPES Specifies a comma-separated list of types that will be included in the comparison. The default is 'TEXT, LINE, IMAGE'.
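Like MODULES, NORMALIZERS and CONTINUOUS_FILTERS, CONTINUOUS_COMPARE_TYPES is a comma-separated list, and the default value contains spaces after the commas. The Java sketch below shows how such a value could be interpreted: tokens are trimmed, a missing property falls back to the documented default of all three types, and an unknown name is rejected. The class name and the fallback rule for a missing value are assumptions for illustration.

```java
import java.util.EnumSet;

/** Hypothetical interpretation of a CONTINUOUS_COMPARE_TYPES value. */
final class CompareTypes {

    enum Type { TEXT, LINE, IMAGE }

    /** Parses e.g. "TEXT, LINE"; null (property missing) means the default TEXT, LINE, IMAGE. */
    static EnumSet<Type> parse(String value) {
        if (value == null) {
            return EnumSet.allOf(Type.class);
        }
        EnumSet<Type> types = EnumSet.noneOf(Type.class);
        for (String token : value.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                types.add(Type.valueOf(name)); // throws IllegalArgumentException for unknown names
            }
        }
        return types;
    }
}
```

Setting the property to "TEXT, LINE" would, under this reading, skip the comparatively expensive image comparison.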

TEXT

Includes all text elements like words, numbers, punctuation and list items. The text comparison can be modified using the following properties:

Property Name Description
TEXT_ALIGN_RATIO This value is the maximum allowed y-jitter for the text line identification, relative to the text height of the respective line. It can be used to compensate for rounding errors of different PDF generators. The default value is 0.15.
COMPARE_TEXT_STYLES If set to 'true', the text styles of all matched words are checked as well. This compares the font name, size, style and color. The default value is 'true'.
TOLERANCE_TEXT_SIZE This property defines the tolerated difference in the text size as a ratio. It is only relevant if COMPARE_TEXT_STYLES is set to true. The default value is 0.05.
TOLERANCE_COLOR Defines the maximum color difference per RGB or HSB channel for all paints. For HSB channels the value is the allowed absolute difference; for RGB channels (0 to 255) it is multiplied by 255. This value is used by the line comparison as well. The default value is: 0.01, i.e. 1%.

LINE

This value includes all graphical elements except images. The line and shape comparison can be modified using the following properties:

Property Name Description
COMPARE_LINE_STYLES If set to 'true', the styles of all matched lines and shapes will be checked as well. This will compare the color, stroke and thickness of all lines. The default value is 'true'
TOLERANCE_LINE_POSITION Specifies the maximum number of pixels that the position of a line or curve can differ per axis before it is viewed as a difference. The default value is: 3.
TOLERANCE_LINE_SIZE Specifies the maximum number of pixels that the length of a line can differ in total before it is viewed as a difference. The default value is: 2.
TOLERANCE_LINE_THICKNESS Specifies the maximum difference in stroke thickness of two lines or curves (measured in pt) before it is viewed as a difference. The default value is: 1.
TOLERANCE_COLOR Defines the maximum color difference per RGB or HSB channel for all paints. For HSB channels the value is the allowed absolute difference; for RGB channels (0 to 255) it is multiplied by 255. This value is used by the text comparison as well. The default value is: 0.01, i.e. 1%.

IMAGE

This value includes all images. Note that comparing images may noticeably affect performance. The image comparison can be modified using the following properties:

Property Name Description
TOLERANCE_IMAGE_DISTANCE Specifies the maximum number of pixels that the position of an image can differ before it is viewed as a difference. The default value is: 3.
TOLERANCE_IMAGE_PIXEL_VALUE Specifies the maximum allowed discrepancy of pixel values (Double) before it is viewed as a difference. The range of this property is [0,1]. The default value is: 0.05.
TOLERANCE_IMAGE_SIZE Specifies the maximum difference in percent that the area spanned by an image may differ before it is viewed as a difference. The default value is: 0.1.
USE_PIXEL_MEDIUM_VALUE This property of the MODULE_IMAGE specifies whether i-net PDFC should compare the medium values instead of single-pixel values. The default value is: true.

3.5 Tolerance Values

The tolerance values specify up to which magnitude differences between the PDF files are ignored, which makes it possible to disregard minor differences. In most cases you can leave the tolerance values at their defaults. The section Modules lists the available tolerance values for each module; they can be set in the configuration file config.xml.

3.6 Page Image Cache

This property specifies whether page images will be saved as temporary PNG files on the hard disk or in memory.

Property Name Description
CACHE_FOR_PAGE_IMAGES Specifies the type of temporary storage for page images. These images can be used to show the original pages of the compared PDF files (CREATE_ORIGIMAGES) and/or the marked differences of the compared pages (CREATE_DIFFIMAGES). This option is only effective if at least one of the options CREATE_ORIGIMAGES and CREATE_DIFFIMAGES is set. If “MEMORY” or “HARD_DISK” is set, the page images are created during the comparison. If “NONE” is set, the page images are created in an additional pass after the comparison is done. If the “HARD_DISK” cache is used, the images are saved as PNG files in subfolders of the folders containing the compared PDF files.

4. Limitations

  • PDF documents with restricted access permissions are partly supported. Such documents are encrypted. The i-net PDFC parser can read PDF files encrypted with the standard algorithm based on RC4. If the revision is not 3, or if AES encryption was used instead of RC4, such documents cannot be read.
  • i-net PDFC does not read annotations, interactive forms and signature fields. These elements will be ignored by the parser.

If these or other unsupported features are important to you, please contact our support team and send us two PDF files that you would like to compare with i-net PDFC.

5. Changes

v3.0.263 (February 28, 2014)

  • ArrayIndexOutOfBoundsException occurred.
  • If the .NET API of i-net PDFC was used then the following exception occurred: “A generic error occurred in GDI+”.

v3.0.256 (February 21, 2014)

  • NullPointerException occurred.

v3.0.254 (February 19, 2014)

  • ClassCastException occurred in strict comparison mode on PDF control elements.
  • The following exception occurred with the .NET edition: cli.System.Runtime.InteropServices.ExternalException: A generic error occurred in GDI+.

v3.0.246 (February 11, 2014)

  • Version 3.0.235 was delivered as a beta version.
  • A checked / unchecked checkbox was not detected as a difference.

v3.0.235 (January 31, 2014)

  • Files with upper case suffix were not compared in batch compare mode.
  • Specific text differences were not found.
  • Chinese, Japanese and Korean characters were not rendered correctly, and an ArrayIndexOutOfBoundsException occurred if the PDF files contained such characters.
  • ProfileDataException with invalid ICC profile occurred.

v3.0.214 (January 10, 2014)

  • New features in the i-net PDFC GUI:
    • Export the comparison result in a PDF file or print it on a printer
    • Display annotations in the PDF files
    • Search within the PDF file(s) is possible
    • Continuous Zoom of the PDF files is possible
    • Meta Data added to the visibilities filter
    • Tabs layout has been redesigned
    • Tabs Search, Annotations, Export / Print added
    • Comparison result view:
      • With a right click on a difference marker it is possible to get detailed information about the difference, copy the text, ignore the selected text, or ignore the selected difference
    • Export/Import of the configuration, useful to use it with i-net PDFC command line / API
  • It is now possible to save the comparison result in PDF file using -p command line parameter
  • A color difference tolerance for text and shapes can now be defined using the configuration property TOLERANCE_COLOR
  • Legacy API deprecated; it will be removed in version 4.0
  • Batchrunner, IReportResult, IPageResult and IDifference are now deprecated
  • Batchrunner.setLoader is no longer used; the new threading architecture requires full control over the parser process
  • Completely new API with extended capabilities
  • New filter added to normalize visually identical characters. This is especially useful for OCR- or PDF-printer-generated PDF documents
  • PDFParseException: Stream ended inappropriately occurred
  • Improved space recognition for the 'Text by pattern' filter
  • PostScript commands gt, ge, lt and le are now supported
  • Support for a special type of Adobe-encoded JPEG images
  • Default language for the comparison report was German. Now it is English.

v2.5.155 (November 12, 2013)

  • Text clipping is now only active if the 'Invisible Elements' filter is in use.
  • Decryption of PDF files with user permission access has been extended to standard encryption filters with revision 4 and version 4. Previously, “IllegalArgumentException: Parsen of encrypted pdf files is not supported” occurred in this case.

v2.5.135 (October 23, 2013)

  • IllegalArgumentException: Unknown encoding: StandardEncoding occurred.

v2.5.130 (October 18, 2013)

  • UnsupportedOperationException: ps command: add
  • Solid red boxes were displayed over images in the PDF file.

v2.5.113 (October 01, 2013)

  • “PDFParseException: Unsupported shader type: 4” occurred.

v2.5.98 (September 16, 2013)

  • UnsupportedClassVersionError occurred if Java API or command line version was used with Java 6.

v2.5.92 (September 10, 2013)

  • 'Illegal Capacity' exception occurred randomly for pages with a large number of lines.

v2.5.78 (August 27, 2013)

  • CompareTwoFilesWithCustomHandler: the differences directory was deleted after the comparison.
  • NullPointerException occurred if a required value in the font description was missing.

v2.5.64 (August 13, 2013)

  • The Java and .NET editions of i-net PDFC are now installed with the same setup.
  • java.awt.color.CMMException: Invalid profile data occurred.
  • The difference description contained the wrong page number if there was a text replacement with equal bounds but on different pages.
  • File name in the difference summary on the console was limited.
  • Text concatenation error occurred sometimes near punctuation marks.

v2.5.7 (June 17, 2013)

  • new text filter which accepts plain text and regular expressions to exclude certain text from the comparison
  • major performance improvements for the internal and GUI rendering process
  • report can now be generated with the command line tool as well
  • several fixes for embedded fonts, especially Type1C
  • GUI: cycle through the differences with the left and right cursor keys
  • improved scrolling; the view no longer jumps when zooming
  • the GUI now logs to a file

v2.2.14 (April 15, 2013)

  • Performance optimization in parser.
  • Improvements in memory usage to prevent OutOfMemory errors.
  • You can now scroll through the differences with the left and right keys.
  • The GUI executable now accepts two parameters which will preset the first and second file of the GUI.
  • Checkboxes in PDF files are now compared.
  • The following exceptions were fixed:
    • UnsupportedOperationException: ps command dup
    • RuntimeException: invalid reference: null
    • IllegalArgumentException: Unknown encoding: SymbolSetEncoding
    • ArrayIndexOutOfBoundsException
    • BufferUnderflowException
    • PDFParseException: Data format exception:incorrect header check
  • It was not possible to compare PDF files on the command line using runPDFC.bat
  • Scrolling behaviour in the GUI was wrong if the compared PDF files had different lengths.

v2.1.38 (January 18, 2013)

  • PDFParseException: Unknown command: Qq
  • Differences in watermark of PDF files were not highlighted
  • Various bugs in the GUI were fixed
  • Comparison report was improved

v2.1.7 (December 18, 2012)

  • Export of a comparison report as PDF
  • API class IDifference has a new type “TYPE_STYLE_MODIFIED” to distinguish between content and style modifications
  • NullPointerException in PDFFontEncoding occurred
  • PDF parser improvements for:
    • type 6 and 7 shaders are now supported
    • support for embedded CIDMaps
    • Unicode mapping to character arrays
    • improved support of CMYK colors

v2.0.161 (November 09, 2012)

  • Command line parameter -generateinfo added
  • BufferUnderflowException solved
  • IllegalStateException: “invalid dimension of return values” solved
  • French accents are now displayed correctly
  • The content of the compared PDF files was not displayed in the i-net PDFC GUI for some landscape pages

v2.0.143 (October 22, 2012)

  • Fix for scaled font sizes and font colors
  • Fix for Type3-Fonts with embedded images
  • Fix for the PDF 'carriage return' command
  • Optimization of the Multi-Column filter for single-column scenarios; it produces fewer column boxes
  • Image comparison for images with different internal resolution is now possible

v2.0 (August 27, 2012)

  • Streamlined licensing process, including offline mode
  • Invisible elements filter
  • Performance optimization in parser and result view
  • Several bug fixes for the strict and loose compare modes

v2.0 Beta (July 31, 2012)

  • Entirely new continuous comparison engine to find modifications in large documents. This engine focuses on content changes rather than on the exact location of each element. It can be configured with several new configuration properties.
  • Completely new comparison GUI for both comparison modes.
  • New result visualizations in the GUI for both modes
  • Visual configuration
  • Support for PDF form elements added
  • New paged mode normalizer 'HEADER_FOOTER' to exclude fixed header and footer areas
  • New module property 'COMPARE_TEXT_STYLES' to compare the text styles as well
  • New module property 'TEXT_ALIGN_RATIO' to better compensate alignment differences due to different PDF generators
  • New values for the 'CREATE_DIFFIMAGES' property to create only images for the left or right page
  • Support for Colored Tiling Patterns added
  • Fix for 'div zero' error in pattern type 1
  • Fix for PDFs with corrupt kerning data
  • Fix for ParserException due to incorrect CCITT image decoding

v1.15 (March 16, 2012)

  • BufferUnderflowException occurred

v1.14 (February 29, 2012)

  • UnsupportedOperationException: ps command: roll
  • New Image scaling property 'IMAGE_SCALE_FACTOR' to scale all image output
  • New text module property 'TOLERANCE_TEXT_LOCATION' to verify text identity

v1.13 (February 10, 2012)

  • PDFParseException: Unsupported function type: 4

v1.12 (February 02, 2012)

  • Bug: ArrayIndexOutOfBoundsException occurred during processing of some Type1C fonts.
  • Bug: ArrayIndexOutOfBoundsException occurred while processing masked images.

v1.11 (November 28, 2011)

  • Bug: “IllegalArgumentException: space ranges not defined” occurred during the file comparison.

v1.10 (November 21, 2011)

  • API enhancements: Classes in package “diffimage” added.
  • Command line parameters -x and -r added.
  • Bug: IllegalArgumentException occurred during the file comparison.
  • Bug: Endless loop occurred during the file comparison.
  • The height and width of the text difference mark box were not calculated correctly for large font sizes.

v1.09 (July 29, 2011)

  • Differences in several PDF/A documents were not found.
  • Identical text passages were marked as different.
  • NumberFormatException: For input string: “xxx” occurred.

v1.08 (June 10, 2011)

  • Bug in ASCIIHexDecode was fixed. Some texts encoded with this encoding could not be decoded.
  • Text scaling was corrected in order to calculate the proper height of text-difference marker.
  • The international format for date-time is now used for error messages.
  • The batch file runPDFC.bat (or shell script runPDFC.sh) can now be launched from any directory.
  • The log level “OFF” produces no output.
  • The log level list is reduced. The following log levels can be used: OFF, ERROR, WARN, INFO, ALL.
  • Differences on single PDF pages were not found because of an error while reading the PDF comments in the file.

v1.07 (May 09, 2011)

  • API revised
    • Properties “USE_HD_CACHE_FOR_PAGE_IMAGES” and “USE_MEMORY_CACHE_FOR_PAGE_IMAGES” replaced with “CACHE_FOR_PAGE_IMAGES”.
  • API documentation added
  • Java code samples added
  • Documentation enhanced
  • The default encryption of PDF files is taken into account for additional elements such as functions, color spaces and hint tables.
  • Differences in PDF files were not found, because some characters from CFF fonts were not shown.
  • Character codes using MacRomanEncoding could not be compared.
  • Non-embedded CID fonts could not be used to build the page difference images.
  • Image comparison has been improved through a corrected image cache.
  • Problems with PDF/A files occurred. If the key-length for default encryption is not set, it will be defined on the basis of security handler version.
  • Java 5 is now supported.

v1.06 (Feb 23, 2011)

  • Rectangles in difference images sometimes appeared a few pixels below the point at which they should appear.

v1.05 (Jan 26, 2011)

  • No differences were found between PDF files because of an error while reading fonts.
  • ClassCastException occurred during the comparison of PDF files.
  • PDF files created with FastReports can now be compared.
  • The width array length during string processing is now limited by the string length to avoid IndexOutOfBoundsExceptions.

v1.04 (Dec 09, 2010)

  • A hard disk or memory cache can now be used.
  • Improved image comparison: the pixel values of images are now compared too.

v1.03 (Nov 30, 2010)

  • Fixed a problem regarding the reading of compressed objects and xref streams.
  • Fixed a bug with default width for CID-fonts type 0.
  • Fixed a bug with ToUnicode font map ranges for CID-fonts.

v1.02 (Jul 08, 2010)

  • Fixed a bug with inlined DCT-encoded images in a PDF.
  • Fixed a bug regarding Unicode special characters not being read correctly.
  • Fixed a problem regarding the reading of embedded TrueType text.

v1.01 (Mar 23, 2010)

  • Improved chart detection and comparison.
  • Fixed problems when identical shapes occurred more than once.

v1.0 (Feb 25, 2010)

  • Initial release.

6. Support

If you have any questions or problems, please do not hesitate to contact pdfc@inetsoftware.de for technical support.

Copyright 2009 - 2014, i-net software GmbH. All rights reserved.