forpix is a forensic program for identifying similar images that are no longer identical due to image manipulation. Hereinafter I will describe the technical background for the basic understanding of the need for such a program and how it works.
From image files or files in general you can create so-called cryptologic hash values, which represent a kind of fingerprint of the file. In practice, these values have the characteristic of being unique. Therefore, if a hash value for a given image is known, the image can be uniquely identified in a large amount of other images by the hash value. The advantage of this fully automated procedure is that the semantic perception of the image content by a human is not required. This methodology is an integral and fundamental component of an effective forensic investigation.
Due to the avalanche effect, which is a necessary feature of cryptologic hash functions, a minimum -for a human not to be recognized- change of the image causes a drastic change of the hash value. Although the original image and the manipulated image are almost identical, this will not apply to the hash values any more. Therefore the above mentioned application for identification is ineffective in the case of similar images.
A method was applied that resolves the ineffectiveness of cryptologic hash values. It uses the fact that an offender is interested to preserve certain image content. In some degree, this will preserve the contrast as well as the color and frequency distribution. The method provides three algorithms to generate robust hash values of the mentioned image features. In case of a manipulation of the image, the hash values change either not at all or only moderately similar to the degree of manipulation. By comparing the hash values of a known image with those of a large quantity of other images, similar images can now be recognized fully automated.