The use of Hyperspectral Analysis for Ink Identification in Handwritten Documents

Conference Paper · September 2014 · DOI: 10.1109/CCST.2014.6986980

Aythami Morales¹, Miguel A. Ferrer¹, Moises Diaz-Cabrera¹, Cristina Carmona¹, Gordon L. Thomas²

¹ Instituto Universitario para el Desarrollo Tecnológico y la Innovación en Comunicaciones (IDeTIC), Universidad de Las Palmas de Gran Canaria, Campus de Tafira s/n, E35017, Las Palmas de Gran Canaria, Spain. amorales@idetic.eu, mferrer@idetic.eu, mdiaz@idetic.eu, ccarmona@idetic.eu
² Independent consultant and author, UK. gordonl.thomas@virgin.net

Abstract - Hyperspectral analysis is employed in many different areas, such as medicine, environmental studies, security and forensics. Focusing on law enforcement, ink discrimination has become an important factor for the detection of fraudulent documents. This paper proposes an approach for ink analysis in handwritten documents and pen verification using hyperspectral analysis and Least Squares SVM classification.
The proposed method obtains immediate results in a non-contact way from the document or test sample. The first step is to determine the best possible lighting conditions. Then a detailed study is made of the components and properties of the inks and pens used. This paper proposes a classification method based on the hyperspectral characteristics of the ink derived from its physical properties. Furthermore, a database of hyperspectral curves of several types of ink is built and used to obtain the characteristics of the different inks. The proposed method for automated ink type identification is tested using 25 different pens and more than 1000 samples. The achieved discrimination between types of ink was 87.5%. The experimental protocol includes three different scenarios.

Keywords: ink identification, pen verifier, hyperspectral analysis, handwritten document analysis, forensics.

I. INTRODUCTION

The analysis of inks, particularly in Document Examination, is of great importance. The ink type and temporal factors can be important evidence in criminal prosecutions [1]. There are many document analysis techniques, perhaps as many as there are techniques for forging documents. Documentoscopy is the area of knowledge aimed at determining the authenticity of a document, its authorship, structure and content [2], a document being defined as any medium capable of hosting graphical content, either printed or handwritten. According to the Spanish Directorate General of Police, the Document Examination section of the Forensic Science Department has the following material available to study a document's authorship:

1. A binocular microscope or magnification loupe for examining finer details of the various documents. The system usually incorporates a photographic camera.
2. An infrared microscope for the spatial analysis of inks. This allows the optical removal of certain pigmented inks, thereby permitting the visualization of traces produced by others.
3. Profile projectors for precision measurements.
4. A videospectral comparator for the optical analysis of the ink reflection under different lighting conditions and wavelengths (UV, IR, and green and blue filters of different wavelengths).
5. Fluotest, for luminescence observation under UV of different wavelengths.

As can be seen, all the methods used are non-contact in order not to interfere with the evidence. There are other non-contact techniques, such as colour analysis, absorption spectrum analysis, examination by ultraviolet radiation, infrared radiation detection or infrared absorption. The contact techniques are chromatography (either thin-layer or high-performance liquid) and the use of test chemicals [3][4].

There are also more technologically advanced techniques that require more complex instrumentation, for instance, specialised spectroscopic techniques for studying the interaction between electromagnetic radiation and the ink. These techniques include FTIR (Fourier Transform InfraRed), Raman Spectroscopy, Electrophoresis and Mass Spectrometry.

This paper is focused on the application of optical spectrometry. As is well known, each natural element has properties of absorption and reflection that depend on its atomic structure. When an ink is irradiated with white light, it absorbs some wavelengths and reflects others. As white light contains energy at many wavelengths, the spectral response of the deposited ink can be used to characterise it and extract useful information (e.g. to determine whether two different samples have been made with the same ink). So as not to bias the measurements, the light source must contain the same amount of energy in all the radiated wavelengths. Thus, it is possible to infer the composition of an ink from the dispersion of the light reflected by it.

In recent decades, the industry has commercialized devices for spectroscopy analysis (e.g. Spectrum FORAM 685-2 by Foster and Freeman or HSI Examiner 100 QD by ChemImage). In general, these devices provide detailed microscopic analysis alongside a spectral curve which can be used to compare inks, papers, holograms and other forms of image. The main drawback of such devices is their high price, which some small forensic offices cannot afford.

The Spectrum FORAM 685-2 works as follows: white light is used to illuminate the questioned document while a narrow bandpass optical filter is placed in front of a camera in order to analyze the ink sample at the selected wavelength. This is a multispectral device, i.e., tens of bands can be analyzed. The device allows visualization of the strokes at different microscopic magnifications. In this way it is possible to analyze how the ink is deposited on the paper and how it fills the interstices between the fibres. This can give revealing clues about the fluidity and viscosity of the ink, which, supplemented with other tests, can help to reach a deeper knowledge of the document authorship. Moreover, if the spectral responses of two inks are significantly different, it could mean that the inks are different or, at least, that a sufficient time interval has elapsed between the imprints to justify the discrepancies.

Another more recent system used for document analysis is the HSI Examiner 100 QD produced by ChemImage [5][6], a company that provides hardware and software for many chemical and biological applications such as pathology, forensic studies, pharmaceutical studies and threat detection. The HSI Examiner 100 QD is a hyperspectral imaging system and software package specifically designed for forensic document examination. According to its manufacturer, this platform is the most sensitive commercially available device for ink discrimination purposes. Again, the price is its major drawback.
The aim of the work reported here is to develop an automatic ink classifier based on the optical properties of the ink. The spectral range is from 400 nm to 1100 nm, which includes the near infrared. The proposed algorithm runs in real time, giving the probability that the strokes along a scanned line were written with the same pen, and providing additional evidence of whether a document is fraudulent. The block diagram of the system can be summarized as:

1. Acquisition, which includes the hyperspectral camera, the light and the "box" for document acquisition.
2. Hyperspectral image processing to reduce the noise and obtain the ink hyperspectral curve at different locations.
3. Characterization of the ink hyperspectral reflection and classifier design.
4. Database building and tests performed to validate the final system.

II. ACQUISITION DEVICE

To obtain the hyperspectral image we use a spectrograph in conjunction with a CCD camera. The system scans a line image, obtaining the spectral response at each point of the line. The gain and exposure parameters are set up to increase the contrast between bands. The spectrograph used in this paper is the ImSpector V10E, and the camera is the model TM-1327GE, which has a resolution of 1392×1024 pixels. The vertical axis represents the wavelength, so we have 1024 spectral bands between 400 and 1100 nm. The spatial resolution depends on the angle of view of the camera lens and the focus distance. It is possible to acquire up to 30 frames per second.

For illumination, after testing different lamps (fluorescent, halogen, CFL, LEDs, etc.) looking for white light emission as uniform as possible between 400 and 1100 nm, we chose the Philips EcoClassic bulb and the OSRAM bulb, each of 100 W. Figure 1 shows the spectral radiation of the chosen bulbs alongside the other two tested bulbs, when their light is projected over a white sheet.

Figure 1. Spectral response of different bulbs tested.
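The relation between the 1024 spectral rows and the 400 to 1100 nm range described above can be sketched as follows. This is a minimal illustration that assumes a linear calibration with row 0 at 400 nm and row 1023 at 1100 nm; the paper does not state the actual calibration or the row orientation, and the names `band_to_wavelength` and `wavelength_to_band` are hypothetical helpers introduced only for this example.

```python
# Assumed linear wavelength calibration (not stated in the paper):
# row 0 -> 400 nm, row 1023 -> 1100 nm.
N_BANDS = 1024          # spectral rows of the 1392x1024 sensor
LAMBDA_MIN_NM = 400.0   # lower end of the spectral range
LAMBDA_MAX_NM = 1100.0  # upper end of the spectral range
STEP_NM = (LAMBDA_MAX_NM - LAMBDA_MIN_NM) / (N_BANDS - 1)


def band_to_wavelength(row):
    """Return the wavelength in nm assigned to a spectral row index."""
    if not 0 <= row < N_BANDS:
        raise IndexError("row outside the sensor")
    return LAMBDA_MIN_NM + row * STEP_NM


def wavelength_to_band(wavelength_nm):
    """Return the row index whose wavelength is closest to wavelength_nm."""
    if not LAMBDA_MIN_NM <= wavelength_nm <= LAMBDA_MAX_NM:
        raise ValueError("wavelength outside the spectral range")
    return round((wavelength_nm - LAMBDA_MIN_NM) / STEP_NM)


if __name__ == "__main__":
    print(f"spectral sampling step: {STEP_NM:.3f} nm per row")
    print(f"row closest to 800 nm: {wavelength_to_band(800.0)}")
```

Under this assumption each row covers roughly 0.68 nm, so a helper of this kind would be enough to locate the row for a given wavelength, such as the 800 nm row used in Section III to find ink pixels.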
The data acquisition is made under controlled conditions inside a box 30 cm wide, 30 cm deep and 40 cm high. The interior of the box is painted white with a paint that also reflects in the infrared. There are two apertures in the box, one for the camera lens and another for the document. Each is covered with a black felt curtain to avoid external light interference. Figure 2 shows the box, the camera and the curtain configuration. The camera gain and exposure time were experimentally fixed to 1.96 and 0 respectively.

III. HYPERSPECTRAL IMAGE PROCESSING

After introducing a paper with ink lines drawn upon it, the reflection of the lines is broken down into different wavelengths, which are subsequently projected onto the CCD detector, allowing the creation of a two-dimensional image of the reflection, as shown in Figure 2, where one axis represents the spatial information and the other the spectral information. In this grayscale image, white tones indicate high levels of reflection while dark shades indicate less reflection.

Figure 2. Procedure to obtain the ink hyperspectral curve along a line in a document. a) sample document (sheet with 3 ink samples and the scanned line); b) hyperspectral image (axes: angle and wavelength in nm); c) hyperspectral image processed (background removal); and d) hyperspectral curves.

The image processing is performed in the following steps:

1. The sample document is introduced into the closed enclosure and optimally illuminated. Focus, gain values and camera exposure are set. Inside the box there is a mark which shows the line analyzed by the hyperspectral camera, see Figure 2.
2. The hyperspectral image of the analyzed line is acquired by the hyperspectral device, giving the image in Figure 2.b.
3. The hyperspectral image is processed in order to remove the background noise. This is done by subtracting the hyperspectral image of a white sheet from the document hyperspectral image, which equalizes the effect of non-flat spectral illumination. We thus obtain Figure 2.c.
4. The hyperspectral curves of the ink pixels are extracted as follows:
   a. The line corresponding to the wavelength of 800 nm, which is a maximum for ink reflection, is extracted and differentiated (it corresponds to a row of the image matrix).
   b. The largest negative peaks of the differentiated line give the positions (angles) of the ink pixels.
   c. At each pixel position, the hyperspectral curve is obtained (the column at the angle position of the ink).
5. The hyperspectral curve is smoothed by a moving average filter of length 21 pixels, thus obtaining hyperspectral curves as shown in Figure 2.d.

The curves characterize the ink composition and need to be parameterized in order to identify the ink.

IV. DATABASE

For the database, we have used 25 different pens of different ink types as follows: 7 different pens of viscous ink, 4 different pens of liquid ink, 7 different pens of gel ink and 7 different marker pens [7]. Two different databases have been built, the first for system design and the second for evaluation.

A. Database for system design

We start with lines drawn on paper with all the pens. We have used the same kind of sheet for all the documents: business paper of 80 g/m². With each of the pens described above, we draw 50 lines and, just after drawing (minimum time lapse), we work out the hyperspectral curve of the central pixel of each line. We thus obtained 50×25=1250 hyperspectral curves. An example of a document belonging to this database being placed in the box can be seen in Figure 3 (upper). After a week, with the ink dried, the lines were scanned again, thus obtaining another 1250 hyperspectral curves. In total, 2500 hyperspectral curves comprise the design corpus.

B. Database for validation

The database for validation consists of 30 bank checks.
Ten of them were written with just one pen, fictitiously, of course, because there was no intention to use them for bank transactions. Another 10 were written with a specific ink and afterwards forged with a different ink. The remaining 10 were written with one pen and fraudulently altered with a different pen of the same ink type. An example can be seen in Figure 3 (lower), where the amount 900 is altered to 90,000 with another pen.

Figure 3. Upper: sample of a document belonging to the database for system design being introduced into the box. Lower: example of an altered check belonging to the validation database.

V. INK HYPERSPECTRAL CURVE PARAMETRIZATION

We represent each hyperspectral curve by several features in order to enter it into the ink recognizer. Two kinds of hyperspectral curve parameters have been developed: the first based on area and the second based on curve slope [8]. Prior to working out the parameters, the hyperspectral curve is divided uniformly into sections of $\Delta$ nm from 400 nm to 1100 nm, and features based on area and slope are obtained from each section. The area of section $i$ is numerically calculated with the trapezoidal rule using:

$$a_i = \frac{\Delta}{2}\left[ C(400 + \Delta \cdot i) + C(400 + \Delta \cdot (i+1)) \right] \tag{1}$$

$C(\lambda)$ being the value of the hyperspectral curve at $\lambda$ nm. For the slope parameters, the derivative of each section is approximated as:

$$s_i = \frac{C(400 + \Delta \cdot (i+1)) - C(400 + \Delta \cdot i)}{\Delta} \tag{2}$$

The area and slope based features of the hyperspectral curve are obtained by concatenating the parameters of all the sections as follows:

$$\mathbf{a} = \{ a_i \mid i \ge 0,\ 400 + \Delta \cdot (i+1) \le 1100 \} \tag{3}$$

$$\mathbf{s} = \{ s_i \mid i \ge 0,\ 400 + \Delta \cdot (i+1) \le 1100 \} \tag{4}$$

The ink feature vector is obtained by concatenating both characteristics, $\mathbf{x} = [\mathbf{a}, \mathbf{s}]$.

VI. CLASSIFIER

The model we use to discriminate one ink from another is built using a Least Squares Support Vector Machine (LS-SVM). Support Vector Machines (SVMs) are frequently used within the context of statistical learning theory and structural risk minimization. Least Squares Support Vector Machines are reformulations of standard SVMs which lead to solutions of the indefinite linear systems generated within them. Robustness, sparseness and weightings can be imposed on LS-SVMs where needed. We apply a Bayesian framework with three levels of inference [9].

The meta-parameters of the LS-SVM model are the width $\sigma$ of the Gaussian kernels and the regularization factor, which are trained with parameter vectors from the modelled ink as positive samples and from other inks as negative samples. The regularization factor is taken as 30 and the Gaussian width $\sigma$ is optimized as follows. The training sequence is randomly partitioned into two equal subsets. The LS-SVM is trained $T = 30$ times with the first subset, with the Gaussian width $\sigma_t$, $1 \le t \le T$, equal to $T$ logarithmically equally spaced values between two powers of 10. Each one of the $T$ LS-SVM models is tested with the second subset so as to obtain $T$ Equal Error Rate measures $EER_t$, $1 \le t \le T$. The Gaussian width of the model is obtained as $\sigma = \sigma_j$, where $j = \arg\min_t EER_t$. Finally, the ink model is obtained by training the LS-SVM with the complete training sequence.

This training procedure is employed to work out one LS-SVM model per ink or pen, depending on the experiment, using its own training samples as positive vectors and the training samples of other inks as negative vectors. To verify that a questioned ink vector corresponds to a given ink model, the score of the questioned ink is worked out with the LS-SVM model of the given ink. If the score is greater than the threshold, it is accepted as the same.

VII. EXPERIMENTS

Several sets of experiments have been performed. The first were aimed at determining the ability of the device to distinguish between inks and between pens.
The latter were designed to validate the scheme with the bank check database.

Ink Classification – no time lapse: The first experimental session was aimed at working out the ink verification rate, i.e., the ability to distinguish among viscous, gel, liquid and marker ink. With the 1250 samples of the design database collected just after writing (with the ink fresh), the classifier was trained with 30% of the samples and tested with the remaining 70%. Table I shows the results. It can be seen that the viscous and liquid inks are the least stable while the gel ink is the most stable. The marker is very different from the other pens, so it is not difficult to discriminate.

Ink Classification – one week time lapse: With the trained ink models, we tested the samples acquired in the second session, i.e., the dried ink. The results can also be seen in Table I. As expected, the performance is reduced except in the case of the viscous ink. This is because the viscous ink dries very quickly and there is no real difference between the first and second scanning.

TABLE I. HIT RATIO FOR INK IDENTIFICATION NO TIME LAPSE

Condition   Ink       Hit ratio (%)
Fresh       Viscous   85
Fresh       Gel       95
Fresh       Liquid    75
Fresh       Marker    95
Dried       Viscous   85
Dried       Gel       90
Dried       Liquid    70
Dried       Marker    85

TABLE II. HIT RATIO FOR INK IDENTIFICATION AFTER ONE WEEK

Condition   Ink       Hit ratio (%)
Fresh       Viscous   63
Fresh       Gel       75
Fresh       Liquid    65
Fresh       Marker    73
Dried       Viscous   63
Dried       Gel       65
Dried       Liquid    53
Dried       Marker    66

Pen Classification – Same ink: The third experimental set investigates the ability of the scheme to discriminate between pens using the same ink. For the gel ink, we have 7 classes (the 7 pens) and 50×7=350 freshly inked samples. Again we trained with 30% of the samples and tested with the remaining 70%. The results can be seen in Table II, which shows how difficult it is for the scheme to distinguish among different pens. Again, the gel achieves the best performance.
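The 30%/70% protocol used in these experiments can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say whether the split is made per class, so the per-class (stratified) split, the fixed random seed and the helper names `split_30_70` and `hit_ratio` are all assumptions introduced for this example, and the classifier itself is left out.

```python
import random
from collections import defaultdict


def split_30_70(labels, train_fraction=0.3, seed=0):
    """Split sample indices per class into train and test index lists.

    Assumption: the 30% training share is taken separately from each class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


def hit_ratio(true_labels, predicted_labels):
    """Return the percentage of correctly classified samples per class."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for true, predicted in zip(true_labels, predicted_labels):
        totals[true] += 1
        if true == predicted:
            hits[true] += 1
    return {label: 100.0 * hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Gel-ink pen experiment from the text: 7 pens, 50 fresh samples each.
    labels = [pen for pen in range(7) for _ in range(50)]
    train_idx, test_idx = split_30_70(labels)
    print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

With 50 samples per pen, this split leaves 15 training and 35 test samples for each of the 7 gel pens; the per-class percentages returned by `hit_ratio` correspond to the kind of figures reported in Tables I and II.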
Pen Classification – Same ink after one week: In the fourth experimental set, with the trained models of experiment 3, we work out the hit ratio for distinguishing between the pens after the ink had dried. The results are also given in Table II.

Forgery Detection: The last set of experiments validates the scheme with the bank check database. The 20 altered checks are presented to the system. The written amounts are scanned. The task is to determine whether all the numbers were written with the same pen. This is done by training the classifier with samples of the first digit and testing with samples of the remaining digits. In all cases, the checks were scanned a week later, with the ink dried. The results are given in Table III. It can be seen that when the ink is different, all forgeries were detected. When the same ink is used in a different pen, 80% of the alterations are found. No false alarms were raised by our scheme on our database.

TABLE III. HIT RATIO OF THE VALIDATION TEST WITH THE CHECKS

Checks forged with         Forgery detection rate
Different ink              100 %
Same ink, different pen    80 %
Not forged                 0 %

VIII. CONCLUSIONS

This paper proposes a methodology to detect forgeries in handwritten documents. The proposal includes the device design, which is meant to decrease the cost of the scheme, since the commercially available systems are generally expensive. The proposed scheme is based on the physics of hyperspectral ink reflection, parameterizing the hyperspectral curve of the ink pixels. The hyperspectral ink curve is modeled with an LS-SVM classifier. The validation experiments were performed with a database of altered bank checks to detect forgeries. The results are encouraging: all alterations made with a different ink and 80% of those made with a different pen of the same ink type were detected, with no false alarms.

ACKNOWLEDGMENT

This study was funded by the Spanish Government's MCINN TEC2012-38630-C04-02 research project.

REFERENCES

[1] Comisaría General de Policía Científica, Departamento de Documentoscopía, Spain. [Online]. Available: http://www.policia.es/org_central/cientifica/servicios/tp_docum_copia.html. Accessed Sep. 22, 2014.
[2] Tony Roig, "Documentoscopía: Discriminación de tintas", Blog El Investigador 2.0, Spain, Sep. 2009. [Online]. Available: http://policiasenlared.blogspot.com.es/2009/09/documentoscopiadiscriminacion-de.html. Accessed Sep. 22, 2014.
[3] Headwall Photonics, Forensics applications, Fitchburg, Massachusetts, USA. [Online]. Available: http://www.headwallphotonics.com/applications#forensics. Accessed Sep. 22, 2014.
[4] ForensicXP: The next generation in questioned documents examination, Global Marketing & Research Inc, New York, USA. [Online]. Available: http://arxmar.com/index-1.html. Accessed Sep. 22, 2014.
[5] ChemImage Corporation website, Pittsburgh, Pennsylvania, USA. [Online]. Available: http://www.chemimage.com/. Accessed Sep. 22, 2014.
[6] ChemImage Corporation, The HSI Examiner 100 QD, Pittsburgh, Pennsylvania, USA. [Online]. Available: http://www.chemimage.com/products/instrumentation/examiner/100.aspx. Accessed Sep. 22, 2014.
[7] K. Franke, O. Bünnemeyer, and T. Sy, "Writer identification using ink texture analysis", in Proc. 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 268–273, Canada, 2002.
[8] Miguel A. Ferrer, Aythami Morales and Alba Díaz, "An approach to SWIR Hyperspectral Hand Biometrics", Information Sciences, vol. 268, 2014, pp. 3-19.
[9] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, vol. 2 (2), 1998, pp. 121-167.