Every once in a while we get a project that really challenges the norm. Recently we converted a paper directory for a client who needed the entries in an Excel spreadsheet so they could import them into a database management system for a project they were working on.
Sounds easy: just scan the pages and run them through OCR (optical character recognition) software. That is, until we actually analyzed the document and realized three problems. First, the entries were not consistent in the data they included: some had four lines of data and others up to eight. Second, the automated software could not output the records directly to Excel in an easy-to-manage way. Finally, when the directory was cut apart, the pages ended up with uneven sizes, which caused problems in scanning, so the template used for conversion did not work accurately.
The solution was to run the images created by the scanning process (done at a higher-than-normal dpi, since the records had small fonts) through a middleware software product that could output each line as a separate field entry. We then manually coded each line in the resulting list with the data field it belonged to, and ran the list through another routine that reorganized the data in the Excel spreadsheets, grouping each record's lines into properly organized columns.
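The reorganizing step is the part a short script can illustrate. Below is a minimal Python sketch, not the routine we actually used, assuming the middleware produced one text line per row, that each line was manually tagged with a field code, and that every directory entry begins with a NAME line. The field codes and the `group_records` and `to_csv` helpers are hypothetical. It writes CSV, which Excel opens directly, so entries with four lines and entries with eight lines both land in the same fixed set of columns.

```python
import csv
import io

# Hypothetical field codes assigned during the manual coding pass.
# Assumption: every directory entry begins with a NAME line.
COLUMNS = ["NAME", "ADDRESS", "CITY", "PHONE", "FAX", "EMAIL", "CONTACT", "NOTES"]


def group_records(tagged_lines):
    """Group (field_code, text) pairs, one per OCR line, into records.

    A new record starts whenever a NAME line appears, so short and long
    entries are handled the same way.
    """
    records = []
    current = None
    for code, text in tagged_lines:
        if code == "NAME":
            current = {}
            records.append(current)
        if current is None:
            continue  # skip stray lines that appear before the first entry
        # If a field repeats within one entry (e.g. a second address line),
        # join it to the existing value instead of overwriting it.
        if code in current:
            current[code] += " " + text
        else:
            current[code] = text
    return records


def to_csv(records):
    """Write one row per record with a fixed column per field.

    Missing fields are left blank; a field code not listed in COLUMNS
    makes DictWriter raise ValueError, which surfaces coding typos.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

For example, a four-line entry followed by a six-line entry with two ADDRESS lines produces two rows: the second entry's address lines are joined into one cell, and any field an entry lacks (such as EMAIL for the first) is simply left blank.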
The process let us capture over 20,000 records from about 800 scanned pages in an orderly and fairly quick manner. The cost to the client was substantially less than they would have incurred with manual data entry, and the results proved more accurate than manual entry as well. Of course, it was a lot less tedious too.
Was this a run-of-the-mill scanning project? No way. But it certainly shows that with some creativity and a variety of tools, it is possible to capture data from paper records for electronic use, and to do so cost-effectively. For the client, using a service to do the work saved on payroll costs and avoided the need to invest in software for a one-off job.
How have you used OCR for creative data capture?