What do you do if someone shares a PDF with you that contains only scanned images of text? How do you get at that text if you want to copy and paste it, search it, or even edit it? In short, how do you liberate it?
The Guardian Newspaper’s unsearchable PDFs …
Optical Character Recognition & Zamzar to the rescue
Fortunately, Zamzar was able to step in and help: we used specialist OCR (Optical Character Recognition) technology to analyse the 30+ PDF files and produce readable, searchable text.
You can read more about our assistance in a story by the Guardian’s technology editor, Charles Arthur: “BBC Pollard inquiry: why is it so hard to search the documents?”
OCR technology has a reputation for being costly and difficult to use, so we are pleased to say that we are working hard to make it available on the main Zamzar site, so that we can help to liberate more documents! Do let us know if this might be of interest to you.
