What do you do if someone shares a PDF with you and the PDF contains scanned images of text? How do you get at that text if you want to copy and paste it, search it, or even edit it? In short, how do you liberate it?
The Guardian newspaper’s unsearchable PDFs…
Last Friday (22nd February 2013), the Guardian newspaper found itself in just such an unenviable position after the BBC released a slew of PDF files (relating to the independent Pollard review of the BBC’s handling of the Jimmy Savile scandal) containing scanned, unsearchable text. Not exactly the most helpful format for journalists looking to make use of the files in a hurry!
Optical Character Recognition & Zamzar to the rescue
You can read more about our assistance in a story by the Guardian’s technology editor, Charles Arthur: “BBC Pollard inquiry: why is it so hard to search the documents?”
OCR (Optical Character Recognition) technology has a reputation for being costly and difficult to use, so we’re pleased to say that we’re working hard to make it available on the main Zamzar site, where it can help liberate many more documents! Do let us know if this would be of interest to you.
The Guardian has just announced their list of the Top 100 websites for 2009 and we’re honoured that Zamzar has been recognised in the “Create/collaborate” section.
Zamzar is one of only two sites in that section that remain from the Guardian’s 2008 list (the other being NetVibes).
We’re excited about what 2010 has in store, but we wouldn’t be anywhere without you.
So a big thanks to all of you who use the service, tell your friends and colleagues about it, and keep us busy with suggestions on how to improve it!