And then what do we do? We retype the information into our own systems.
As long as we are doing this for our own personal use and not planning to sell the data, it is legal to do that. But it is incredibly painstaking work. Multiply that by every member of a family who copies and retypes the same information, and we're looking at a lot of duplicated effort. There has got to be a better way, and there is!
Get The Picture
Here is a little trick that often works, if you're patient. If you skip a step, it won't work.
Use your digital camera to capture the pages you want to turn into editable text. Turn off the flash, no matter how convinced you are that you need it.
Lay the original on a flat surface in a well-lit area. It does not need to be bright light; just avoid shadows. Any light that is comfortable for reading the original text is good enough for this method.
Photograph each page separately, using a small to medium image size (in terms of file size). For this project, I use the smallest image size my Canon will produce.
I set up shop and photograph everything I can. For a newspaper, I begin by photographing the masthead. For books, I photograph the inside cover page and the page bearing the publisher's information. That saves me from having to figure out later where each image came from.
The Two-Software Solution
I use Adobe products and have never tested this process with anything else. But make a copy of your photo and give it a try with any product you prefer.
I begin with Adobe Photoshop CS3. Open all of your photo images at one time. Under the Edit menu, you will find the Adobe PDF Presets menu. I choose the default High Quality Print option and click Done.
If you have used Photoshop at all, you are familiar with Actions. Create a new Action that simply applies the High Quality PDF preset, and replay that Action for each digitally photographed page. Then print each photograph to a PDF file.
Open each PDF file with Acrobat. From the Document menu in Acrobat, choose Optimize Scanned PDF. Even though we didn't scan these images, I found that text recognition failed unless I ran Optimize Scanned PDF first.
Next, go back to the Document menu, choose OCR Text Recognition, and accept the default settings. Once text recognition is complete, test your document by using the Find command to locate some text you know is on the page. If the first Find attempt fails, try a couple of other words before concluding that the OCR process failed.
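If you save the recognized text out of Acrobat as a plain-text file (where that option lives varies by Acrobat version, so treat this as an assumption), a few lines of Python can run the same spot check as the Find command against several phrases at once. This is a minimal sketch and not part of the Adobe workflow described above; the function names `normalize` and `missing_phrases` and the sample text are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so line breaks don't hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_phrases(ocr_text: str, phrases: list[str]) -> list[str]:
    """Return the test phrases that could NOT be found in the OCR output.

    An empty list means every phrase was found, which suggests the OCR
    step worked; if all of them are missing, it probably failed.
    """
    haystack = normalize(ocr_text)
    return [p for p in phrases if normalize(p) not in haystack]


if __name__ == "__main__":
    # Hypothetical OCR output from a photographed obituary page.
    sample = "Funeral services for Mary\nAnn Smith were held Tuesday."
    print(missing_phrases(sample, ["Mary Ann Smith", "Tuesday", "Oakwood Cemetery"]))
    # prints ['Oakwood Cemetery']
```

Like trying a couple of extra words with Find, checking several phrases keeps one unlucky misread from making you think the whole page failed.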
Really poor-quality text sometimes will not cooperate with this process, but I have had amazing success.
What Do You Do With A PDF?
We tend to think of a PDF as something we want to print. You can do a world of fascinating things with PDF files, but let's talk about one simple thing we can do.
We can select text. That page we photographed with our digital camera is now editable text.
But Acrobat is not intended for extensive text editing, and the file, as is, holds a lot more data than we need. Highlight the relevant text and use the Copy command, just as you would in your favorite word processor.
Now open another file and execute the Paste command. The text you copied from the PDF is now in that second file, and you didn't have to type it!
What Software Will Accept Text from a PDF?
Just about any. If you can normally copy and paste while using a software application, you can paste what you have copied from the PDF.
Let's say you find a cemetery inscription that lists eight of your family members in a family plot, along with their birth and death dates. Instead of typing them all, copy them from the PDF and paste them into your notes, FamilySearch, RootsMagic, or whatever kind of file you use to store this information.
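If the pasted lines follow a consistent pattern, a short script can turn them into CSV text that you could open in a spreadsheet before entering the data elsewhere. This sketch assumes each line is a name followed by birth and death years, such as "John A. Smith 1842-1910". That layout, the name `inscriptions_to_csv`, and the sample names are hypothetical, since real inscriptions vary a great deal.

```python
import csv
import io
import re

# Assumed inscription layout: a name followed by birth and death years,
# separated by a hyphen or en dash, e.g. "John A. Smith 1842-1910".
LINE_PATTERN = re.compile(
    r"^(?P<name>.+?)\s+(?P<born>\d{4})\s*[-\u2013]\s*(?P<died>\d{4})\s*$"
)


def inscriptions_to_csv(text: str) -> str:
    """Convert pasted inscription lines into CSV text with a header row.

    Lines that don't match the assumed pattern are skipped, so check the
    original page for anything that was left out.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "born", "died"])
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            writer.writerow([match["name"], match["born"], match["died"]])
    return out.getvalue()


if __name__ == "__main__":
    pasted = """John A. Smith 1842-1910
Mary Smith 1845 - 1921
In loving memory"""
    print(inscriptions_to_csv(pasted), end="")
```

Run on the sample, this prints a header row plus one row each for John A. Smith and Mary Smith; the "In loving memory" line is skipped because it has no dates.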
Go back to the cover page information you photographed and create your source citation. You can even copy the publisher's name and other source data.
Once you are satisfied with the PDF, delete the photographs. The PDF shows the same image as the original photo; it just adds the ability to select and copy the text.
What's the Catch?
OCR is not perfect, so you need to proofread the data carefully. But I personally find proofreading the data and correcting an oddly interpreted character here and there far more pleasant than retyping all of it!
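Dates are where a misread character hurts most, because a year like "1902" can come out as "l9O2" with look-alike letters in place of digits. A quick script can flag such tokens for you to check against the page. This is a minimal sketch; the look-alike table and the name `suspicious_years` are assumptions, so extend the table with the mistakes you actually see in your own OCR output.

```python
import re

# Assumed look-alike confusions between letters and digits.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


def suspicious_years(text: str) -> list[tuple[str, str]]:
    """Find 4-character tokens that look like years misread by OCR.

    A token is flagged when it mixes real digits with look-alike letters
    and becomes all digits once those letters are swapped. Each hit is
    returned with a suggested reading to confirm against the original page.
    """
    hits = []
    for token in re.findall(r"\b[0-9A-Za-z]{4}\b", text):
        guess = token.translate(LOOKALIKES)
        has_digit = any(c.isdigit() for c in token)
        if has_digit and guess != token and guess.isdigit():
            hits.append((token, guess))
    return hits


if __name__ == "__main__":
    print(suspicious_years("Born l9O2 in Ohio, died 1978, married I9S1."))
    # prints [('l9O2', '1902'), ('I9S1', '1951')]
```

Requiring at least one real digit keeps ordinary words such as "Born" from being flagged; the suggestions are only guesses, so the original page still has the final say.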
To the technologically challenged, this may sound like a lot of work. For those of us who are computer-savvy, it is a piece of cake. I can OCR photographs of 100 pages, about one memory card's worth, in about an hour and a half. I couldn't begin to type even the scattered lines of text I need from all those pages in the same amount of time.
The alternative is piles of notebooks filled with 8½ × 11 sheets of paper holding ten words here and seven lines there of information you want to keep. Eventually, the paper yellows and the ink fades. Your grandchild spills soda on a page, and it tears while you try to clean up the mess.
But here is the best argument for doing this: it doesn't cost anything beyond the computer equipment and software you already have. Forget the quarters for the copy machine. Plus, you don't have to retype data.