I’ve been using a new gadget for only a few days. But already, I’m convinced it will vastly increase my productivity and organization, and I wanted to let you all know about it. I bought mine retail from Amazon ($419 before tax; free shipping if you’re not in a hurry), and I’m on no one’s payroll but my own. So this is one man’s unpurchased opinion.
And while this post is not strictly photographic, it applies to photographers and to anyone else who has to manage mountains of inky information adsorbed onto thin layers of pounded cellulose. Paper, that is. File this post, therefore, under “…other stuff that moves me”. For I’ve been very moved indeed by my new Fujitsu ScanSnap 1500M.
Like many of you, I’ve done all I can to go paperless. I pay every bill I can online or by automatic bank draft. I have every statement, invoice, or receipt sent by email, or I download it from a website—when that capability is offered. I save emailed order confirmations of things bought online as keyworded PDF’s; the keyword sequence “tax”, “2010”, “msphoto”, for instance, marks the item as a 2010 photography-related-purchase receipt with tax implications, an easy confluence of tags to Spotlight-search come April 15. In short, I miss no opportunity to eschew paper in favor of 1’s and 0’s on a hard drive. But there’s still a torrent of stuff coming at me that has to be looked at, acted upon, and sometimes stored. All those leaves represent information locked up; space, both mental and physical, consumed.
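For the script-inclined, the keyword scheme above amounts to a set-membership test: a file matches when it carries every keyword in the query. Here is a minimal Python sketch of that idea. It assumes you already have a mapping of file names to keywords; the file names and the `find_tagged` helper are hypothetical, invented for illustration, and this is not how Spotlight itself works under the hood.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical index: file name -> keywords attached to that PDF.
RECEIPTS: Dict[str, Set[str]] = {
    "lens-order.pdf": {"tax", "2010", "msphoto"},
    "electric-bill.pdf": {"2010", "utilities"},
    "tripod-order.pdf": {"tax", "2009", "msphoto"},
}


def find_tagged(index: Dict[str, Set[str]], wanted: Iterable[str]) -> List[str]:
    """Return the sorted file names whose keywords include every wanted keyword."""
    wanted_set = {w.lower() for w in wanted}
    return sorted(
        name
        for name, tags in index.items()
        if wanted_set <= {t.lower() for t in tags}  # subset test: all keywords present
    )


if __name__ == "__main__":
    print(find_tagged(RECEIPTS, ["tax", "2010", "msphoto"]))
```

The subset test is what makes the combination of tags useful: adding a keyword to the query can only narrow the results.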
Over the years I’ve tried many of the paper-to-digital solutions that have spilled out of the modern technological cornucopia. I have found all of them wanting, for one reason or many. Anyone remember WinFax? Or Paperport? These, and others, promised to convert paper-bound information into digital data that could be stored, searched, sorted, and manipulated. In each case, the process depended on interaction between a scanner or fax machine and optical character recognition (OCR) software. The scanner or fax machine forms a pixel image of the scanned sheet of paper; the OCR engine reads those pixels and translates them into ASCII characters that can be stored and searched as text. Problem is, with every previous solution I’ve been stymied by either the scanner or the OCR software, if not both.
First, consider the scanner. I’ve owned a bunch of them of various types over the years; I’ve had probably 3 or 4 flatbed scanners, most recently the Epson V750, as well as three film scanners. So I have more than a passing familiarity with scanners and their quirks. For document scanning, a flatbed is usually the tool of choice. Problem is, it’s laborious to use most flatbeds to scan masses of paper documents. Unless there’s an automated document feeder of some kind attached to it, a flatbed is slow and cumbersome to use. You lift the cover, put the document face down, hit “scan”, correct the crop and adjust contrast and tone (at least on the first scan), hit “scan” again to make the final scan, then remove that sheet and start all over again with the next page. Maybe not awful if you have only a page or two to scan; but if you’re trying to input, say, a thirty-page home-equity-loan contract, you’ll need to clear your calendar. Even the scanners with document feeders have tended to be slow in operation, if slightly less cumbersome to keep fed with paper.
Don’t even get me started on OCR software. It’s been a few years since I regularly used it, or looked into it, so I’ll allow that things may have improved without my knowledge. But the OCR programs I’ve used in the past have been, overall, lousy at doing the job, especially if the original document was stained, wrinkled, or otherwise not pristine. And even if the OCR worked well, it often didn’t really cooperate with the scanner to allow a seamless workflow from start to finish. Too many individual, fussy steps to complete.
Enter the ScanSnap. The beauty of this device is that both hardware and software work well, and the two work well together. The scanner is small, having when closed a desktop footprint only slightly larger than half a letter-sized paper sheet. It is vertically oriented, with no lid to mess with as on a flatbed. You place the sheets you want to scan into its hopper and press the glowing blue “scan” button. The pages are drawn in one by one, from the top of the scanner out the bottom front (it can detect page overlaps or accidental simultaneous feeds, and let you correct without having to start all over) and scanned front and back simultaneously, at up to 600ppi, in color. And it’s damn fast; maybe 20 sheets per minute, if not faster. This thing just eats paper.
When the hopper’s stack of paper has been ingested and expectorated, a dialog box pops up on your computer in ScanSnap Manager to ask you what to do next. You tell it whether or not there are more sheets to go with the current batch. If so, you load them and resume scanning, and subsequent sheets are appended to the batch you were working on. If you are done, you are asked to name the multi-page PDF that will be made from all of these sheets of paper, and to give it a destination. If you’d selected a default location in the program’s preferences, you can override it now; I set up a “ScanSnap Inbox” on my desktop specifically to receive the scanner’s output. The PDF is then created and stored. You’ve just made a pixel image of your document’s pages, just as if you’d snapped them with your iPhone camera.
At this point, the real magic happens. You can have ScanSnap’s integral OCR do its thing right then and there to each newly-created PDF as it’s born, if you don’t mind a slight wait. I mean, slight; for all but the most humongous batches of paper, or poorest-quality originals, the OCR will likely be done before you get the staples out of your next document-to-be-scanned and jog the page edges into order. Or, you can simply save the created PDF and OCR it later. I’m using the 1500M, which is a Mac-optimized version of the machine; there is a Windows version also, and there are probably operational differences between the two. I’ll leave that for you to sort out, if you are interested, at the ScanSnap website I linked above.
At least on my Mac version, if you’ve chosen in the preferences to run OCR as each PDF is created, the ScanSnap’s own OCR software kicks in and does the job. But if you want to batch-OCR one or more PDF’s after-the-fact, you do it with the full copy of Adobe Acrobat that’s included with the machine. (That’s huge; bought separately, Acrobat Standard’s about a hundred bucks.) I already had Adobe Acrobat 9 Pro installed as part of Adobe CS5, so I didn’t need ScanSnap’s copy. Either way, Acrobat has its own OCR engine; you select one or more PDF’s to be OCR’d, and Acrobat zips through them. Presto: now you have PDF’s containing searchable text. I’m not sure whether the ScanSnap software itself can do this kind of batch OCR, but it really doesn’t matter, since it’s just as easy to use Acrobat.
As for OCR accuracy—I’m amazed. I’ve scanned brand spanking new crisp documents; old, wrinkled, tinted, and/or faded pages; tiny, faded, 15-year-old thermal-printed cash-register receipts. You name it; the scanner’s handled them all well, and done a great job with the OCR. In theory, you could simply scan all the documents into a giant document dump on your hard drive, allowing the scanner to give its default date+time-based name to each document, and count on the OCR to make the document’s contents searchable by Spotlight. Being much too anal for that, I’ve given my files descriptive names and keyworded a few (either in Acrobat or in the Mac Finder under “File Info”) to facilitate search.
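If you would rather not type descriptive names by hand, the idea is easy to sketch in a few lines of Python. This is only an illustration under stated assumptions: the date-plus-slug pattern and the `descriptive_name` helper are hypothetical choices made here, not anything the ScanSnap software produces or requires.

```python
import re
from datetime import date


def descriptive_name(scan_date: date, description: str, ext: str = "pdf") -> str:
    """Build a sortable, search-friendly file name from a date and a short description.

    The naming pattern (ISO date, underscore, hyphenated slug) is an assumption
    chosen for this sketch.
    """
    # Lowercase the description and collapse every run of non-alphanumerics to one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{scan_date.isoformat()}_{slug}.{ext}"


if __name__ == "__main__":
    print(descriptive_name(date(2010, 11, 3), "Home Equity Loan Contract"))
```

Leading with an ISO date keeps the files in chronological order when sorted by name, while the slug keeps them findable even without the OCR text.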
In a mere couple of hours I cleared a six-inch stack of paper from the overflowing maw that is my desktop inbox. That includes time for my agile scamper up the short, gently sloped learning curve for operating this device. It is brain-dead simple to use. I’ll henceforth be scanning every important piece of paper that crosses my transom as it comes in, and dealing with it right then, before the unstoppable forces of Procrastination sit down in my path, lock arms, and begin to chant. This device erects no obstacles to doing precisely that, making it much more likely that I’ll follow through on my oft-redrawn grand plans to “get organized.” And once winter’s drear confines me to the house, there’s that big legal-sized 4-drawer file cabinet in the basement with decades’ worth of paper documents in it… hmmm. [If I actually get around to doing that, I’ll let you know how it went.]
Though $419 ain’t chump change, I think the ScanSnap is worth it for the clutter-abatement and organization it will facilitate. Therefore, I highly recommend it.