AKICIDW [*]
Dec. 19th, 2009 02:08
with the boat work really entering the last stages now, i'm looking at my shelves full of books and pondering scanning them. now, i have lots of experience scanning manga, as well as scanning prose and OCRing it. but the books i most want to scan myself, because there are no digital copies out there, are craft books.
which are often beautifully laid out, and that presents a problem. of course the easiest is just making a pdf from the images, but that results in enormous files, and means the text can't be searched. i'm big on searching these days, especially when it comes to patterns.
does anyone have experience with scanning such books and recreating something similar to the original before it went to press? any suggestions for software (i prefer mac, but can deal with windows and linux)? workflow?
[* = all knowledge is contained in dreamwidth]
no subject
on 2009-12-19 17:38 (UTC)
no subject
on 2009-12-20 07:40 (UTC)
no subject
on 2009-12-19 22:42 (UTC)
no subject
on 2009-12-20 07:38 (UTC)
no subject
on 2009-12-20 15:33 (UTC)
Upon further reflection, scans will be OK once we routinely have monitors which permit 100% viewing of an 8x10-inch page. I get many beadworking patterns as downloads these days, but I have to print them out so I can see all the info at once.
no subject
on 2009-12-21 20:47 (UTC)
adobe acrobat seems to do a so-so job of sorting text from images, which is making me hope this isn't as much work as i'm thinking right now. except acrobat and i, we're not on good terms. it's the least intuitive software i've ever worked with.
no subject
on 2009-12-20 09:41 (UTC)
e.g.
Sorry, that's not accurate LaTeX source code -- I haven't written LaTeX in a while & can't be bothered looking it up, I'm afraid :) But you get the idea!
no subject
on 2009-12-20 11:23 (UTC)
no subject
on 2009-12-21 16:51 (UTC)
no subject
on 2009-12-21 20:43 (UTC)
when one scans, one creates a raster image of text and/or graphics. the text is not machine-readable unless it has been through a process called OCR (optical character recognition). OCR is pretty good these days, if you make a good, clean scan. the problem is that craft and other how-to books contain lots of images lined up with text that refers to them. which means automatic processing (shove files in one side, get searchable text out the other) goes kind of out of the window.