[personal profile] piranha
MOUNTAIN VIEW, Calif. - December 14, 2004 - As part of its effort to make offline information searchable online, Google Inc. (NASDAQ: GOOG) today announced that it is working with the libraries of Harvard, Stanford, the University of Michigan, and the University of Oxford as well as The New York Public Library to digitally scan books from their collections so that users worldwide can search them in Google.

this is an extension of the google print project, and i am excited about it.

on 2004-12-15 01:20 (UTC)
Posted by [personal profile] eagle
Here's the thing that worries me about this, and maybe I'm just being paranoid and it isn't actually a problem. The press release says nothing about who owns the result of this work, and I have a sneaking suspicion that it's Google rather than the library.

Sure, being able to do text search on books in the library is interesting and useful, but it's not the entirety of the problem. The long-term goal really should be to get all of the information digitized and accessible so that we don't lose it when the books decompose. Obviously that can't be fully public in the case of things that are in copyright, but I also don't want to leave the content owned by a commercial company, even a nice commercial company.

The Harvard project said that they were going to put the text of public domain books on-line as well, so maybe I'm just being too paranoid and the library will end up owning the data -- but does that include the search metadata? Or does that all stay with Google? And if so, what happens if Google changes their business model or for whatever reason doesn't want to provide free access to this any more?

Anyway, if this is just something Google is doing for free to be nice, hey, I don't want to look a gift horse in the mouth. But if this ends up being used as a reason not to get search indexes and digital information owned and controlled by the non-profit university and library systems, settling instead for having that information controlled by a for-profit company, that worries me.

Re: 'nother neato google project

on 2004-12-15 01:46 (UTC)
Posted by [identity profile] pir-anha.livejournal.com
yeah, i definitely think that's worth being concerned about, and checking out. i've not done that; i just saw the press release and thought "rare books online! yay!". and i am busy with something else right now, so i'm just putting up a link to their terms and conditions if anyone else wants to see whether they're explicit about this sort of thing.

https://print.google.com/publisher/terms

(man, i wish LJ would copy the subject header; it pisses me off that they don't. subject headers are a good thing.)

Re: 'nother neato google project

on 2004-12-15 09:10 (UTC)
Posted by [personal profile] eagle
Google is apparently giving the digitized content back to the libraries to do with what they want. Excellent.

This article also raises the other things that I'm concerned with, though, and I'm fairly sure that the search metadata is being kept private to Google, since it's part of their proprietary search technology.

The free software advocate in me always looks at these sorts of things as interesting technological demonstrations, but not actually the real thing yet. It's not real until it can be done by anyone without paying Google money. (But the great thing about a commercial company doing it, provided that the intellectual property land grab in the US doesn't get any worse, is that it means that everyone will be able to do it at least in 20 years or so, which is still within my lifetime. Hopefully by the time that we all have good access to this sort of digitized information, they will have perfected the technology that can print me a real book from electronic data, or at least will have improved the electronics of hand-held readers a lot over what they're like now.)

Re: 'nother neato google project

on 2004-12-15 18:46 (UTC)
Posted by [identity profile] huaman.livejournal.com
I argue this point with people all the time, about e-book type stuff. What usually gets through to them is when I say, this'll really be a useful technology when I'm not at risk of saying "Aw dammit, I dropped the thing in the bathtub while reading and now I'm out hundreds of bucks and can't read anything else!" or "Crap, I left it on the plane." And that's me, and I'm someone who *does* spend tons of time online and all, you know?

Re: 'nother neato google project

on 2004-12-16 04:06 (UTC)
Posted by [identity profile] pir-anha.livejournal.com
*nod*. same here, actually, though for slightly different reasons -- it'd be great if i could read in the bath, but this bathtub is too small (and the boat won't have one at all) and i am not prone to leave books on airplanes.

but i don't like reading books on the currently available palm-size thingies. they're too damn small. the screens are not friendly to my eyes. i also don't like reading books on my laptop because that's too large, *heh*. i like reading in bed, or other comfortable positions, and a paperback (not too thick, please) is just the right size for that. so, until there's an ebook reader like that, i am not shelling out any money for suboptimal tech. oh, and i don't want to store all my reading material on it either, *shudder*. must have some alternative backup so if i need to, i can print.

on 2004-12-15 18:20 (UTC)
Posted by [identity profile] huaman.livejournal.com
I'm not sure what all is public or not public, but, there is a SUL-AIR press release that went to all-sul-staff -- are you a SUL-AIR person if you're in ITSS? Anyway, I have the Mike Keller email about it to all-sul-staff if you're curious and don't have it.

The stuff I personally know about involves things google has been doing a good job of issuing press releases about, which in many cases are actually hosted where I work, which isn't google. Those things are all subject to access control, and the terms are dictated by the publisher, who owns the publishing rights. The whole "information wants to be free" debate is, of course, one that I've always been interested in personally, and one that has taken some really interesting twists now that I've spent 6+ years dealing with not-for-profit academic publishing.

It's WAY more than I can readily distill down to an LJ comment, but, it turns out there are a lot of not-for-profit (which doesn't mean free) and commercial operations which already have indexing metadata as major pieces of business, a lot of rivalries and politics about who gets to do what, and who wants to make what free. It's all quite the morass of confusing stuff, frankly.

Anyway. Bringing stuff back to what I know about specifically, for example, a lot of the stuff indexed in scholar.google.com is actually hosted here -- and we don't own the content, we just put it online. And there are slews of metadata-searchers who link to the sites we host, but permission to access the full content is still up to the owners of the content. A lot of those content-owners, however, choose to have metadata deposited not only in places like NIH-type repositories, but other commercial operations that then link back, and so forth.

All of that to say, a lot of publisher type people, at least in the not-for-profit scholarly world, seem to be less concerned about the metadata than you or I might be, so long as the actual content is controllable. It's interesting.

on 2004-12-15 02:25 (UTC)
Posted by [identity profile] stonebender.livejournal.com
Waaaaaaay cool!

on 2004-12-15 11:53 (UTC)
Posted by [identity profile] janetmk.livejournal.com
Neato indeed.

The burning question (from my PG Distributed Proofreaders perspective) is how much, if any, they will clean up the raw OCR. Old books present special problems to standard OCR programs--caused by such things as yellowing and speckling, non-modern typefaces, book dust, and old typesetting practices.

Has the masterful Google developed improvements on OCR processing that will make browsing and reading the public-domain texts that they will make available a non-painful experience? Will a human actually look at the OCR output to correct major errors?

Do people care as long as they're getting something for nothing?
