acmespaceship: (Default)
[personal profile] acmespaceship
Today's New York Times says:

"With little fanfare, Google has made a mammoth database culled from nearly 5.2 million digitized books available to the public for free downloads and online searches, opening a new landscape of possibilities for research and education in the humanities ... It consists of the 500 billion words contained in books published between 1500 and 2008 in English, French, Spanish, German, Chinese and Russian."

I have here graphed the occurrence in English books from 1900-2000 of "robot," "martian," "monster" and "spaceman."  I conclude that "spaceman" had a short run but "monster" is forever.  And who was using "robot" in 1900-1910 before Karel Capek?  (Google admits there are OCR errors which can explain some anomalies, which seems ironic given the word we're looking at.)

I also conclude that the word "uke" had a different meaning in English in 1730.

Have at it, folks.

Date: 2010-12-17 10:43 pm (UTC)
From: [identity profile] beamjockey.livejournal.com
Some metadata incorporated into Google Books is poor. ("Are poor?" "Metadata" is probably a plural.) In a lot of catalogues, if the librarians didn't enter a date for some reason, the date defaulted to 1900.

A widely quoted blog entry by curmudgeon Geoffrey Nunberg about this type of problem.

A more concise magazine article by Nunberg.

Progress on the problem.

Beyond this, no doubt there is more to be learned by the diligent googler. But is it research or is it time-wasting?

As for "uke," all the 18th-century uses I skimmed really were OCR errors for such words as "duke" and "take," possibly due to a lower standard of reproducibility in typesetting. Wobbly letters. Differential paper shrinkage. Imperfect application of inks. Take a look yourself.

Date: 2010-12-20 04:44 pm (UTC)
From: [identity profile] acmespaceship.livejournal.com
In the immortal words of Johnny Carson, I did not know that. About the default to 1900. Very interesting. I always thought time travelers would tend to cluster around the turns of centuries just because people are lazy and requesting "1900" takes less thought than, say, "1921."

"Metadata" is plural, but so is "data," and I gave up that fight long, long ago.

See my comment to WHL about infallible Papal steampunk robots. Which I think is now my favorite phrase of the week.

Date: 2010-12-18 12:11 am (UTC)
From: [identity profile] whl.livejournal.com
Bill has pointed out some studies of the errors, but following up on the "robot" anomaly, I found:
A copy of "Microprocessors and microsystems: Volume 10" from 1906 (Some careless time traveler brushing up for their final in history of CS probably dropped it.)

And from Appleton's cyclopedia of American biography, we have:
  • Dr. Robot was created prefect apostolic by Pope Pius IX. on 9 July, 1876.


I claim that if Pius IX created a robot in 1876, he qualifies for steampunk.

(Hmmm, I wonder if that cyclopedia was by Victor Appleton...)

Date: 2010-12-20 04:38 pm (UTC)
From: [identity profile] acmespaceship.livejournal.com
It would be an infallible robot. An infallible Papal steampunk robot! With great clothes, because everything at the Vatican has great clothes. Ooh, I want to see that, although as a Protestant I should be very, very afraid.

Our holographic re-animated Martin Luther probably can't compete. We'll need to activate the Calvinator.
