Matt Raymond at Library of Congress Blog:
Have you ever sent out a “tweet” on the popular Twitter social media service? Congratulations: Your 140 characters or less will now be housed in the Library of Congress.
That’s right. Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress. That’s a LOT of tweets, by the way: Twitter processes more than 50 million tweets every day, with the total numbering in the billions.
We thought it fitting to give the initial heads-up to the Twitter community itself via our own feed @librarycongress. (By the way, out of sheer coincidence, the announcement comes on the same day our own number of feed-followers has surpassed 50,000. I love serendipity!)
We will also be putting out a press release later with even more details and quotes. Expect to see an emphasis on the scholarly and research implications of the acquisition. I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data. And I’m certain we’ll learn things that none of us now can even possibly conceive.
Jennifer Van Grove at Mashable:
Twitter further explains the news in its own announcement. Biz Stone writes that after a six-month delay, “Tweets will be used for internal library use, for non-commercial research, public display by the library itself, and preservation.”
The news is quite significant and reinforces the importance of the information we share in 140 characters or less. In many ways history can be relived through tweets, and now the Library of Congress can ensure that not a single character is lost in the sea of real-time information.
Chris Morran at The Consumerist:
Remember that Tweet you wrote about Tiger Woods that seemed hilarious at the time? Or that night you shared your thoughts on your cousin Bob’s lack of personal hygiene? Good news — all of the world’s most trivial 140-character-or-less Tweets will soon be housed forever in the Library of Congress.
The Library of Congress now has its bookish little hands on every public Tweet ever Tweeted in the 4-year history of Twitterdom.
Doug Mataconis at Below The Beltway:
Of course, it also includes two-plus years' worth of my standing morning tweet, which usually is something like "Awake. Need coffee," along with late-night tweets that I'm sure would rather be forgotten by the people who sent them. Now, they're preserved for posterity.
When I first heard this announcement, I was more than a little, well, surprised. What possible use could the Library of Congress have for the often inane 140-character statements of 105 million people?
John Dupuis at Science Blogs:
Needless to say, this is a pretty incredible announcement. It’s great that a major public institution can step forward and do the kind of digital preservation job that only that kind of institution would be capable of.
It would be really great if their next step could be a similar archiving project for, say, Blogger or WordPress blogs. Or perhaps other big national libraries around the world could each pick a site and dedicate themselves to preserving their content for future generations.
Heidi Moore at The Big Money:
The problem: Who says my tweets belong to Google or the Library of Congress? They didn’t even buy me dinner to discuss this. And they won’t buy you dinner, either, even though they are annexing the work that you did with absolutely no logical or plausible explanation of why they should own it.
Twitter’s entire appeal—and how it was sold to its users—was this: short, ephemeral 140-character bursts that were largely completely unsearchable. Twitter’s own search function doesn’t go back more than two weeks, and mostly it doesn’t work properly. Thus, while many tweets were substantive links and discussions of major issues from stock trading (through the Stock Twits network) to agriculture, many more were typo-laden, banal observations about what to eat for lunch. (I, like most smart Twitter users, don’t follow those people. But they’re there.)
First, let's not pretend that the Library of Congress cataloging and saving every tweet ever (a capability not even open to private corporations) was a totally foreseeable consequence, outside of the Psychic Friends Network. If you ask anyone who wrote a tweet or anyone who knows what Twitter is, "hey, where do you think your tweets will be in five years?" it's fair to say that "The Library of Congress" wouldn't have been the first or the 15th answer. It's also fair to say that "making money for Google in a vast database created quietly without public knowledge" wouldn't be high on the list either.
Twitter had a duty to let its users know—clearly, not in vague terms—that their ephemeral tweets would become permanent and searchable. That’s basic corporate misrepresentation.
Second, let's think about why Google and the Library of Congress have a right to any tweets. Do you know why they would? I don't. This isn't about privacy. It's about who the content belongs to. And just because something is on the Web, open to all, doesn't mean it belongs to the government OR to Google. A wide variety of news sources are on Google, but that doesn't mean that Google owns the right to catalog and republish them in the future, packaged in Google's own way outside of their original uses.
Phoebe Connelly at The American Prospect:
On Wednesday, the Library of Congress announced it had signed an agreement with the microblogging service Twitter to archive all public tweets sent since the service began in 2006. I spoke with Martha Anderson, the director of the National Digital Information Infrastructure and Preservation Program at the Library of Congress, about the project and how it fits into the library’s digital-archiving efforts. She warned me when we got started that her department had a cumbersome name.
So who came to you with the request, or the idea about Twitter?
Twitter approached us. They were looking around; they are a small business, and this happens quite often: businesses cannot afford to sustain all the content they create over the life of the business. Twitter hadn't reached that point yet, but they were aware of the need to sustain the content in some way.
So they began to look around for a strategy for conserving that content in the long term. They knew we had this program at the library, so they called us and asked if we were interested in the Twitter archive.
We do a collection for every Supreme Court nominee: Web sites and blogs and all sorts of things. Well, one of the things they asked us to collect was tweets about the nomination of Justice Sotomayor. So that was the first indication we had that our selection officials were interested in Twitter.
Correct me if I have this wrong, but in the past, you’ve done your Web archiving on a subject basis, and this is the first time you’re grabbing an entire type of content off the Web?
Exactly. And that's the significance of this. Yesterday [Wednesday, when the agreement was announced], one of my staff came in to tell me that people were saying this was a change from static to streaming. This is the first time [on the Web] we're looking at a whole corpus of material from a source.
And I think personally, this is me, don’t quote me as saying this from the library, as librarians we need to think more about our relationships to content creators, content-generating activities, in a way we used to think about things with publishers — we would get a relationship to a publisher through copyright, or that sort of thing. Now, the information base is different, and we really need to work on those kinds of relationships.
Is there anything analogous in Library of Congress history?
Well, the library is accustomed, with analog materials, to collecting everything from a creator: in our Prints and Photographs Division we have all the output from the Department of the Interior's Historic American Buildings Survey. It's a huge record of American architecture.
A lot of times we will get all the negatives and works of a photographer. So we're used to a mass of things, rather than a selection, in the analog world. This is our first foray into doing this in the digital world.
When do you start?
The agreement has been signed, but we still have a lot of technical details to work out: how we'll technically transfer it, and when. There's a built-in six-month window, so we don't have the live Twitter archive at any given time. There is a window for people if they want to delete their tweets, things like that.
There's a built-in lag?
Yes, so once the transfer is complete, if a researcher comes here, we'll let them know that the archive covers 2006 until six months prior. And there'll be a rolling period of transfers after that.
Can individuals choose to opt their tweets out of it?
You know, I don't know. I think that's a question for Twitter. There are several questions about that which they are still working out. We asked them to deal with the users; the library doesn't want to mediate that.
What about user information? Have you any thoughts about whether you’re going to keep that or strip that out? Obviously, that gives a lot of context for a tweet.
It does. And I think that’s one of the big issues for us to understand in terms of privacy. And there’s a lot of work going on, especially over at [the National Institutes of Health] about how to anonymize data and still make it useful. We’re really big on partnering with people to learn what they’re learning, so I think that’s an area we’ll look into. In serving it, what can we do to make it useful to research but not identify personal information?
Is the plan to keep all tweets, forever?
Nothing is forever! I think this is a real learning opportunity. We’re embarking on this with the idea that what we receive, we will keep for the long term. That’s about the best we can say.
How much will it cost?
Well, it’s a gift; we didn’t pay for it. But it will be the cost of storing what is, right now, around 5 terabytes, and the staff effort of maybe one full-time person over the years.
UPDATE: Christopher Beam at Slate