Shall I Compare Thee To A Snake, A Gorilla, A Jungle, Bananas, Sex…

Uri Friedman at The Atlantic with a round-up.

Paul Kedrosky:

Over the weekend I tried to buy a new dishwasher. Being the fine net-friendly fellow that I am, I began Google-ing for information. And Google-ing. And Google-ing. As I tweeted frustratedly at the end of the failed exercise, “To a first approximation, the entire web is spam when it comes to appliance reviews”.

This is, of course, merely a personal example of the drive-by damage done by keyword-driven content: material created to be consumed like info-krill by Google’s algorithms. Find some popular keywords that lead to traffic and transactions, wrap some anodyne and regularly-changing content around the keywords so Google doesn’t kick you out of search results, and watch the dollars roll in as Google steers toward you life-support systems connected to wallets, i.e., idiot humans.

Google has become a snake that too readily consumes its own keyword tail. Identify some words that show up in profitable searches — from appliances, to mesothelioma suits, to kayak lessons — churn out content cheaply and regularly, and you’re done. On the web, no-one knows you’re a content-grinder.

Charles Arthur at The Guardian:

The reason why this has happened is obvious: Google is the 900-pound gorilla of search, with around 90% of the market (excluding China and Russia), and there’s an entire industry which has grown up specifically around tickling the gorilla to make it happy and enrich the ticklers. I’ve not come across anyone who describes their job as “Bing results optimisation”, nor who puts that at the top of their business CV. Well, I’m sure there are people inside Microsoft whose job title is exactly that. But not outside it.

There are two lines of thought on what happens next.

1) Google comes back from the Christmas break newly determined to fix those damned scraping sites that don’t originate content, because it says in its own webmaster guidelines that “Google will take action against domains that try to rank more highly by just showing scraped or other auto-generated pages that don’t add any value to users.”

The only value those scrapers add, in fact, is to Google, because they display tons of AdSense ads. (Well, you can make a fair bet that they aren’t Bing’s equivalent.)

Wait – the scrapers that dominate the first search page, the place from which 89% of clicks come (for only 11% of clicks come from the last 990 results out of the first thousand, or at least did in 2006, a number that has probably only shifted down since then) all benefit Google financially, even while it sees market share improvements? That’s not quite the disincentive one might have hoped for that would make Google act.

2) People start not using Google, because its search is damn well broken and becoming more broken for stuff you care about by the day. This could happen. The question is whether it would be visible enough – that is, whether enough people would do it – that it would show up on Google’s radar and be made a priority.

Over at Hacker News, the suggestions in the comments echo the idea that Google’s search really isn’t cutting the mustard any more (“vertical search” is the new watchword). Which means that really, Google does need to implement method (1) above. It might not notice if a few geeks abandon it – but once the idea really gets hold (as it will through the links they offer and comments they drop) that Google’s search is broken, then the rout begins.

I haven’t been able to get a comment from Google on this, though I’m sure it would run something along the lines of “Google makes every effort to make its search results the best and takes seriously the issues raised here.”

Update: Google responded to this article: “Google works hard to preserve the quality of our index and we’re continuing to make improvements to this. Sites that abuse our quality guidelines or prove to be spam are removed from our index as fast as possible”. (For clarification, I didn’t initially contact Google as it was a public holiday when I wrote the original article. Matt Cutts did not respond to Twitter contact as he is on holiday, Google says.)

It would be crazy not to. The question is whether it really can make a difference.

Vivek Wadhwa at TechCrunch:

This semester, my students at the School of Information at UC-Berkeley researched the VC system from the perspective of company founders. We prepared a detailed survey; randomly selected 500 companies from a venture database; and set out to contact the founders. Thanks to Reid Hoffman, we were able to get premium access to LinkedIn—which was very helpful and provided a wealth of information.  But some of the founders didn’t have LinkedIn accounts, and others didn’t respond to our LinkedIn “inmails”. So I instructed my students to use Google searches to research each founder’s work history, by year, and to track him or her down in that way.

But it turns out that you can’t easily do such searches in Google any more. Google has become a jungle: a tropical paradise for spammers and marketers. Almost every search takes you to websites that want you to click on links that make them money, or to sponsored sites that make Google money. There’s no way to do a meaningful chronological search.

We ended up using instead a web-search tool called Blekko. It’s a new technology and is far from perfect; but it is innovative and fills the vacuum of competition with Google (and Bing).

Blekko was founded in 2007 by Rich Skrenta, Tom Annau, Mike Markson, and a bunch of former Google and Yahoo engineers. Previously, Skrenta had built Topix and what has become Netscape’s Open Directory Project. For Blekko, his team has created a new distributed computing platform to crawl the web and create search indices. Blekko is backed by notable angels, including Ron Conway, Marc Andreessen, Jeff Clavier, and Mike Maples. It has received a total of $24 million in venture funding, including $14M from U.S. Venture Partners and CMEA Capital.

In addition to providing regular search capabilities like Google’s, Blekko allows you to define what it calls “slashtags” and filter the information you retrieve according to your own criteria. Slashtags are mostly human-curated sets of websites built around a specific topic, such as health, finance, sports, tech, and colleges. So if you are looking for information about swine flu, you can add “/health” to your query and search only the top 70 or so relevant health sites rather than tens of thousands of spam sites. Blekko crowdsources the editorial judgment for what should and should not be in a slashtag, as Wikipedia does. One Blekko user created a slashtag for 2100 college websites. So anyone can do a targeted search for all the schools offering courses in molecular biology, for example. Most searches are like this: they can be restricted to a few thousand relevant sites. The results become much more relevant and trustworthy when you can filter out all the garbage.

The feature that I’ve found most useful is the ability to order search results. If you are doing searches by date, as my students were, Blekko allows you to add the slashtag “/date” to the end of your query and retrieve information in a chronological fashion. Google does provide an option to search within a date range, but these are the dates when a website was indexed rather than created, which means the results are practically useless. Blekko makes an effort to index the page by the date on which it was actually created (by analyzing other information embedded in its HTML). So if I want to search for articles that mention my name, I can do a regular search; sort the results chronologically; limit them to tech blog sites or to any blog sites for a particular year; and perhaps find any references related to the subject of economics. Try doing any of this in Google or Bing.
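To make the slashtag idea concrete, here is a minimal Python sketch of how a query such as "swine flu /health /date" could be handled: ordinary terms are matched, each topic slashtag restricts hits to a curated list of sites, and "/date" sorts by creation date. This is not Blekko's implementation. The site lists, the `Result` fields, the title-only matching, and the newest-first ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse

# Hypothetical curated site lists; a real slashtag would hold many more sites.
SLASHTAGS = {
    "health": {"cdc.gov", "who.int", "nih.gov"},
    "tech": {"techcrunch.com", "arstechnica.com"},
}


@dataclass
class Result:
    url: str
    title: str
    created: date  # assumed to be the page's creation date, not its crawl date


def parse_query(query):
    """Split a query into plain search terms and slashtag names."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            tags.append(token[1:].lower())
        else:
            terms.append(token.lower())
    return terms, tags


def host_in(url, domains):
    """True if the URL's host is one of the domains or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def search(query, index):
    """Match terms against titles, then apply each slashtag as a filter."""
    terms, tags = parse_query(query)
    hits = [r for r in index if all(t in r.title.lower() for t in terms)]
    for tag in tags:
        if tag == "date":
            continue
        # An unknown slashtag maps to an empty site list, so nothing matches.
        hits = [r for r in hits if host_in(r.url, SLASHTAGS.get(tag, set()))]
    if "date" in tags:
        # Assumption: newest first.
        hits.sort(key=lambda r: r.created, reverse=True)
    return hits


if __name__ == "__main__":
    index = [
        Result("https://www.cdc.gov/h1n1", "Swine flu facts", date(2009, 5, 1)),
        Result("http://spam.example/flu", "Swine flu cheap pills", date(2010, 1, 1)),
        Result("https://who.int/flu", "Swine flu update", date(2010, 6, 1)),
    ]
    for r in search("swine flu /health /date", index):
        print(r.created, r.url)
```

The filtering itself is simple. What the passage credits for the quality gain is the human curation behind each site list, which a sketch like this can only stand in for with a hard-coded set.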

Anil Dash:

Noticing a pattern here?

Paul Kedrosky, Dishwashers, and How Google Eats Its Own Tail:

Google has become a snake that too readily consumes its own keyword tail. Identify some words that show up in profitable searches — from appliances, to mesothelioma suits, to kayak lessons — churn out content cheaply and regularly, and you’re done. On the web, no-one knows you’re a content-grinder.

The result, however, is awful. Pages and pages of Google results that are just, for practical purposes, advertisements in the loose guise of articles, original or re-purposed. It hearkens back to the dark days of 1999, before Google arrived, when search had become largely useless, with results completely overwhelmed by spam and info-clutter.

Alan Patrick, On the increasing uselessness of Google:

The lead up to the Christmas and New Year holidays required researching a number of consumer goods to buy, which of course meant using Google to search for them and ratings and reviews thereof. But this year it really hit home just how badly Google’s systems have been spammed, as typically anything on Page 1 of the search results was some form of SEO spam – most typically a site that doesn’t actually sell you anything, just points to other sites (often doing the same thing) while slipping you some Ads (no doubt sold as “relevant”).

Google is like a monoculture, and thus parasites have a major impact once they have adapted to it – especially if Google has “lost the war”. If search was more heterogeneous, spamsites would find it more costly to scam every site. That is a very interesting argument against the level of Google market dominance.

And finally, Jeff Atwood, Trouble in the House of Google:

Throughout my investigation I had nagging doubts that we were seeing serious cracks in the algorithmic search foundations of the house that Google built. But I was afraid to write an article about it for fear I’d be claimed an incompetent kook. I wasn’t comfortable sharing that opinion widely, because we might be doing something obviously wrong. Which we tend to do frequently and often. Gravity can’t be wrong. We’re just clumsy … right?

I can’t help noticing that we’re not the only site to have serious problems with Google search results in the last few months. In fact, the drum beat of deteriorating Google search quality has been practically deafening of late.

From there, Jeff links to several more examples, including the ones I mentioned above. As Alan alludes to in his post, the threat here is that Google has become a monoculture, a threat I’ve written about many times.

Felix Salmon:

It turns out that the banana we all know and love — the Cavendish — is actually the second type of banana grown in enormous quantities and exported across Europe and North America. The first was the Gros Michel, which was wiped out by Tropical Race One; you might be saddened to hear that “to those who knew the Gros Michel the flavor of the Cavendish was lamentably bland.” Indeed, Chiquita was so sure that Americans would never switch to the Cavendish that they stuck with the Gros Michel for far too long, and lost dominance of the industry to Dole.

In both cases, the fact that the same species of banana is grown and eaten everywhere constitutes a serious tail risk, even if today’s desperate attempts to genetically modify a disease-resistant Cavendish bear fruit:

A new Cavendish banana still didn’t seem like a panacea. The cultivar may dominate the world’s banana export market, but, it turns out, eighty-seven per cent of bananas are eaten locally. In Africa and Asia, villagers grow such heterogeneous mixes in their back yards that no one disease can imperil them. Tropical Race Four, scientists now theorize, has existed in the soil for thousands of years. Banana companies needed only to enter Asia, as they did twenty years ago, and plant uniform fields of Cavendish in order to unleash the blight. A disease-resistant Cavendish would still mean a commercial monoculture, and who’s to say that one day Tropical Race Five won’t show up?

This is exactly what I was talking about a year ago, in my post about Dan Barber, world hunger, and locavorism, when I talked about how monocultures are naturally prone to disastrous outbreaks of disease, and how a much more heterogeneous system of eating a variety of locally-grown foods is much more robust and equally capable of feeding the planet.

[…]

The problems with monoculture aren’t purely agricultural, either. Anil Dash has a post up today about the decline of Google search quality, diagnosing the problem as being that “Google has become a monoculture”; Alan Patrick quotes a commenter at Hacker News as saying that if search were more heterogeneous, spamsites would find it more costly to scam every site.

I’m not completely convinced that seeing large numbers of SEO sites atop search results for consumer goods is entirely a function of the fact that Google is a monoculture. My guess is that in fact what we’re seeing is simply the result of enormous numbers of SEO sites, all using slightly different methods of trying to game the Google algorithm. Even if only a small percentage of those SEO sites succeed, and even if they only succeed briefly, the result is still a first page of Google results dominated by SEO spam — a lose-lose proposition for everybody, but one which wouldn’t be solved by having heterogeneous algorithms: they would all simply have different SEO sites atop their various search-result pages.

But maybe if Google wasn’t a monoculture, there wouldn’t be quite as many SEO sites all trying to hit the jackpot of, however briefly, landing atop the Google search results. In general, monoculture is a bad and brittle thing — and that goes for search as much as it goes for bananas.

Brad DeLong

Paul Krugman:

Brad DeLong takes us to two articles on trouble with Google: basically, scammers and spammers are doing their best to game the search engine, and in the process making it less useful to the rest of us. And people are turning to other search engines that are less affected, precisely because they’re less pervasive and the scammers and spammers haven’t adapted to them.

This makes me think of sex.

If you follow evolutionary theory, you know that one big question is why sexual reproduction evolved — and why it persists, given the substantial costs involved. Why doesn’t nature just engage in cloning?

And the most persuasive answer, as I understand it, is defense against parasites. If each generation of an organism looks exactly like the last, parasites can steadily evolve to bypass the organism’s defenses — which is why yes, we’ll have no bananas once the fungus spreads to cloned plantations around the world. But scrambling the genes each generation makes the parasites’ job harder.

So the trouble with Google is that it’s a huge target, to which human parasites — scammers and spammers — are adapting.

I’m not quite sure what search-engine sex would involve. But Google apparently needs some.

Matthew Elshaw:

And that’s not all, there are a large number of other posts which share the same thoughts on Google’s declining search quality.

While the major problems with Google’s search quality appear to be the rise of content farms and review sites, some posts also mention a number of other grey hat SEO tactics like link buying and doorway domains that are still working for some sites.

With the number of posts on this topic, I don’t think it will be long before a Google representative steps in to clear the air. In the meantime, what do you think about Google’s search results? Have you seen a decline in quality in recent months?


Filed under Technology
