Trouble In the House of Google

Seconded, but Bing was not much better. It seems all the search engines started getting gamed back in November, and they still are.

Right or wrong, my personal observation is that it was not every day, just most days. But some days searching was totally useless.

Thanks to Atwood/Coding Horror/Stack Overflow for pointing this out, and thanks for pointing it out to an audience that actually cares.

They should just divide your PageRank by 2^(number of ads on the page). That ought to kill off all these spammy content aggregators.
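To make the arithmetic concrete, here is a minimal sketch of that penalty, assuming a hypothetical `pageRank` score and a crawler-supplied ad count (the names and the factor of 2 are just this comment’s proposal, not anything Google actually does):

```csharp
using System;

static class AdPenalty
{
    // Hypothetical re-ranking: halve the score once per ad on the page,
    // i.e. rank / 2^adCount. Not Google's actual algorithm.
    public static double PenalizedRank(double pageRank, int adCount)
    {
        return pageRank / Math.Pow(2, adCount);
    }
}
```

Under this scheme a page with ten ads would keep less than a thousandth of its original score, which is roughly the intent.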

I’m just confused; I’m defending Google on this one. Yes, search result quality has been affected in some ways, with some types of searches. But do I see this degradation in my general day-to-day use of Google as a search engine? Not much.

To give context for my next statements: I’m a .NET developer. I spend 8 hours a day or more on my development workstation, plus more time on my personal computers, and this is my observation.

Most of what I have noticed is centered around product searches. If I’m looking for something relatively popular, or very general, I tend to get more of the “scraper”-type sites. This is something I see on all search engines, though.

Just an example… or a few:

Today, looking for a way to use the OnStar system in my vehicle via Bluetooth (way cool, by the way).

The Search: http://www.google.com/search?rlz=1C1GPCK_enUS410US410&sourceid=chrome&ie=UTF-8&q=onstar+bluetooth

Very good results, and I found exactly what I was looking for.

Next, looking for a way to connect to an X session on an android phone.

The Search: http://www.google.com/search?rlz=1C1GPCK_enUS410US410&sourceid=chrome&ie=UTF-8&q=android+display+x+via+ssh+tunnel

Found what I was looking for without trouble.

Some programming things: looking for a reference on the “RenderBeginTag” method for writing server controls in ASP.NET.

The Search: http://www.google.com/search?sourceid=chrome&client=ubuntu&channel=cs&ie=UTF-8&q=RenderBeginTag

Good results, found exactly what I was looking for.
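For anyone wondering what that reference covers: `RenderBeginTag` is the `HtmlTextWriter` method that emits an opening tag (with any queued attributes) when a custom control renders itself. A minimal sketch; the control name and CSS class are invented for illustration:

```csharp
using System.Web.UI;

// A trivial custom server control that renders <div class="note">...</div>.
public class NoteBox : Control
{
    public string Text { get; set; }

    protected override void Render(HtmlTextWriter writer)
    {
        // Attributes queued here attach to the next RenderBeginTag call.
        writer.AddAttribute(HtmlTextWriterAttribute.Class, "note");
        writer.RenderBeginTag(HtmlTextWriterTag.Div); // writes <div class="note">
        writer.Write(Text);
        writer.RenderEndTag();                        // writes </div>
    }
}
```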

Looking for a refresher on how to serialize an object to XML.

The Search: http://www.google.com/search?sourceid=chrome&client=ubuntu&channel=cs&ie=UTF-8&q=serializing+object+in+.net

You guessed it, found exactly what I was looking for.
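For reference, the standard answer that search turns up is `System.Xml.Serialization.XmlSerializer`. A minimal sketch; the `Person` type is just an example:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

static class Program
{
    static void Main()
    {
        var person = new Person { Name = "Ada", Age = 36 };
        var serializer = new XmlSerializer(typeof(Person));

        // Serialize the object graph to an XML string and print it.
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, person);
            Console.WriteLine(writer.ToString());
        }
    }
}
```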

Lastly, looking for a quote from the late and great George Carlin.

The Search: http://www.google.com/search?sourceid=chrome&client=ubuntu&channel=cs&ie=UTF-8&q=George+Carlin+The+Public+Sucks

Wow, found it.

From a colonial outpost: excellent article and comments on an issue that has become as clear as daylight (over our veld, at least) to ordinary users like myself. Google is now a problem, not the answer, if you just want decent information.

I’m sure there’s lots of useful info in the comments; too technical for this user, though. One suggestion for Google: since their best results in my field are almost invariably parasitic on Wikipedia in the first instance, why don’t they respond to Jimmy Wales’s current funding appeal with a modest donation? Preferably confidential, and not too large.

I’m a researcher and web designer/PC repairer (not a programmer), but I have been doing web research for the past ten years, and for a book for the past two. Hands down, in the past year specifically, Google search results have become “dumber” and more shallow.

Results are more topical, less detailed, and often don’t include any relevant results for the actual search you are doing. I find an astounding percentage of results going to Yahoo Answers and the like, where visitors post their own opinions (and are primarily uninformed teenagers or people getting paid through services like Mechanical Turk).

I ran a computer repair business for two years and was self-taught, using Google as my primary knowledge base. If I tried to do that today, it would be vastly more difficult.

On the book research, I am at times having difficulty finding more than headline grabs from news stories or spam/scraper sites with gimmick tags.

Google has apparently caved to the “refine results for us” crowd and lost a lot of utility for dedicated data miners such as myself.

Essentially, what is happening is this: imagine you run a publishing company, and you decide to publish your books under a special copyright that allows anyone to republish and redistribute them in any way they want. After a certain amount of time has passed, you start getting very, very angry about your sales dropping, and you blame all those who took advantage of that special license you so generously offered a while ago.

A somewhat similar situation has happened to Google with Android. They created this great operating system and open-sourced it. Then Verizon decided to use Android; but Android being open source, and Verizon being a business, Verizon decided it was in their best interest to completely disable Google search on their phones, and to disable Google Maps as well, because the free Google Maps competes with their own $10/month navigation service. It also happened that Microsoft paid them to replace the default Google search with Bing.

But at least Google did not whine about it like you. They are moving forward with their own deals with Samsung, and making their own phone, where no one can block Google out.

" Horror of horrors Bing was better in the end!"

Bing does no better. In 2010, when I did a search for “WPF how to horizontal scroll mouse”, this scraper site ranked ahead of Stack Overflow on Google:

http://efreedom.com/Question/1-3727439/WPF-ScrollViewer-Enable-Horizontal-Scrolling-Mouse

http://stackoverflow.com/questions/3727439/wpf-scrollviewer-how-to-enable-horizontal-scrolling-with-mouse

It doesn’t anymore (on Google).

With Bing, try searching “WPF how to horizontal scroll mouse” and the scraper site still ranks ahead.

Here’s one idea for how Google can strike back.

Beating the Content Farms: Google Can Automate the Like Button
http://bit.ly/g62ygf

Let’s see. You figure out how Google works, piggyback off of their technology, “giving” people back the answers to their questions, and you are perplexed that someone else is piggybacking off of you and Google?

Wake up and smell the web!

Wow, I never realized the severity of this problem until today, when I googled “access properties from another EAR”. The #1 result was this question from SO:

http://stackoverflow.com/questions/2044895/how-to-access-application-xml-file-of-an-ear-deployed-to-ibm-websphere-6-1

Sure enough, the number 16 result is:
http://efreedom.com/Question/1-2044895/Access-Applicationxml-File-EAR-Deployed-IBM-WebSphere-61

That page is a 100% copy of the Stack Overflow content; efreedom.com hardly even bothers to change the URL!

Whoa, I never expected this day! A link from Krugman (http://krugman.blogs.nytimes.com/2011/01/10/google-needs-sex/) to Brad DeLong (http://delong.typepad.com/sdj/2011/01/trouble-in-the-house-of-google.html) to here!

My $0.02, if anyone is interested (unlikely): Google should only index valid secure websites (https://, aka port 443). I think the economic cost, something like $25/year for an SSL certificate these days(?), would take 99% of the spammers out of the game.

@Julius Davies

Spammy content farms are big business; $25 is a drop in the bucket for them.

I really like the comparison of Google to gravity, because it implies that Google is a critical component of the nature of the internet. The problem is that Google is only a service on the internet, not an integrated part of the internet’s structure.

We expect the internet to have a search feature because everything has a search feature: our operating systems, our apps, everything. We expect the pages on the internet to be indexed and organized because, hey, everything else is too. Unfortunately, the internet doesn’t have any of these features built in. Google is just another website like every other website, and it’s limited by the same things that limit every other web application. Using Google to search the internet is like using a third-party app to search your Outlook emails (if you had billions of emails).

What Google, or any search engine, does is ultimately constrained by the technology it’s built upon. If the internet were created from scratch today, it would no doubt be engineered to support indexing and all the other things we take for granted, so that third parties wouldn’t have to hack their way around billions of individual HTML files.

Hi Jeff,
What is this page for? http://stackoverflow.com/questions-all/145
It seems like a link farm. I landed on those chronological lists twice today, and they were absolutely unrelated to my Google search queries. Since the page number just increments gradually, my keywords were not even on the page.

It would seem that Google’s perfect system already exists in Gmail.

I’ve been using Gmail since 2005, not long after it started. As of today I have over 50,000 archived emails. When I first started with Gmail, I would get a couple of spam emails a month in my inbox and immediately report them as spam using the dedicated button.

In the past 3 years, spam emails in my inbox have completely disappeared. They still accumulate by the hundreds in my spam folder, but I never get them in my inbox.

Thanks to their reporting system, a minority of Gmail users provide cross-referenced data that allows spam to be identified and properly categorized, along with whatever else Google uses. I believe Google published a white paper on their anti-spam technology a couple of years ago showing their techniques.
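The mechanism described here boils down to crowd-sourced voting. A rough sketch of the idea only; the fingerprinting, threshold, and names are invented, not Gmail’s actual system:

```csharp
using System.Collections.Generic;

// Hypothetical crowd-sourced spam filter: once enough distinct users
// report a message fingerprint, it is filtered for everyone.
class SpamVoteFilter
{
    const int Threshold = 50; // assumed cutoff, not a real Gmail number

    readonly Dictionary<string, HashSet<string>> reports =
        new Dictionary<string, HashSet<string>>();

    public void Report(string messageFingerprint, string userId)
    {
        HashSet<string> users;
        if (!reports.TryGetValue(messageFingerprint, out users))
            reports[messageFingerprint] = users = new HashSet<string>();
        users.Add(userId); // a set, so repeat reports by one user don't stack
    }

    public bool IsSpam(string messageFingerprint)
    {
        HashSet<string> users;
        return reports.TryGetValue(messageFingerprint, out users)
            && users.Count >= Threshold;
    }
}
```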

In contrast, I have an old GMX.de free email account whose inbox now contains nothing but hundreds of spam emails a month.

Google is hardly a monopoly. I’ve been using DuckDuckGo for a while now, and it seems to work very well. I’ve also started playing with Blekko. If more people start leaving Google, maybe they’ll make more of an effort to weed out spam sites.

At the rise of search, the public couldn’t believe that a single search box could ever return adequate results from all over the web. SEO has evolved enormously, but when it comes to results that reflect your needs more exactly, SEO is not the holy grail.
For example, business apps use other indexing mechanisms, such as metadata combined with social distance, in addition to ranking. I believe the public’s gut feeling that SEO will not provide the right answer will, at least in part, prove true in the end. I also believe that the quality of results could improve by adding a social component to search.
So is it time to redefine gravity and implement a new model to ensure that we will still find relevant and authentic information in the future? I do think it is time to innovate.

Great article. Thanks.

I personally find myself spending more and more time filtering out results that I know (or suspect) are scraped SO content.

Google should make it possible to flag other sites’ content as a duplicate of that on your own site (of course, they would need to verify that it actually is a duplicate, as in the sketch below), so that sites going to extremes in terms of scraping get demoted in the results.
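Verifying duplication is tractable; one textbook approach is word-shingling plus Jaccard similarity. A minimal sketch, with the 5-word shingle size and 0.9 cutoff being arbitrary illustrative choices:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DuplicateCheck
{
    // Break text into overlapping k-word "shingles".
    static HashSet<string> Shingles(string text, int k = 5)
    {
        var words = text.ToLowerInvariant()
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var set = new HashSet<string>();
        for (int i = 0; i + k <= words.Length; i++)
            set.Add(string.Join(" ", words.Skip(i).Take(k)));
        return set;
    }

    // Jaccard similarity of the two shingle sets; 1.0 means identical text.
    public static double Similarity(string a, string b)
    {
        var sa = Shingles(a);
        var sb = Shingles(b);
        if (sa.Count == 0 && sb.Count == 0) return 1.0;
        int common = sa.Intersect(sb).Count();
        return (double)common / (sa.Count + sb.Count - common);
    }

    // Arbitrary cutoff: treat anything above 0.9 as a scrape.
    public static bool IsDuplicate(string a, string b) => Similarity(a, b) > 0.9;
}
```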

I guess Google lost the human touch a long time ago, and is increasingly becoming a system to beat.

Gravity is not broken, gravity is just gravity.

But yes, there needs to be an explicitly curated catalog of things, because otherwise search results can contain whatever loosely related material happens to rank.

I just tried today to search for information on how to pay a certain well-known company through a certain well-known bank. Page after page of spam, spam, spam, spam, spam, spam, spam, baked beans and spam. I must have tried 10 different searches with various synonyms, phrases and exclusions, and all it did was slightly change the order and keyword relevance of the spam.

The worst part is that the results from all of these sites are identical. It would be nice if Google had at least a modicum of intelligence to say, “Hey, if Mr. User here isn’t interested in result #1, he’s probably not going to be interested in all of these identical copies of it down below.”

Three years ago, it used to be that the content I was looking for often didn’t exist at all or wasn’t indexed, and I was okay with irrelevant results then. I’d gladly settle for that old 50% chance of getting the results I wanted in place of the ocean of obvious, pathetic spam I seem to get 90% of the time now.

Here’s a thought, Google: how about tossing all of these garbage copypasta spam sites into a “mirrors” link under the original result? Surely you can figure out which site is actually the original; you invented PageRank, so checking a few indexing dates should be practically “hello world” level difficulty.
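The grouping step really is simple once duplicates have been identified. A sketch of the idea, assuming a hypothetical `IndexedPage` type that carries the first-crawl date:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of a crawled page and when it was first seen.
class IndexedPage
{
    public string Url;
    public DateTime FirstCrawled;
}

static class MirrorFolding
{
    // Given a cluster of pages already judged identical, treat the
    // earliest-crawled page as the original and the rest as mirrors.
    public static (IndexedPage original, List<IndexedPage> mirrors)
        Fold(IEnumerable<IndexedPage> cluster)
    {
        var ordered = cluster.OrderBy(p => p.FirstCrawled).ToList();
        return (ordered[0], ordered.Skip(1).ToList());
    }
}
```

The search UI would then show only the original, with the mirrors folded behind a single link.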