Scott on Writing

Musings on technical writing...

Google Blog Search

Google has added yet another beta product to its lineup: Google Blog Search.  This new search service competes directly with other blog search engines, such as IceRocket and Technorati.  I've been using IceRocket to search the “blogosphere” as of late, and they seem to have very few splog sites in their results.  Furthermore, IceRocket sorts the results chronologically (whereas Google sorts by relevancy by default) and has neat little tools like the Blog Trends Tool.  However, I do find that IceRocket's response time can be a bit slow; that is, doing a search or going to the next page of results might take a couple of seconds, whereas with Google it's instantaneous.  Both IceRocket and Google Blog Search provide an RSS (or Atom) feed of the search results.
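Since both services expose their results as a feed, a saved search can be read by a short script as well as by a feed reader.  Here is a minimal sketch, assuming an Atom 1.0 feed; the sample feed, its example.com links, and the parse_entries helper are all made up for illustration and are not either service's actual output or API:

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace (an assumption; an older feed may use a different one).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed standing in for a search-results feed.
# A real feed would be fetched over HTTP from the search service.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample blog search results</title>
  <entry>
    <title>Second matching post</title>
    <link href="http://example.com/second"/>
    <updated>2005-09-15T18:30:00Z</updated>
  </entry>
  <entry>
    <title>First matching post</title>
    <link href="http://example.com/first"/>
    <updated>2005-09-16T09:48:00Z</updated>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Return (title, link, updated) tuples, newest entry first."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        link_el = entry.find("atom:link", ATOM_NS)
        link = link_el.get("href", "") if link_el is not None else ""
        entries.append((title, link, updated))
    # Timestamps in the same UTC format sort correctly as plain strings.
    entries.sort(key=lambda item: item[2], reverse=True)
    return entries


if __name__ == "__main__":
    for title, link, updated in parse_entries(SAMPLE_FEED):
        print(updated, title, link)
```

Sorting by the updated timestamp gives the chronological ordering that IceRocket uses by default, regardless of the order in which the feed lists its entries.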

What's most disappointing with Google Blog Search (and IceRocket, to a lesser degree) is the predominance of splog entries.  If you do a search on anything remotely spammy - lasik, cialis, texas holdem, etc. - the majority of the results are going to be splog sites.  Mark Cuban points the finger at Blogger.com in his post A splog here, a splog there, pretty soon it ads up... and we all lose:

What makes the problem particularly frustrating is that it doesn’t cost anything to setup a blog on what is probably the most common blog host, blogger.com from Google. It’s fast, its easy, it’s free and it can be automated. [Note from Scott: you can make a new blog entry in Blogger.com by simply sending an email message to a specified address...]  So blogs are coming at us left and right. We are killing off thousands a day, but they keep on coming. Like Zombies. It’s straight from Night of the Living Dead. Brain dead splogs. Coming at us by the thousands.

Blogger is by far the worst offender. Google seems to be working hard to adjust their relevancy indexes to exclude splog from having influence on search rankings, but they don’t seem to be doing anything more than removing reported splogs. Kind of like going after the zombies one at a time with a shovel. Can we get some help on this Google?

Keep in mind that Mark is one of the owners/investors in IceRocket...  Speaking of Mark Cuban, he also has a great entry on Google's Blog Search that compares it to IceRocket and lists the major concerns he has with Google's latest offering: Welcome to the show Google BlogSearch.

Hopefully Google will figure out a good compromise, one that eliminates the vast, vast majority of splog sites but doesn't filter out any (or many) legit sites.

There are also a number of features currently missing from Google's Blog Search that will, I'm certain, be added eventually.  Some of these include:

  • No integration with Google Search History
  • No “Blogs” tab atop the Google search results (akin to the Images, News, Groups headings)
  • No way to submit my blog's RSS/Atom feed.  According to Google Blog Search Help, “If your blog publishes a site feed in any format and automatically pings an updating service (such as Weblogs.com), we should be able to find and list it. Also, we will soon be providing a form that you can use to manually add your blog to our index, in case we haven't picked it up automatically. Stay tuned for more information on this.”
  • No means of categorization.  It would be nice to be able to drill into blogs by topic rather than just having to do a keyword search.
  • Lack of meta-statistics on the blogosphere.  A buzz index like the one IceRocket provides, or other metadata that can be gleaned from Google's massive index, would be most appreciated.
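On the feed submission point: the “ping” that the Help text mentions is, as far as I know, an XML-RPC call named weblogUpdates.ping that takes the blog's name and URL.  Most blog engines send it for you, but here is a rough sketch of what one looks like.  The endpoint URL below is my assumption and should be verified before use, and the build_ping_payload and send_ping helpers and the example blog URL are purely illustrative:

```python
import xmlrpc.client

# Assumed endpoint for the Weblogs.com ping service; verify before relying on it.
PING_ENDPOINT = "http://rpc.weblogs.com/RPC2"


def build_ping_payload(blog_name, blog_url):
    """Build the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps(
        (blog_name, blog_url), methodname="weblogUpdates.ping"
    )


def send_ping(blog_name, blog_url, endpoint=PING_ENDPOINT):
    """Send the ping over the network and return the server's response."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


if __name__ == "__main__":
    # Only print the request body here; call send_ping() to actually ping.
    print(build_ping_payload("Example Blog", "http://example.com/blog"))
```

Building the payload separately from sending it makes it easy to inspect exactly what would go over the wire before pinging a live service.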

posted on Friday, September 16, 2005 9:48 AM

Feedback

# re: Google Blog Search 9/16/2005 9:55 AM Scott Mitchell

Regarding the speed differences between IceRocket and Google Blog Search, I made a post on Mark's blog - http://www.blogmaverick.com/entry/1234000247058976/#c454086

Here's the germane part of the message:

"... searching on a term in IceRocket may take two to three seconds (or more!) to bring up a page of results, whereas Google is instantaneous. For example, I just tried searching IceRocket.com on Katrina, and here's what it says in the upper-right hand corner:

The most recent posts 1 - 10 of 522,937 for katrina. (9.01 seconds)

Compare that to Google Blog Search:

Results 1-10 of about 785,486 for 'katrina' (0.37 seconds)"

# re: Google Blog Search 9/17/2005 1:46 PM Randy Charles Morin

You said: "If you do a search on anything remotely spammy - lasik, cialis, texas holdem, etc. - the majority of the results are going to be splog sites."

So, if you search for SPAM you'll find it. Really! Sounds like it's working.

# re: Google Blog Search 9/17/2005 10:38 PM Scott Mitchell

Randy, you have a point, but it kind of sucks that those words are being 'held ransom' by spammers. Say you want to read some real life reviews of Lasik procedures - might make sense to search the blogosphere, no?

Besides, your average user might try this, search for a term that is classically 'spammy,' and be put off by the results to the point of giving up on the service, thereby reducing their exploration into blogs. Not good.

# re: Google Blog Search 9/19/2005 5:38 AM Pradeep T.P

I am interested to know from you scott, do you have any idea why google starts its services with Beta and let that word hang for there for ages. I am curious to know whether they have forgotten to remove the same from their news web site http://news.google.com. I still have a feeling whether their news feeds are correct at all under the shadow of "BETA". The same goes for their BLOG search now!

# re: Google Blog Search 9/19/2005 2:17 PM Scott Mitchell

Pradeep, I can only guess as to why Google does the Beta track. I think it's to get products out the door faster and solicit feedback.

Regarding the News service, I've always heard that it lingers in beta because there's an issue with monetizing it when essentially scraping others' content. So Google is leaving it in Beta since they can't legally make money from that particular service.

At least that's what I've read in the past... but they do seem to have a long Beta cycle (GMail was in beta for over a year, no?), and obviously not all products 'graduate'

# re: Google Blog Search 12/5/2005 11:54 PM Tech Bytes

I guess most spammers that do stuff like create 100s of splogs on blogger.com and other places dont have time to create original content.... One way of identifying such blogs is for genuine bloggers to use tools such as http://www.copyscape.com and report plagiarism to the search engines.... if a large number of people start doing this, it would at least keep all this content scraping in check.....


I am a Microsoft MVP for ASP.NET.
I am an ASPInsider.