| |
|
FEB. 25, ISSUE #1058
|
Web Search |
|
|
Blog Search
|
Add a Blog
Search 17,800+ Blogs
Grab a Blog RSS Feed
Blog Express for RSS Feeds
|
SiteProNews Blog
|
Read daily blog posts in the
SiteProNews Blog by two of the Web's top writers, Kalena Jordan
and Jerry Bader.
|
Tools & Services |
|
Webmaster Tools |
Web Page Analyzer
Meta Tag Generator
Keyword Popularity Tool
Link Popularity Checker
Search Engine Submitter
Internet Tools Directory
Site Resource Directory
|
SEO-News Forums |
|
Join the SEO-News Forums to post comments, articles and tips or learn from SEO experts.
|
ExactSeek Toolbar |
Get the toolbar with spyware scanning, webpage keyword analysis, web search on multiple meta engines,
popup-blocking, Alexa site ranking, word highlighting, auto-upgrade and erase browser cookies.
|
Traffic Exchanges |
|
Get Free Visitors to Your Site with these Outstanding Services:

TrafficZap

TrafficSwarm
|
Site of the Day |
|
Design Float is a Digg style, community driven news
aggregator dedicated to the design industry that collects and organizes design-related content from
across the web.
Does your web site qualify as a SPN Site of the Day? Webmaster resource sites can apply via email:
sotd@sitepronews.com
|
App of the Day |
|
TeamViewer 3.5 (1.3 MB) is a simple solution
for remote control, desktop sharing and file transfer that works behind any firewall and NAT proxy. To connect
to another computer just run TeamViewer on both machines. Provides secure, encrypted data transfer to maximize
security. Freeware for Windows 9x/ ME/ 2K/ XP/ Vista.
If you have a Webmaster App that you would like listed on the SPN site, send us an email with details to:
wapps@sitepronews.com
|
Jayde Newsletters |
|
|
Must Read Ebooks |
|
SPN offers one of the best eBook libraries on the Web. Our current selection includes
Commercial and over 183
Freeware eBooks.
Authors can submit eBooks to SiteProNews via email:
ebooks@sitepronews.com
|
Link to SPN |
|
Link your site to SiteProNews, the newsletter
and resource site for Webmasters.
Or,
Add SPN to your site with just 2 lines
of code. Top content for your site without any of the work.
Visit our SPN Partners page. Some great
sites have opted to support the SiteProNews newsletter.
|
SPN Partners |
|
UmbrellaNews - The Web's most comprehensive news site. Stories
updated every 7 to 10 minutes.
SubmitPlus - Post your site to 110
search engines... Gratis.
Template Monster - The Web's number one website
templates are available for immediate download.
Online Site Builder - Providing high end web design, site management & digital media tools
Web-Source - Your Guide to Professional
Web Site Design & Development.
TheCgiSite.com - A directory of programming resources.
TechNewsletters.com - A directory of IT newsletters with ratings & descriptions.
Alexa Toolbar - An indispensable tool for web
professionals, providing Visitor Data, Site Stats, and Contact details for all the sites you visit!.
NewWebDirectory - A new internet web directory of professionally reviewed
web sites providing both freebie and paid site submission.
FreeWebMonitoring - Monitor your web site's availability 24 hours a day,
7 days a week with email alerts and weekly web site statistics.
DropJack.com - A new social content website similar to Digg. Submit original
content or articles and news items that you believe are newsworthy.
SmartWebGadgets.com - Useful gadgets and widgets for your blog or website.
Enhance any web page with just 2 lines of code.
Top 10 Exposure - Forget PPC. Get Google-Type ads
for $3 - $4 per month and top 10 exposure across 225+ search engines & web directories.
ExcellentGuide.com - Provides a directory of trusted, reliable and credible websites
based on a unique credibility scoring system
|
|
|
How Google Applies Science to Search (Part 1)
By Kalena Jordan (c) 2008
|
Dr. Craig Nevill-Manning is a New Zealander who joined Google in 2000 as a Senior Research Scientist
to develop more precise search techniques. Previously, Craig was an assistant professor at the Computer
Science Department of Rutgers University, where he conducted research in data compression, information
retrieval and computational biology. Before that, he was a post-doctoral fellow in the Biochemistry
Department of Stanford University, where he developed a software suite used by pharmaceutical research
laboratories to identify the role of particular proteins within cells.
A scientist at heart, Craig is probably best known as the developer of
Froogle (recently re-named Google Product Search) and
the founder of Google's software engineering center in New York City.
|
This article is a summary of his presentation at Webstock 2008.
Google's Spelling Bee
Craig started his presentation by talking about one of his first challenges in his job at Google:
the spelling correction tool. As the popularity of the search engine grew, Google needed to be able
to spell-correct lots of obscure words. So his solution was to take a sampling of content from the
entire web. Craig's team came up with a algorithmic model and ran it over the web. He discovered that
there were several correct answers to the same question. For example, words like "kofee" could mean
either the searcher is seeking a cup of java or information about Kofee/Kofi Anan.
To combat this, Craig came up with an interesting solution: the "Did you mean?" alternative spelling
option, based on predictive examples of searcher spelling patterns. You can see this in action if you
type in "kofee anan" in Google. Above the search results is a line that reads: "Did you mean: kofi
annan" and links to the search results for this spelling variation too.
But the research went even further. Craig's team worked out how to take into account the context of
the search query by studying the two or three other keywords surrounding the query, for example
"kofee cup" or "kofee anan". The research used the science of bigrams
and trigrams to better understand how people search.
Bigrams are groups of two written letters, two syllables, or two words, very commonly used as the
basis for simple statistical analysis of text. So Craig and his team applied this knowledge to
Google's spelling correction system and now, Google's algorithm can determine the searcher's intent
with much more accuracy, based on the context of the search query.
As an example of the spelling challenges that Google faces, Craig showed the audience the
huge number of ways "Britney Spears" is misspelled
on the web. He said it's encouraging to see that the most popular spelling is also the most correct
one. Scale is important!
|
Google Maps Lead to Apps
The Google team wrote the code for Google Maps many
years ago but the code was actually built into your browser. When Google maps first launched,
people took the dense data-script and worked out how to reverse engineer it for their own use.
Google engineers decided to release an API key to make these mash-ups easier after seeing so
many people reverse engineer Google Maps without Google's help. Now people can mash-up Google
maps within minutes to create their own applications.
To show how easy this is to do, Craig took the audience through the steps to create an
interactive application with Google Maps. In the space of about two minutes, he signed up
for an API key, grabbed the HTML code and pasted it into his page. He then hacked the map
to show Wellington Town Hall (our location) and made the point how easy it is to create
really useful tools out of technology that is already available.
As an example, Craig showed the audience Seattle Bus Monster.
This site uses an API key for Google Maps to make Seattle bus data and tracking available 24/7. Anyone
who needs to catch a bus can look online and instantly find their nearest bus location and run to the
bus stop in time to catch it. It's these types of interactive applications that add value to both
corporate and government sites.
Craig referenced Rodney Brooks from MIT whose provocative paper "Fast, Cheap and Out of Control"
offered new logic and a completely different view of machines. The idea is that there is no center
of control among robots so you should make lots of them; don't treat them so precious. Craig said
developers should use this logic to create lots of small apps that you can replicate and tweak,
rather than one big expensive app that can go horribly wrong. Scale trumps smarts every time!
|
Experiments in Scale That Have Impacted Google's Operations
Precision vs. Recall
Back in the early 90's, information retrieval on the web was limited to things like Lexus/Nexus. So
at that stage, Google would take queries and apply it to the broadest possible search. This was great
recall at the cost of precision. But Larry and Sergey wanted something better so they decided to use
Boolean search. At the time it was heresy because everything was focused on recall. But the Google
founders knew that things had to be super relevant so they developed an algorithm - the core algorithm.
It was very simple and relied on Boolean search to determine relevancy.
Genomic Sequencing
In the mid 90's a large project - the Human Genome Project
- was underway. The race was on to sequence the genome. Scientists decided to feed this out to a bunch
of different people. They chopped up the genome for researchers everywhere and allowed it to replicate.
The researchers mapped each chunk with genetic markers and computed a tiling path of tiny fragments.
Sequencing was very expensive, so the data was computed based on a minute number of chunks - very labor
intensive. The sequencing took forever and reassembling was a long way off. But then a company came along
that said they could do it faster. Sequencing becomes cheaper by automating the job using machines rather
than individual people so this company used a clever computer algorithm to conduct the sequencing. This
reduced the cost and the researchers were therefore able to reassemble more fragments and achieve a rough
draft of the genome in 2000. This sequencing approach was the shotgun approach, where accuracy is lower,
but the larger scale allowed the impossible to become possible.
Web Definitions
Google used to do a terrible job of defining terms. Craig noticed people were searching for "definition
of...", or "what is a...." etc so he wanted the search engine to provide better results for these searches.
He found lots of web pages that contained glossaries and definitions, so he hacked up a Perl script to get
the glossary formats.
The first recall results were only 50 percent accurate. He wanted to improve this rate, so he did some
experiments with the data. But he could never reach an accuracy level he was happy with. It was later he
realized that most of the questions people actually needed answers to could be answered with his crappy
little Perl script. He concluded that 100 percent accuracy is not important, that scale is much more
important.
Now Google allows you to use the "definition:" query and the question format to get definitions from
around the web. Type in "what is a blog?" and you'll get lots of results from Craig's original script.
Protein Sequencing
In biology, Craig says, you're constantly producing proteins. The proteins fold up with particular
sequencing. Within computing, you can use this knowledge to do amazing things. You can conduct
computations with this type of data but it's time consuming. Somebody at Stanford University noticed
that proteins spend a lot of time moving about before folding into an alpha helix. So it was suggested
they start the computations with lots of configurations. In this way you can parallelize the data by
scale and one will be magically close to a folded protein. So they worked out a way to reduce the problem
to a simple process based on mass scale. This is why Google uses maximum scale to conduct algorithmic
computations.
Chess vs. Go
You can now compute the value of any potential move in chess. Based on that information, you can compute
your projected probability of winning the game from any move. Chess grand masters put a lot of time into
this knowledge. But the opposite is true for the game Go, because there is more randomness to the game play.
(Stay tuned for Part 2)
About The Author
Article by Kalena Jordan, one of the first search engine optimization experts in Australia, who is well known and
respected in the industry, particularly in the U.S. As well as running a daily
Search Engine Advice Column, Kalena manages
Search Engine College - an online training institution
offering instructor-led short courses and downloadable self-study courses in Search Engine Optimization and
other Search Engine Marketing subjects.

Printer Friendly Version of this
Article
|
Recommended Webmaster Articles |
|
Webmaster Resource Sites & Services
|
|
Top SEO Tools -
A suite of the best online submission, reporting and SEO Tools available. Sign up for a frëe tríal.
Add Me! - a pioneer in search engine submission, and
the most popular. They provide frëe submission and paid submission.
|
Google Ranking Secrets Revealed!
Boost Your Google Ranking, Get More Orders, And Make More Monëy!
Build Your Traffíc with ABCSearch Get $100 of
FR-E-E qualified Visitors. Sign-up today and we'll match any initial deposit up to $100. Geo-targeting,
full reporting and one clíck results!
|
|
 |