
Let’s talk about MSN’s innovative RankNet technology. The latest major change at the search engine giant MSN Search has been the introduction of “neural networks” into its search engine algorithm, something internal researchers call “RankNet”. This change took place in late June of this year. The algorithm is new, and it is becoming an important consideration for many search optimizers.

RankNet is, in essence, a “learning machine” that takes the patterns of human searches into account and learns from them in order to provide more relevant results the next time around. It starts from a baseline of predictions that are fed into its neural net. Chris Burges of MSN says, “We take a bunch of data, ‘propagate’ it through the network and get values out of the network.”
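To make “propagate it through the network and get values out” a little more concrete, here is a minimal sketch of a forward pass through a tiny feed-forward network. It is not MSN’s actual model: the layer sizes, the feature values, the weights, and the names `forward` and `sigmoid` are all assumptions made up for illustration.

```python
import math


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Propagate one feature vector through a single hidden layer and return a score."""
    # Each hidden unit takes a weighted sum of the inputs, then applies the sigmoid.
    hidden = [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # The output is a plain weighted sum of the hidden activations.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical weights and two hypothetical documents described by two features each.
hidden_weights = [[0.8, -0.2], [0.1, 0.9]]
hidden_biases = [0.0, -0.5]
output_weights = [1.5, 0.7]
output_bias = 0.1

documents = {"page_a": [0.9, 0.3], "page_b": [0.2, 0.4]}
scores = {
    name: forward(f, hidden_weights, hidden_biases, output_weights, output_bias)
    for name, f in documents.items()
}
# Sort the documents by the value that came out of the network, highest first.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In a real ranking system the weights would be learned from training data rather than typed in by hand; the point here is only the “data in, values out” step.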

They make their predictions with supervised learning. We’ll get a little technical here, so feel free to skip this part. Supervised learning is:

“…a machine learning technique for creating a function from training data. The training data consist of pairs of input objects (typically vectors), and desired outputs. The output of the function can be a continuous value (called regression), or can predict a class label of the input object (called classification). The task of the supervised learner is to predict the value of the function for any valid input object after having seen only a small number of training examples (i.e. pairs of input and target output). To achieve this, the learner has to generalize from the presented data to unseen situations in a ‘reasonable’ way.”
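The definition above can be sketched in a few lines of plain Python. The example below learns one function for each case the quote mentions: a continuous output (regression) and a class label (classification). The training pairs and the function names `fit_line` and `fit_classifier` are invented for illustration, and the two methods (least squares and nearest centroid) are simply the easiest ones to show, not anything RankNet is known to use.

```python
def fit_line(pairs):
    """Regression: fit y = slope * x + intercept to (input, output) pairs by least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    variance = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_classifier(pairs):
    """Classification: remember the average input per label, then pick the nearest average."""
    groups = {}
    for x, label in pairs:
        groups.setdefault(label, []).append(x)
    centroids = {label: sum(xs) / len(xs) for label, xs in groups.items()}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


# A small number of training examples: pairs of input and target output.
predict = fit_line([(1, 3), (2, 5), (3, 7)])
classify = fit_classifier([(1, "low"), (2, "low"), (8, "high"), (9, "high")])

# Generalizing to inputs that never appeared in the training data.
unseen_value = predict(10)
unseen_label = classify(3)
```

Both learned functions answer for inputs they never saw during training, which is the “generalize to unseen situations” part of the definition.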

The first patent identified for RankNet is “Method for scanning, analyzing and handling various kinds of digital information content” which mentions the neural net concept:

Computer-implemented methods are described for, first, characterizing a specific category of information content–pornography, for example–and then accurately identifying instances of that category of content within a real-time media stream, such as a web page, e-mail or other digital dataset. This content-recognition technology enables a new class of highly scalable applications to manage such content, including filtering, classifying, prioritizing, tracking, etc. An illustrative application of the invention is a software product for use in conjunction with web-browser client software for screening access to web pages that contain pornography or other potentially harmful or offensive content. A target attribute set of regular expression, such as natural language words and/or phrases, is formed by statistical analysis of a number of samples of datasets characterized as “containing”, and another set of samples characterized as “not containing”, the selected category of information content. This list of expressions is refined by applying correlation analysis to the samples or “training data”. Neural-network feed-forward techniques are then applied, again using a substantial training dataset, for adaptively assigning relative weights to each of the expressions in the target attribute set, thereby forming a weighted list that is highly predictive of the information content category of interest.
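The abstract describes assigning a weight to each expression so that the weighted list predicts whether a document belongs to a category. As a rough sketch of that idea, the code below trains a single logistic unit by gradient descent over “expression present / not present” features. The patent text quoted here does not spell out its training procedure, so this is a stand-in technique, and the category (gambling pages), the expressions, the sample texts, and the function names are all hypothetical.

```python
import math
import re


def featurize(text, expressions):
    """Return 1.0 for each expression that matches the text, else 0.0."""
    return [1.0 if re.search(p, text, re.IGNORECASE) else 0.0 for p in expressions]


def probability(text, expressions, weights, bias):
    """Feed the features forward through one logistic unit."""
    x = featurize(text, expressions)
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_weights(samples, expressions, epochs=200, learning_rate=0.5):
    """Adaptively assign a weight to each expression from labelled samples."""
    weights = [0.0] * len(expressions)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = featurize(text, expressions)
            error = label - probability(text, expressions, weights, bias)
            # Nudge each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical target attribute set and training data (1 = "containing", 0 = "not containing").
expressions = [r"\bcasino\b", r"\bjackpot\b", r"\bweather\b"]
samples = [
    ("win the jackpot at our casino", 1),
    ("casino bonus today", 1),
    ("weather forecast for monday", 0),
    ("the weather is sunny", 0),
]

weights, bias = train_weights(samples, expressions)
weighted_list = dict(zip(expressions, weights))
```

After training, `weighted_list` maps each expression to its learned weight: expressions seen in “containing” samples end up positive, and those seen only in “not containing” samples end up negative.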
