Technology behind Online Advertising

Online advertising became instantly popular after Google launched its search-based ads. Google’s success comes from its high traffic, with tens of millions of people visiting the site daily to search for information. It also comes from Google’s large ad inventory, where millions of advertisers come to bid. Most importantly, search-based ads connect advertisers with the user’s active intent. This leads to a high click-through rate, which offers better results for advertisers and higher revenue for Google.

Given Google’s success, other high-traffic websites also want to enter this market. The question is: how can they duplicate Google’s results? A more technical question is: how can a website conduct ad auctions like Google’s and achieve high revenue from advertising?

Google ads are placed against keyword searches. As a user searches on certain keywords, both search results and ads are displayed, with ads on the side. However, there might be hundreds of advertisers bidding on the same keywords, such as “cars” and “lawyer”. The key question is how to list these ads in a reasonable order.

Google lists ads based on two main parameters: 1. bidding price; 2. click-through rate. The position of an ad is determined by a combination of the two, such as bidding price × click-through rate. For example, if an ad has a $1 bid price and a 0.6 click-through rate, it will be placed higher than an ad with a $5 bid price and a 0.1 click-through rate. A bidder pays the bidding price right below his own (a bidder of $5 pays the $4.90 submitted by another bidder). This is called the Generalized Second Price auction [1]. Economists have spent a lot of time discussing how this second-price auction is better than the first-price auction by removing the incentive to keep changing bids. In reality, bidders still constantly change their bids to minimize cost or increase click-through rate. For example, if a bidder observes that position 2 (from bidding $4) generates the same click-through rate as position 1 (from bidding $5), the bidder will conclude that positions 1 and 2 are equivalent, will only bid to stay at position 2, and thus bids $4 [2]. This is the reality in online advertisement auctions. From Google’s viewpoint, it does not matter how often bidders adjust their prices. All that matters is the total revenue from click-through rates and bidding prices.
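The ranking-and-pricing rule described above can be sketched in a few lines of code. This is a toy illustration of one common GSP variant, in which each ad pays just enough per click to keep its position; it is not Google’s actual implementation, and all names and numbers are invented.

```python
# Sketch of a Generalized Second Price (GSP) auction using the
# "rank score = bid x click-through rate" rule described above.

def gsp_auction(bids):
    """bids: list of (advertiser, bid_price, ctr).
    Returns ads in ranked order with the price each pays per click."""
    # Rank by bid x CTR, highest score first.
    ranked = sorted(bids, key=lambda b: b[1] * b[2], reverse=True)
    results = []
    for i, (name, bid, ctr) in enumerate(ranked):
        if i + 1 < len(ranked):
            # Pay just enough to match the next ad's score:
            # price * ctr = next_bid * next_ctr
            _, nxt_bid, nxt_ctr = ranked[i + 1]
            price = nxt_bid * nxt_ctr / ctr
        else:
            price = 0.0  # lowest-ranked ad pays only a reserve price (0 here)
        results.append((name, round(price, 2)))
    return results

# The example from the text: $1 bid with 0.6 CTR (score 0.6) outranks
# a $5 bid with 0.1 CTR (score 0.5).
print(gsp_auction([("A", 1.00, 0.6), ("B", 5.00, 0.1)]))
```

Note that under this variant the winner’s price depends on the runner-up’s score, so raising one’s own bid never lowers one’s own payment, only one’s position.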

The dominant factor is the click-through rate. If no user is interested in clicking an ad, it does not matter how high the bidding price is. How does a website increase the click-through rate of the ads it shows? Making the ads relevant is very important. This comes back to the ranking of the retrieved ads. Similar to the problem of search-result ranking, more than 100 ads may be retrieved from a search of just 2 keywords. Ranking based on relevance is crucial to achieving a high click-through rate.

Is ranking keyword-related ads the same as ranking searched documents? The problem of ad ranking seems easier, because the bidding price determines whether an ad will be placed high or low. Combined with the click-through rate, we have a neat formula for the position of each ad. Here we assume the click-through rate is known. But what if the ad is new and there is no data on its click-through rate? Potentially we can treat the new ad as having the highest possible click-through rate and multiply it by the bidding price. After a day or two, the data come in and we can adjust its position to reflect its real click-through rate.

However, ad ranking can be harder than search-result (document) ranking when the keywords searched by the user do not match any bids, even though those keywords may mean the same things as the words bid on by advertisers. Thus Google encourages advertisers to bid on as many keywords as possible to increase the likelihood of being matched. Large E-commerce advertisers bid on close to a million keywords in order not to miss any potential buyer.

Even with the best effort from advertisers, many keywords are still missing from bids. This is because casual usage by users may differ from the formal words envisioned by advertisers. It is hard to exhaust all the possible ways people search or express their needs.

The search company (like Google) is in a unique position to understand the user’s need. To return high-quality search results, it needs a way to map those search words to existing common words, and to retrieve documents accordingly. Thus the search company can apply the same technology it uses for improving search results to ad selection. Query expansion is one such technique.
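Query expansion can be illustrated with a toy sketch: map a user’s search words onto the synonyms that advertisers (or documents) actually use. The synonym table below is entirely made up; real systems mine such mappings from query logs, click data, or thesauri.

```python
# Toy query expansion: add known synonyms to the user's query terms.
# SYNONYMS is an invented illustration, not a real resource.

SYNONYMS = {
    "attorney": ["lawyer"],
    "automobile": ["car", "cars"],
    "cheap": ["discount", "low-price"],
}

def expand_query(words):
    expanded = list(words)
    for w in words:
        expanded.extend(SYNONYMS.get(w, []))
    return expanded

print(expand_query(["cheap", "automobile"]))
# The expanded query now also matches ads bid on "car" or "discount".
```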

Another challenging situation is placing ads on pages not related to search results. For example, a user may browse certain books and read reviews. Ad display in such cases is called contextual advertising. There is only context, no search words. One solution is mapping the page content to a set of keywords; then ads can be displayed accordingly.
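A minimal sketch of that mapping step: extract the most frequent non-stopword terms from the page and treat them as keywords. Real contextual-advertising systems use far richer models; this only illustrates the idea, and the stopword list and sample page are invented.

```python
# Map page content to a few keywords by simple term frequency.
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it"}

def page_keywords(text, k=3):
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

page = "The book is a mystery book. Reviews praise the book and the mystery plot."
print(page_keywords(page))  # "book" and "mystery" dominate the counts
```

The extracted keywords can then be fed into the same ad-selection machinery used for search pages.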

Online advertising is a fast-growing domain where many high-traffic websites want to create their own ad engines. People not only search for web pages; they also come to E-commerce sites to search for products. They go to social networking sites to update their profiles. They go to news websites to post comments. There is big potential for generating ads for each of these activities.

The technology for providing online advertising is closely tied to search relevance technology, but requires additional study of context and of bidding behavior. As of 2010, this is a fertile ground for cutting-edge computer science research.

[1] Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. “Internet Advertising and the Generalized Second-Price Auction: Selling Billions of Dollars Worth of Keywords”. American Economic Review 97(1), 2007, pp. 242-259.

[2] At position 1, our bidder needs to pay the next lower bid, say $4.50. At position 2, it only needs to pay the $3.50 bid by the 3rd-highest bidder.


The Science of Ranking Search Results

When a user visits a website and types his search keywords, he expects to find relevant information immediately. We take such easy access to information for granted. Behind the scenes, there is a large amount of work going into retrieving the most relevant pages and listing them in order of relevance. This is called ranking of search results.

There is a huge economic incentive for good ranking. For a web search or social networking company, it means advertising revenue: the user stays on the website, gets interested in related ads, and clicks on them. For an E-commerce company, it means a successful sale: if the user finds the product he is looking for, he will most likely buy it. In addition, for all these companies it is a user-experience issue. When the user feels happy after finding the information he needs, he is more likely to come back, more likely to explore the website, and more likely to buy something.

Given the importance of ranking search results, a large amount of research and engineering effort has been poured into this field. The problem fits nicely within a computer science field called Information Retrieval (IR). The field has existed since at least 1978, when the first SIGIR (Special Interest Group on Information Retrieval) conference was held. In that first SIGIR conference[1], all the important concepts were there: retrieval from databases and file systems, document retrieval, and relevance feedback from the user. But the boom in IR only started after the Internet search engine (in 1993[2]) and E-commerce (in 1994[3]) came into existence. Big commercial interest and the fast growth of online data fuel the development of this field.

The interface between a user and the website is a search box, where the user enters some keywords. Immediately, the system has to respond with some results. How do we present the most relevant information based on a few keywords? The premise is that there will be thousands of pages (documents) matched by simple string matching. After all, there is a limited number of English words, but there are millions of products and billions of web pages. It is unavoidable that a simple keyword match will give us thousands of equally plausible results. But there is only limited space to display results on a web page, and the user is not willing to go through page after page to find the result he wants (unless he desperately needs the answer). Therefore ranking of search results becomes crucial.

Ranking of web pages is almost a solved problem, thanks to Google’s ingenious PageRank[4] algorithm and its continuous refinement. It relies on a social phenomenon: an important page will have more pages linking to it. Additional improvement to this algorithm is mostly done inside Google or a few other big search engine companies.
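The core of PageRank is a simple iteration on the link graph. Below is a bare-bones sketch using power iteration on a tiny invented graph; the damping factor 0.85 is the commonly cited value, and the sketch omits the many refinements used in production systems.

```python
# Bare-bones PageRank by power iteration.
def pagerank(links, damping=0.85, iters=50):
    """links: dict page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
# C collects links from both A and B, so it ends up with the highest rank.
```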

Ranking of product pages is still an ongoing task faced by most E-commerce companies. There is no inherent linking among these pages, as they are all generated internally, so PageRank cannot be used here. But there is other information we can explore:

  1. Syntactic correlation between query and pages retrieved
  2. Search by other users on similar product and their clicks after ranking results were presented (confirmation from those users)
  3. The behavior (such as pages visited, visit sequence) of current user

Item 1 is measured by TF-IDF (term frequency–inverse document frequency) and cosine similarity.
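A small sketch of this measure, using one common TF-IDF variant (raw term frequency × log inverse document frequency); exact formulas vary across systems, and the three tiny documents are invented.

```python
# TF-IDF weighting plus cosine similarity between a query and documents.
import math
from collections import Counter

def tfidf_vectors(docs):
    n = len(docs)
    df = Counter()                       # document frequency per term
    for d in docs:
        df.update(set(d))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]
    return vecs, idf

def cosine(u, v):
    dot = sum(u.get(t, 0.0) * w for t, w in v.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["red", "car", "sale"], ["used", "car"], ["red", "wine"]]
vecs, idf = tfidf_vectors(docs)
query = {t: idf.get(t, 0.0) for t in ["red", "car"]}
scores = [cosine(v, query) for v in vecs]
# The first document shares both "red" and "car" with the query,
# so it scores highest.
```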

Item 2 is measured by the correlation between a query and the most frequently clicked pages. This can be built by mining user search logs.
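The log-mining step can be sketched as a simple counting job over (query, clicked page) pairs; the log format below is invented for illustration.

```python
# Mine a search log for the most-clicked page per query.
from collections import Counter, defaultdict

log = [  # (query, clicked_page) pairs from an imagined search log
    ("red shoes", "page_12"),
    ("red shoes", "page_12"),
    ("red shoes", "page_7"),
    ("laptop bag", "page_3"),
]

clicks = defaultdict(Counter)
for query, page in log:
    clicks[query][page] += 1

def top_page(query):
    """Return the historically most-clicked page for a query, if any."""
    return clicks[query].most_common(1)[0][0] if clicks[query] else None

print(top_page("red shoes"))  # the page users clicked most for this query
```

In a real system these counts would feed into the ranking function as one feature among several, rather than deciding the order outright.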

Item 3 is measured by the correlation between visited pages and retrieved product pages. Potentially a classifier can be built that maps visited pages and the visiting sequence to a category (of query type); that category can then be applied directly to rank the resulting product pages. The classifier can be trained on all user logs, where certain click streams lead to a final product-page click or a product purchase. Since we are building a classifier, other information about the user can be thrown into the model: gender, age group, geographical region, income level, and so on. Such user-specific information is not easy to get and may not add much prediction power.
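A toy version of that classifier can be sketched with simple vote counting over (visited pages → category) training examples; the log below is invented, and a real system would use a proper statistical classifier over many more features.

```python
# Map a user's visited pages to a likely query category by counting
# which categories those pages led to in past sessions.
from collections import Counter, defaultdict

training = [  # (pages visited during a session, category finally purchased)
    (["laptop_review", "laptop_specs"], "electronics"),
    (["running_shoes", "shoe_sizes"], "apparel"),
    (["laptop_deals"], "electronics"),
]

page_votes = defaultdict(Counter)
for pages, category in training:
    for p in pages:
        page_votes[p][category] += 1

def predict_category(visited):
    votes = Counter()
    for p in visited:
        votes.update(page_votes[p])
    return votes.most_common(1)[0][0] if votes else None

print(predict_category(["laptop_review"]))  # "electronics"
```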

The third type of ranking is advertisement display. Assuming you give the user correct results (optimally ranked) on your page, how would you display advertisements next to these results? This is similar to ranking product pages, in the sense that there is no inherent PageRank among advertisement items. Therefore the methods applied to product-page ranking also apply to advertisement ranking: get the similarity between the query and the ad page, get historical data on confirmed results, get the mapping between user clicks and ad clicks. Advertisement brings new dimensions, such as the auction bids (prices) paid for keywords (or simply stated preferences on those keywords) and the click-throughs on these ads. The goal of ad display is not the highest relevance, but the highest total revenue from all the ads. In general, this assumes advertisers already take care of the relevance issue.

Sometimes we have to match ads with non-search pages: for example, news pages reached from user clicks, email pages, social networking pages, and so on. This is called “contextual advertising”, where there are no explicit keywords for advertisers to bid on. The content provider has to decide which ad to display.

Given the rapidly growing data on the Internet, the field of information retrieval will keep growing. It is an exciting time to be in this field and create new enabling technology.


[2] The first full-scale web search engine W3CCatalog was released in 1993.

[3] Amazon was founded in 1994.

[4] See the Wikipedia entry on PageRank.

Fierce competition in mobile navigation

When you search for “navigator” or “navigation” in the iPhone App Store, more than 9 different apps from various companies show up, so at least 9 companies are competing in this market. All of them provide voice guidance. The majority of them have a real human voice, and a few low-priced ones use Text-to-Speech (TTS). Below is a list of the 9 major competing GPS navigators on the iPhone:

Navigation Application           Price    Frequency of charge   Voice
VoxTrek                          $2.99    once                  TTS
MotionX GPS (by FullPower)       $2.99    monthly
MapQuest Navigator               $3.99    monthly
Gokivo                           $4.99    monthly
AT&T Navigator (from Telenav)    $9.99    monthly
CoPilot Live                     $29.99   once
NDrive                           $32.99   once                  TTS
TomTom                           $59.99   once
Navigon                          $79.99   once

As we can see from the table above, the price of mobile navigation has decreased dramatically. MotionX’s $2.99 per month is less than one third of the charge for AT&T’s offer. The pricing model is also moving toward a monthly subscription instead of a one-time purchase. This is good for consumers, who only spend a small amount of money to try an app out. It is good for app providers, who can keep updating the features and data behind the navigation. The monthly subscription model looks like it is here to stay. In addition, some providers offer a 30-day free trial, which is very attractive for the uncertain buyer.

The graphical interfaces on these navigators vary widely. From colors and road signs to direction arrows, each app is inventing its own way of enticing the user. Some navigators are clearly overloaded with signs and hard to interpret. Another crucial factor in the user’s purchase decision is the starting page, before the user reaches the navigation page. This page gives the user the ability to change settings, add addresses, and so on. If the starting page is too complex, the user will typically be overwhelmed and go away.

It will take the next 1-2 years to see how the mobile navigation war plays out. Given that voice-guided navigation is similar across the different products (a nice voice, relatively correct route guidance, and similar rerouting speed), the main differentiating factors will be price and user interface. As for navigation quality, it is hard to judge and may not vary much among these apps. For consumers, this is excellent news. With such affordable prices, it won’t be long before everyone’s cell phone is equipped with a navigator.

Blog on Machine Intelligence

Artificial intelligence is my great love. It is time for me to write about it daily. I am starting a new blog site: Machine Intelligence.

The Future of Computing

At the Michigan CSE 50-year anniversary celebration, the speakers discussed the future of computing. There were 4 major themes.

  1. Parallel programming. The driving forces are:
     1) The ending of Moore’s law and the new era of multi-core computers.
     2) High-performance computing.

  2. Reduction of power consumption and heat from computer processors. New computer architectures addressing heat and power consumption are under research and development.

  3. Large-data handling: storage and data mining. Yahoo! processes 25 terabytes of data each day. In the financial sector, Fidelity handles 140,000 transactions each day. This requires massive data storage, processing, and data mining.

  4. Intelligence in computer programs (automation):
     1) Robots (intelligence in hardware): walking robots, flying airplanes, surgery robots, hospital sensors that detect a hospital bed’s height and prevent patients from falling, and household robots that do vacuum cleaning, kitchen cleaning, and entertaining.
     2) Softbots (intelligent programs): financial transactions.
     3) Decision support:
        (1) Security risk detection (networks)
        (2) Targeted web marketing (Google and Yahoo display targeted ads)
        (3) Business data mining: analyzing customer behavior
        (4) Medical diagnosis, patient support, and disease monitoring

The synthetic arm

Today I listened to a talk by Prof. Solzbacher from the University of Utah on building biomedical devices. Small chips implanted in the human brain send signals to artificial arms equipped with wireless receivers. With this technology, an amputee can command his synthetic arm, and a paralyzed person can command his hand. How remarkable our technology has become.
Solzbacher illustrated how implantable microchips are designed, constructed, and refined, and how they have to fit certain size and safety requirements. It is a lot of engineering.

A synthetic arm requires microchips, long-lasting batteries, wireless technology that sends the neural signals out, and a small computer that receives the signals and generates patterns and commands. All of these technologies have to come together to create a synthetic arm. In theory, each of these components can be built. But combining them and making them work in a practical environment poses new challenges. It is this combination that is most powerful.

Autonomous vehicles, robots and helicopters

In a talk at Stanford today, a graduate student presented his work on autonomous cars, robots, and helicopters. It’s amazing to see the four-legged robot walk over uneven surfaces without falling, and the unmanned helicopter perform in-air flips and circling acts.

While the topic of the talk was reinforcement learning, the real issue is dynamic decisions over a sequence of actions. A helicopter flies through space, but also over time. A robot has a sequence of actions to take. It needs to react optimally to the spatial environment, but also take into account the sequential results of each step. Through learning (training on spatial features) and optimization over the timed sequence, these robots (including the vehicle and the helicopter) perform their tasks very well.

How much of the success is due to reinforcement learning, and how much to the actual tedious engineering on the robots? I guess the latter accounts for 80%. Of course, there may exist a way to scientifically measure the ratio of contribution. For practical purposes, people are much more interested in the walking and flying robots than in the abstract algorithm.

Engineering is the mundane work of making something happen. I used to stay away from engineering, but now I embrace it. Only by being hands-on do I truly understand the real problems. Of course there is a balance in utilizing forces beyond one’s own; then one has to build good alliances. Regardless of how we do it, the goal is to make things happen, and happen fast.

Statistical Language Modeling

The problem with grammar-based approach

In a dialog system, a user’s response can be highly variable and hard to predict. It can also contain disfluencies (such as “um” and “uh”) and ungrammatical sentences. In addition, the grammar for the application may need to fill several NL slots from a single utterance. This happens with open-ended prompts, but also with some restrictive prompts.

One solution is to write a grammar that lets callers say any arbitrary phrase within the domain of the task while still achieving high accuracy. Coming up with such a set of grammar rules can be tedious. In most cases, the out-of-grammar rate obtained with hand-crafted rules is very high, and any attempt to add more rules often leads to poor in-grammar recognition accuracy.

The SLM approach

A statistical language model (SLM) assigns probabilities to sequences of words. Typically an SLM is based on an n-gram model, where the probability of a word depends on the previous N-1 words. In a unigram model, the probability of a word is not affected by the context preceding that word. In a bigram model, the probability of a word changes according to the word that precedes it. In a trigram model, the probability of a word changes according to the two words that precede it.
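A minimal bigram model can be trained from a handful of example sentences. The sketch below uses add-one (Laplace) smoothing so that unseen word pairs still get a nonzero probability; the tiny corpus is invented, and real systems use much larger corpora and more sophisticated smoothing.

```python
# A minimal bigram language model with add-one smoothing.
from collections import Counter

corpus = [
    "i want the weather",
    "i want gas prices",
    "i want the traffic",
]

bigrams = Counter()
unigrams = Counter()
vocab = set()
for sent in corpus:
    words = ["<s>"] + sent.split()   # <s> marks the sentence start
    vocab.update(words)
    for prev, cur in zip(words, words[1:]):
        bigrams[(prev, cur)] += 1
        unigrams[prev] += 1

def prob(prev, word):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

p = prob("i", "want")   # 4/11 here: "want" almost always follows "i"
```

Because the probabilities are driven by the training data, plausible continuations like “i want …” score far higher than unseen ones, which is exactly the property that makes an SLM useful as a recognition grammar.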

An SLM grammar is trained from a set of examples that models the user’s speech. Since the probability assigned to a phrase depends on the context and is driven by data, an SLM provides a grammar in which more plausible phrases are given higher probabilities.

SLMs are useful for recognizing free-style speech, especially when the out-of-grammar rate is high.

SLM in practice

An SLM is used in Commute-UX (Tashev et al., 2007), a large-scale system that provides location-based information to in-car commuters. The system provides information about traffic, gas prices, and weather, based on real-time data obtained via web services.

In this system, a statistical LM is trained from the slots (e.g., city name, road name) and phrases learned from sample queries (e.g., “the closest gas station in <City>”), and is augmented with a filler-word N-gram (Yu et al., 2006) to model the insignificant words.

Another application is Schooten et al.’s question-answering dialog system, where the LM is trained on a user training corpus, newspaper text from 1988 to today, manual transcriptions of radio and TV news shows, and 1,100 quiz questions containing 13,000 words. This leads to a 0.4% out-of-vocabulary rate on the test corpus, which suggests very high coverage.

Research problems:

The language model (LM) is highly task-dependent, and its quality usually determines the recognition accuracy of the speech recognizer. The design of the LM requires several steps.

The filler part of the LM absorbs hesitations, by-talk, and other non-information-bearing words unseen in the training sentences. The filler-word N-gram is pruned from a generic dictation LM.


Ivan Tashev, Michael Seltzer, Yun-Cheng Ju, Dong Yu, and Alex Acero. 2007. “Commute UX: Telephone Dialog System for Location-based Services”, Proceedings of the 8th SIGDial Workshop on Discourse and Dialogue, Antwerp, Belgium.

Dong Yu, Yun-Cheng Ju, Ye-Yi Wang, and Alex Acero. 2006. “N-gram Based Filler Model for Robust Grammar Authoring”, Proc. ICASSP, Toulouse, France.

Boris van Schooten, Sophie Rosset, Olivier Galibert, Aurélien Max, Rieks op den Akker, and Gabriel Illouz. 2007. “Handling Speech Input in the Ritel QA Dialogue System”, Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007), Antwerp, Belgium, August 2007.

Geoffrey Zweig, Patrick Nguyen, Yun-Cheng Ju, Ye-Yi Wang, Dong Yu, and Alex Acero. 2007. “The Voice-Rate Dialog System for Consumer Ratings”, Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007), Antwerp, Belgium, August 2007.

The Unique Challenge of Spoken Dialog Systems

While dialog systems have been studied since the early days of AI (the 1960s), spoken dialog systems (SDS) have received attention only in the last decade. The commercialization of speech recognition software and its application to the stock market (Nuance’s launch in 1999) and call centers made spoken dialog systems a commercial reality.

The progress of spoken dialog systems has hit a major bottleneck: speech recognition. This limits spoken dialog systems to applications with very simple vocabularies. These applications include: directory assistance (business names), voice dialing from cell phones (person names), call center service (credit card accounts, airline reservations, bus schedules), and medical record retrieval. More sophisticated dialog systems have been realized in text but not yet in spoken form:

1. Chatbot (require a large general-purpose dictionary and grammar)

2. Web search (allowing the user to say any word, which also requires a large dictionary)

3. Search on audio book or device

4. Question-answering systems (a similar requirement for a large-vocabulary dictionary)

A spoken dialog system requires two things to achieve good accuracy: 1. large sound data to train the acoustic model; 2. large utterance text data to train (or write) a good grammar.

Building a good spoken dialog system requires ongoing data collection, feeding back into the model, and re-training. Microsoft Tellme’s highly accurate 411 service benefits from its large data. This does not mean that a designer should wait for such data to become available before building the system. In the face of a lack of data, a designer can do the following:

  1. Use sound data from other tasks (basic user utterance should be similar)
  2. Simulate the text data by feeding some data-generating rules
  3. Design a quick re-training module that immediately identifies mistakes when facing real users.

Spoken dialog systems are most suited for situations where using a text-based system is not an option: driving a car, using a mobile phone (even though many have text capabilities), working in surgery, having a physical disability (no use of hands or eyesight), or interacting with a robot (with no remote device at hand). They also work in situations where interactive conversation is needed: reporting a credit card problem, technical support, reminder systems for elderly patients, and market survey calls. Spoken language has many advantages over text: it’s immediate, interactive, and personal. As SDS become more sophisticated and successful, we will see more adoption of SDS in wider domains. As robots enter normal households, educational games reach a higher level, computer games become more interactive, and customer service becomes more automated, we will see a growing need for SDS.

In which cases don’t we need speech? When other inputs dominate: text or touch. These input modalities have almost 100% accuracy. Thus web surfing, search, reading, and touch-to-play can do without speech. Speech is also not welcome in situations where people want privacy: at night, in a crowd, or in a meeting. Even if you have your cell phone with you, you would rather enter input through the keypad than through voice.

The talking cure

Human beings are inherently social. We are happiest when we are connected with others. Isolation is a form of punishment, and it is the cause of many forms of unhappiness. When we are in emotional turmoil, talking provides good relief.

Before we explore the function of talking in depth, let’s look at other ways of dealing with emotions besides talking. When you feel bad, you can watch a movie or TV show and temporarily forget the pain. But the same feeling is still there. You can distract yourself by doing something, but that does not resolve your issues.

Talking is communicating your thoughts. It allows you to put thoughts into words. It allows you to sort through your raw emotions. During conversation, you gradually come to grasp what you are truly feeling, and why you feel that way. A good friend will listen patiently and ask pointed questions. They may also offer personal advice based on their experience. Such advice can be very helpful. It allows you to see a solution, and get out of the cycle of chewing on your bad emotions.

Therapists offer a similar function. They listen to you, and give you some advice in the end. During the therapy session, it is the patient who does most of the talking. Talking by itself is therapeutic. After such a session, the patient will feel a little better, at least temporarily. This is because a therapy session satisfies a fundamental need: getting connected by expressing oneself.

Talking to a computer is very similar to talking to a stranger, or chatting with an anonymous person on the Internet. It gives you an opportunity to express yourself. You spell out your innermost thoughts in words. By doing so, you can view your own emotions in a more objective way. A good computer therapist will ask you questions that allow you to fully express yourself. It will ask about your actions and solutions. Ultimately, a bad emotion can be resolved only by taking some actions or changing some views. By working out a solution, a person feels more in control and can immediately feel better.

Therefore I believe that millions of people can benefit from our computer therapist. Her deft conversation will make you feel soothed and engaged. Even though you know this is not a person, the engagement through conversation allows you to express yourself. You are helped along the way by discovering solutions. This process of talking and discovering solutions empowers you and connects you. You are no longer fighting in your head against those bad feelings. They are expressed as words. How many people would love to have such an opportunity?

Many people may be intimidated or embarrassed by seeing a therapist. But talking to a computer therapist makes them feel safe. Many people may not be able to afford expensive therapy sessions. But talking to a computer therapist costs a tiny amount or nothing. In the desperate search for solutions, for emotional relief, and for connection, people will find that a computer therapist can be their best friend.