CHORUS+ Network of Audio-Visual Media Search


Mobile Image Search


Author(s):

Spiros Nikolopoulos, Elisavet Chatzilari, Yiannis Kompatsiaris – CERTH-IT

Stavri Nikolov – JRC

with contributions from Henning Müller – HES-SO and Xavier Le Bourdon – JCP

Executive Summary

The objective of this document is to survey the existing applications for visual-based mobile image search and investigate their technical and socio-economic aspects. To this end, we go through the basic architecture of these applications and list the utilized technologies. Moreover, we provide an overview of the user needs and usage patterns of such applications, as well as a brief description of the business models that are currently employed. The most significant part of the document is devoted to presenting 12 such mobile image search applications; for each one we provide brief background information, describe the functionality of the offered service and the domain of target users, and discuss the adopted business model. In conclusion, we compare the examined applications in terms of adopted technology, target users and business model, and identify the most important trends in the sector from both a technological and a socio-economic perspective.

1.   Introduction

The mobile search industry is almost as old as the telecom industry, and its primary objective is to enable people to find location-based services by entering a word or phrase on their phone. An example of usage would be a person looking for a local hotel after a tiring journey, or a taxi company after a night out. The services can also come with a map and directions to help the user. Over the years, mobile content has shifted towards mobile multimedia. Starting with keyword-based search and going through the intermediate step of voice search, the end user is now offered the functionality to capture a photo on his cell phone and find relevant information on the Internet. Visual-based search works like traditional search engines but without having to type any text or go through complicated menus. Instead, users simply turn their phone camera towards the item of interest and the mobile search engine returns relevant content based on its interpretation of the user's visual query. Despite the many similarities, mobile search is not just a simple shift of PC web search to mobile equipment, since it is connected to specialized segments of mobile broadband and mobile content, both of which have been evolving at a fast pace recently [8].

High-end mobile phones have developed into capable computational devices equipped with high-quality color displays, high-resolution digital cameras, and real-time hardware-accelerated 3D graphics. They can also exchange information over broadband data connections, sense location using GPS, and sense direction using accelerometers and an electronic compass. All these functionalities have enabled a new class of augmented reality applications which use the phone camera to initiate search queries about objects in visual proximity to the user. Pointing with a camera provides a natural way of indicating one's interest and browsing information available at a particular location. Once the system recognizes the user's target it can provide further information (the menu or customer ratings of a restaurant) or services (reserving a table and inviting friends for dinner) [1], allowing users to link the physical and the digital world [2]. In this context, visual search has been extensively researched in recent years [3], integrating mobile augmented reality [4] and outdoor coordinate systems [1] with visual search technology.

One advantage of mobile visual search is that it is much faster than conventional search methods. The reason is that even highly skilled typists, who manage up to 900 characters per minute on a PC keyboard, slow to a very low pace on the cramped keyboards and touch screens of mobile phones. Typing with all fingers becomes almost impossible, but capturing and sending an image takes just a few seconds. Moreover, people may prefer snapping a photo to using words to describe its content, especially when the object of interest is difficult to describe verbally, for example when pointing at an unknown edifice. Thus, a mobile search engine that allows visual search might have an easier time finding an audience.

According to [7], in 2006, smartphones accounted for only 6.9% of the total market, while in 2007 the market segment reached 10.6%. The total annual sales of mobile devices reached 1,275 million units in 2008, with 71% of them sold with data facilities, of which 15% (of total sales) correspond to smartphones. In Europe, 280 million units were sold in 2008, of which 19.3% were smartphones and 65.5% enhanced devices. It is evident that camera-enhanced, hand-held devices are spreading at a very fast pace.

Moreover, according to leading market research firm eMarketer, mobile search is expected to account for around $715 million by 2011. According to a recent study (April 2011) from Google, conducted by Ipsos OTX, an independent market research firm, among 5,013 US adult smartphone Internet users at the end of 2010, "71% of smartphone users search because of an ad they've seen either online or offline; 82% of smartphone users notice mobile ads, 74% of smartphone shoppers make a purchase as a result of using their smartphones to help with shopping, and 88% of those who look for local information on their smartphones take action within a day." Earlier that year, Performics predicted that mobile search would soon reach 10 percent of all the search impressions its clients were seeing. At the end of April 2011 the firm said that "mobile impressions accounted for 10.2 percent of all paid search impressions (desktop + mobile)."

These and other recent studies clearly show that mobile search is moving mainstream and gaining momentum. Unfortunately, as far as we are aware, there are no figures about the size and dynamics of the image-based segment of the mobile search market. However, we can reasonably expect that mobile image search will scale proportionally to mobile search as a whole, creating new opportunities and offers. This is also supported by the fact that major players in the mobile communication and search industry, like Nokia and Google, are investing considerable effort in the mobile image search concept and are aggressively building applications and relationships in order to take advantage of the mobile ad market.

2.    Architecture & technologies

The majority of existing mobile image search applications employ a client/server architecture with a data pool behind the server (Figure 1). Cell phones act as clients that capture the object of interest and send queries to the server. The server, in turn, is responsible for analyzing the image, identifying its content, retrieving relevant information from the data pool and sending it back to the client. Different approaches have been adopted for implementing each of the individual steps of this process. Below we briefly describe each step and list the available technological solutions.

Figure 1: Client/Server architecture for mobile image search

Client/server communication: The communication between the client and the server is an important feature that differentiates the functionality of the existing mobile image search applications. By communication we refer to the client's need to transmit to the server the image depicting the object of interest, or just a representative subset of it, as well as the server's need to send the related information back to the client. Due to the low speeds of GSM/GPRS networks and the limited capabilities of older cell phones, the first mobile image search applications relied on MMS (multimedia messaging) or e-mail services to establish the necessary communication. More specifically, after capturing the object of interest the user needs to send the image file via an MMS or e-mail service. Then, after processing the image and retrieving the relevant information, the server responds to the client's query by enclosing the related information in an SMS or e-mail that is sent back to the user's cell phone. It is evident that, when operating on a low speed network, it is impossible for the mobile image search application to perform this process in real time and support augmented reality functionalities. On the other hand, building on the advances of broadband networks and the new features offered by smart phones, the most recent mobile image search applications rely on Wi-Fi or 3G networks. In this case the client/server communication is transparent to the user, since all necessary communication actions are handled by the application. The network speed is sufficient for transmitting large blobs (binary large objects) like images, as well as receiving the necessary information, all within a few seconds. This is why some of the existing mobile image search applications have already started to offer augmented reality views through the employment of real-time mobile image search, as discussed below.
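To make this exchange concrete, the following minimal sketch (in Python, using an entirely hypothetical recognition endpoint) shows how a Wi-Fi/3G application might implement the snap-and-send step: the captured image travels to the server as a binary blob over HTTP, and the related information comes back in the response.

    # Minimal snap-and-send sketch; the endpoint is a made-up example,
    # not any actual service's API.
    import requests

    SERVER_URL = "https://example.com/api/recognize"  # hypothetical endpoint

    def send_query_image(image_path):
        """Upload the captured image and return the server's interpretation."""
        with open(image_path, "rb") as f:
            # The full image travels as a binary blob; over Wi-Fi/3G this
            # round trip completes within a few seconds.
            response = requests.post(SERVER_URL, files={"image": f}, timeout=10)
        response.raise_for_status()
        return response.json()  # e.g., matched object, links, metadata

    print(send_query_image("query.jpg"))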

Client Interface: Another important feature of mobile image search applications is the experience offered through the client interface. Each application is differentiated mainly by the mechanism used to capture and send the visual content to the server, as well as the mechanism used to display the received response. In this context we can distinguish between three types of user experience: a) Menu-mediated, where the user needs to switch between different application menus (e.g., MMS, SMS or e-mail menus) both for sending the captured image and for viewing the received response; b) Snap-based, where a single interface is used both for sending the image (i.e., by pressing a button) and viewing the received response (usually through a dedicated area of the screen reserved for this purpose); c) Real-time, where the user is offered an augmented reality experience with meta-tags popping up as he turns his camera phone towards the object of interest. Although the most intriguing, the real-time experience still faces important technological challenges, which is why the vast majority of existing mobile image search applications employ a snap-based approach.

Image processing: Another important attribute that differentiates the existing mobile image search applications is the place where the captured image is processed. In the case of client-side processing, the captured image is processed by the smart phone's processor and a set of representative features is extracted [1]. These features are subsequently transmitted to the server for retrieving the relevant information. Although it reduces the network load and speeds up the whole process, client-side processing is only feasible when the smart phone is equipped with enough processing power to extract the necessary features. On the other hand, in the case of server-side processing, the full image file is transmitted to the server, which takes care of extracting the representative features and retrieving the relevant information [5]. The server-side solution removes the processing burden from the client device at the expense of increased network load and response latency.
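As an illustration of the client-side option, the sketch below uses OpenCV's ORB detector to reduce a captured image to a compact set of binary descriptors before transmission. The features actually used by commercial applications are not disclosed, so this is only indicative of the approach.

    # Client-side feature extraction sketch: send a few KB of descriptors
    # instead of a few hundred KB of JPEG.
    import cv2

    def extract_features(image_path, max_features=500):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        orb = cv2.ORB_create(nfeatures=max_features)
        keypoints, descriptors = orb.detectAndCompute(img, None)
        return descriptors

    descriptors = extract_features("query.jpg")
    print(descriptors.shape)  # e.g., (500, 32): 500 binary descriptors of 32 bytes each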

Image content recognition: Although it is common practice among the companies not to disclose details of the utilized technology, we can safely assume that image content identification is accomplished using one or more of the following approaches: a) Nearest neighbor based approach. Using content-based image retrieval techniques, the query image is matched with one or more very similar images with known content. Then, based on the assumption that very similar images depict the same content, the information returned as a response to the client's request is the information associated with the matched image(s). Nearest neighbor is the most scalable approach for image content identification and is currently adopted by the majority of existing applications. However, it requires the indexing of a significantly large number of images with known content before it starts to produce meaningful results. b) Object recognition based approach. Using the principles of pattern recognition, a model is learned for each object. The query image is examined by all available models and the objects with the highest confidence are considered to be depicted in the image. Given the high training cost of learning the object recognition models, this approach is only applicable in a constrained domain with a limited number of objects. c) Watermarking based approach. Using encryption techniques, a content identifier is embedded into the digital image before it is made public. Then, when the server receives a watermarked image as a query, a watermark detection mechanism is used to decrypt the content identifier and retrieve the relevant information from a database. The major drawback of this approach is that content identification cannot be performed on images that have not been watermarked, which makes it applicable only in cases with full control over the distributed images. d) Human computation based approach. Some of the existing applications for mobile image search use human annotators to facilitate the identification of image content when the automatic detection mechanisms fail. Although interesting, it is doubtful whether the number of willing annotators will be enough to cover the rapidly increasing needs of mobile image search.
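A minimal sketch of the nearest neighbor approach, assuming descriptors have been extracted as above and that a database of reference images with known content is available; the matching thresholds are illustrative.

    # Nearest-neighbour identification sketch: return the metadata of the
    # reference image most similar to the query.
    import cv2

    def identify(query_desc, database, min_matches=25):
        """database: list of (descriptors, metadata) pairs for known images."""
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        best = None
        for ref_desc, metadata in database:
            matches = matcher.match(query_desc, ref_desc)
            good = [m for m in matches if m.distance < 40]  # illustrative threshold
            if len(good) >= min_matches and (best is None or len(good) > best[0]):
                best = (len(good), metadata)
        # Assumption: very similar images depict the same content.
        return best[1] if best else None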

3.    User needs and usage

The mobile phone is one of the most commonly used and widespread ICT devices world-wide. It offers directness and a level of personalization that no other consumer device can match. Originally introduced for voice (and later text) communication, mobile phones have turned into full-featured multimedia devices in less than 15 years.

Mobile image search constitutes one of the most attractive services offered by smartphones, and users primarily use it while on the move. They use it when they don't have access to a PC (e.g., waiting at the airport, at a bus stop, etc.) or when it is more convenient to use their phone (e.g., it would take longer to switch on their PC or move to the next room). In addition, mobile image search is particularly useful when the user's location information is important for retrieving relevant results. The basic motivation for using mobile image search is the ability of visual content to convey rich semantic content that is either too complicated or too ambiguous to be expressed in words. Indeed, if the user is not sure how to describe something with words, it may be easier to search with a picture. Moreover, it is also common to use image search services with embedded OCR capabilities for translating foreign billboards and traffic signs. Finally, the ability of mobile image search to turn the world around us into semantic links (pointing to news, websites, special offers, etc.) with the ease of a photo snap is what makes this service far more attractive than text- or voice-based search, which are more demanding from the perspective of user input.

Mobile image search still occupies a small fraction of the issued queries, but this is changing very rapidly. As shown in Figure 2, although image-based interfaces are currently not considered one of the critical components of mobile search, the situation is expected to change in the near future, with image-based interfaces acquiring an equally important role to text-based interfaces. This change will be further boosted by technological advances in the relevant research areas, which are expected to offer more robust and scalable services. Indeed, the existing mobile image search applications are usually effective in delivering meaningful responses only when operating on restricted domains. As a consequence, users are discouraged by the responses obtained when they test these applications outside the supported domains. However, this is expected to change as the technology becomes more and more efficient.

Finally, given the widespread use of mobile photo-sharing services and the rapid growth of augmented reality applications, mobile image search is expected to become the core functionality of many future applications, such as image-based browsing of personal photo archives, image recognition based augmented reality, etc. Indeed, as consumers generate an increasing amount of digital multimedia content, finding a specific image, audio clip or video becomes a non-trivial task. Mobile device users typically browse their personal multimedia libraries on standard mobile devices by scrolling through image thumbnails or by manually organizing them into folders and browsing through the folders. As a consequence, rich multimedia content is lost in the users' personal repositories due to the lack of efficient and effective tools for tagging and searching the content. Motivated by this fact, the current literature has already started to incorporate the technological advances of image search into the mobile environment [10], [11].

Figure 2: Expected interface usefulness for mobile image search in 2012/2015 [9]

4.    Business models

Since mobile image search is still in its early stages, a business model that could render this type of service sustainable has not yet been established. The existing applications can mostly be considered to be running through an experimentation phase, both in terms of the employed technologies and the potential business opportunities. Surveying the current approaches, we can distinguish between models of intangible and monetary benefits [6].

According to the intangible benefits model, free services are provided to users in exchange for their attention, loyalty and information. The company can then "sell" this attention, loyalty and information in exchange for money. For instance, mobile image search can be used by a company as an attractive application for engaging more customers into its client base. Profit does not derive directly from the use of mobile image search but from attracting more customers to a paid service that incorporates the search functionality as an additional feature. This model is typically followed by mobile telecommunication operators (e.g., Vodafone) and mobile vendors (e.g., Nokia). An intangible benefits model is also followed by many software companies that use mobile image search to advertise their technological competence and expertise in order to attract potential customers.

On the other hand, the monetary benefits model arises in the majority of relationships where a transaction or subscription process takes place and customers are required to pay in exchange for services or goods. This model is usually implemented through fixed transaction fees, referral fees, etc. Within mobile image search, one aspect of the monetary benefits model is primarily based on advertising that uses visual search to promote services and goods. In this case, advertisements related to the content of the query image are displayed on the user's cell phone, in a way similar to Google Ads. Another aspect of the monetary benefits model is based on charging for access to Software-as-a-Service (SaaS)-based interfaces. One typical example of this model is the SmartAds service offered by Kooaba. With this service, Kooaba allows its customers to turn their print ads into clickable links prompting readers to acquire more information about the product. The SmartAds functionality consists of a) a Query API that is offered for free (with a request limit per day) and allows issuing requests against the existing database of objects, and b) a Data API that requires an account and allows customers to upload their own print ads into the existing database. A revenue stream is generated by charging for the use of the Data API, or of the Query API with no request limit. A similar approach is followed by IQ Engines, where customers are offered the possibility of uploading their own photos into the IQ Engines database and issuing queries through their mobile application using a Query API. In the same spirit, the TinEye Commercial API allows users to issue queries on the TinEye database after purchasing a search bundle. Finally, another aspect of the monetary benefits model, less flexible than SaaS, is based on signing explicit contracts with marketers in the context of a product promotion campaign, such as the Nokia interactive campaigns launched using Snap2win, or the partnership between LinkMe Mobile and Guthy-Renker Japan's Proactiv product line. In this case, the mobile image search company collaborates with the advertising agency to set up a campaign that uses the core functionality of mobile image search to attract users' interest.
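The mechanics of such SaaS charging can be illustrated with a toy metering scheme; the free-tier limit and billing logic below are assumptions for illustration, not any vendor's actual policy.

    # Toy metering for a Query API: free up to a daily limit, billable beyond it.
    from collections import defaultdict
    from datetime import date

    FREE_QUERIES_PER_DAY = 100          # assumed free-tier limit
    usage = defaultdict(int)            # (api_key, day) -> query count

    def record_query(api_key):
        """Count a query; return True if it is billable."""
        key = (api_key, date.today())
        usage[key] += 1
        return usage[key] > FREE_QUERIES_PER_DAY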

Thus, we can see that there are currently three business models adopted by the mobile image search companies: an advertising-based model that relies on the relevance between the query image and the registered ads, the Software-as-a-Service (SaaS)-based model that relies on charging the customer for extending and querying the database of objects, and the contract-based model where an advertising agency employs the core mobile image search functionality in the context of an advertising campaign. Our estimate is that even if the contract- and SaaS-based models are currently very attractive for the companies offering mobile image search services, this is primarily because image recognition technology is still rather immature and can only function robustly in restricted domains. Thus, the customer has no option but to pay for making their content searchable, since the general-purpose search engines are still inadequate to satisfy the requirements. However, with the advancement of image recognition technology, we anticipate that the advertising-based business model will dominate the mobile image search market.

5.    Case Studies

5.1    Kooaba

Kooaba (www.kooaba.com) was founded in November 2006 as a spin-off company from the Swiss Federal Institute of Technology (ETH) in Zurich, Switzerland, with the aim of exploiting the information captured in images using sophisticated image recognition technology. Having strong ties to the Computer Vision Lab at ETH Zurich, Kooaba's products are regularly updated to incorporate the latest research results.

The image recognition service offered by Kooaba receives a snapped image as a query and displays related information, further links and available files. After installing the application, the user can snap pictures of books, CDs, DVDs, games, or film posters and retrieve relevant information. The complete Kooaba system is composed of three key ingredients: image recognition technology, content delivery to the user, and automatic crawling of a large reference database of objects. To facilitate the image recognition functionality, Kooaba has developed an algorithm which compares and classifies the image against similar images present in its database. The subject need not be completely enclosed by the frame of the photo; in fact it can also be shot from various angles. In addition, the service recognizes posters which are partially obscured by a person walking in front of them (occlusion), photographed sub-sections (cropping), and can even separate a disturbing background from the actual subject (clutter). The algorithms determine discriminant regions in images and are able to subsequently match them to other views of the same object. By relying on regions in the image rather than on the whole image, the algorithms are able to identify the correct match even under challenging conditions. The images required for the comparison are stored on the cloud infrastructure provided by Amazon: the Amazon Elastic Compute Cloud (Amazon EC2). Finally, for Kooaba to work effectively, a fast Internet connection is required, preferably Wi-Fi. If the mobile handset uses a slow GPRS connection, search speed suffers.

Kooaba's focus is on entertainment, and it has already accumulated millions of images of products, especially the covers of books and CDs. The most important images to archive are the posters of current box office hits. A characteristic example of how Kooaba prioritizes the accumulated content is that in the USA, Kooaba immediately found the recently published Blu-ray disc of "Harry Potter and the Half-Blood Prince". However, in a random test, the independent film "Lord of the Undead" by Timo Rose, which is not known to the general public, was not recognized.

The primary business model adopted by Kooaba is a SaaS-based model where the customer relies on the image search functionality to turn print ads into clickable links. This functionality goes by the name SmartAds and consists of a) a Query API that is offered for free (with a request limit per day) and allows issuing requests against the existing database of objects, and b) a Data API that requires an account and allows customers to upload their own print ads into the existing database. A revenue stream is generated by charging for the use of the Data API, or of the Query API with no request limit.

5.2    IQ Engines (oMoby)   

oMoby (www.omoby.com) is a service offered by IQ Engines (www.iqengines.com), a company founded as a collaboration of computational neuroscientists at UC Berkeley and UC Davis. The goal of IQ Engines is to bring advances in biological vision models to practical image and video search, using algorithms that are hierarchical and massively parallel. These advances are delivered on a web server platform that can be used to build image and video search applications.

oMoby is essentially a shopping tool that can help the user find information about products by snapping a photo. More specifically, the user takes a picture of any product he is interested in searching for or buying, and oMoby provides links to retailers offering product information, reviews, prices, and more. oMoby also provides the functionality to email the search results so they can be viewed in a web browser. The image recognition functionality of oMoby relies on the technology developed by IQ Engines to identify and label photos. By recognizing their key content, images are used to power searches for relevant information, retailers and advertisers. One interesting feature of oMoby that differentiates it from other mobile image search applications is that, apart from computer vision, it also uses crowdsourcing to recognize images. Thus, when a query image comes in, the computer vision system is employed to recognize its content; if that doesn't work, oMoby turns to a network of humans to help with the recognition. The mechanism used for supporting this crowdsourcing functionality is not disclosed by the company.

oMoby's focus is on purchasing and on-demand advertising. According to Hubspot estimates, 40% of consumers compare prices using their mobile devices while shopping in-store, 20% of adult smartphone owners have used their device to make a purchase in the past 30 days, and 84% of 25-34 year olds have left a website because of intrusive or irrelevant advertising.

oMoby aims to serve mobile device owners who turn to their smartphone to ease the process of gathering information and making an informed purchase. Moreover, the visual search functionality offered by oMoby aims at enabling retailers to identify and market to a consumer who has expressed an interest in a specific product, and to do so with the permission of the user.

The business model adopted by oMoby is also a SaaS-based model, implemented through four APIs (i.e., Query API, Update API, Training API, Result API). Customers are offered the possibility of uploading their own photos into the IQ Engines database and issuing queries through their mobile application using the Query API. In this way a customer can launch his own advertising campaign by providing information to the consumer while inside the shop. Concerning pricing policy, IQ Engines charges on the basis of the queries issued to its database, as well as a monthly fee. One indicative example of a query-based pricing policy offers the first 1,000 image queries free of charge, with a charge of 7 cents for each subsequent query. Depending on customer needs, IQ Engines offers 5 different pricing plans.
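Under the indicative pricing just described, the cost of a given query volume is straightforward to compute:

    # Worked example of the quoted query-based pricing: first 1,000 queries
    # free, 7 cents for each subsequent query.
    def query_cost(num_queries, free=1000, per_query=0.07):
        return max(0, num_queries - free) * per_query

    print(query_cost(10000))  # 9,000 billable queries -> 630.0 dollars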

5.3    Mobile Acuity

Mobile Acuity (www.mobileacuity.com) is a technology company enabling interactive brand marketing campaigns using its Visual Interactivity™ platform. The company was incorporated in January 2006 and has received seed capital from investors including Imprimatur Capital and Scottish Enterprise. The company is a spin-out of the University of Edinburgh, UK.

Mobile Acuity aims at using the camera phone as an innovative mobile marketing tool and as a new way to search for digital content by pointing and clicking. Its snap2win™ mobile marketing platform allows consumers to connect with a brand by pointing their camera at an advertisement or product. Snap2win™ utilises Mobile Acuity's mobile visual recognition technology, which identifies known objects in a consumer's picture message or video stream. Mobile Acuity offers a range of products that can identify media, products and places captured by consumers on their camera phone: Image Zoning determines which section of an image the consumer is pointing at and returns an appropriate response; Face Finder can accurately extract faces from an image and reuse them within the response returned to the user; Virtual Blue-screening can accurately extract the foreground of an image and reuse it within the response returned to the user; Colour ID analyses the dominant colours present in an image and uses them to create a customized response.

The main focus of Mobile Acuity is to help global brands reach their audiences in new ways. To this end, Mobile Acuity has been used by advertising agencies who aim at matching the company's innovative solutions with the needs of their brand clients. Indeed, in 2008 the Edinburgh-based company announced that its visual recognition platform had enabled Nokia Interactive to win two of the twelve categories at that year's mobile marketing awards. The winning campaign used Mobile Acuity's visual recognition platform to allow consumers to take football penalties with their camera phone by pointing it at a printed poster of a soccer goal.

Concerning the employed business model, Snap2win™ was launched in 2007 and has quickly been adopted by a number of leading global brands as part of their mobile marketing strategy. To facilitate this, Mobile Acuity's platform allows visual recognition campaigns to be created using a web-based campaign creation tool. Revenue streams are generated when Mobile Acuity is employed by an advertising agency or global brand to set up a campaign.

5.4    LinkMe Mobile

Founded in 2001 by engineers from NASA's Jet Propulsion Laboratory, LinkMe Mobile (http://www.linkmemobile.com/) aims at using mobile search technology to redefine how brands and advertisers connect with consumers via mobile phone. LinkMe Mobile (formerly SnapNow U.S.) is a consumer-centric company that connects consumers to their world, and clients to consumers’ interests and likes, via patented visual, voice, and audio recognition technology. The company is based in Southern California, with offices in Tokyo, London, and Toronto.

The goal of LinkMe Mobile's technology is to turn images into hyperlinks, encouraging consumers to interact with brands, advertisements, or products. Consumers can also link to or call up additional related content by doing a voice search and/or sending to the servers an audio capture that triggers a tailored response. Through their mobile phone, LinkMe Mobile takes consumers to a mobile store, to a brand site, or anywhere else on the mobile web that has been linked with a particular campaign. One important characteristic of LinkMe Mobile that differentiates it from other mobile image search companies is the combination of visual, voice, and audio recognition technology.

Using LinkMe Mobile, the user can snap a picture of an image with a camera phone, send it via MMS or email, and receive a response. The dedicated application, LinkMe Mobile App™, streamlines the process to three clicks of a button. Moreover, by employing the MeMeMe voice recognition technology, LinkMe Mobile extends the platform with a voice interface for mobile applications and users. Similarly, by also employing audio recognition technology, LinkMe Mobile allows its users to connect through audio captures such as radio commercials, songs, and other audio content.

LinkMe Mobile works mainly with brands, agencies, content owners, publishers, retailers, and carriers to create new communication opportunities with consumers. Clients are provided with the ability to facilitate targeted, interactive, and personal brand conversations with consumers. Revenue streams are generated by executing contracts in the context of advertising campaigns, i.e., LinkMe Mobile currently has a contract-based business model.

5.5    SnapTell

Founded in 2006, the aim of SnapTell (www.snaptell.com) is to revolutionize the way consumers and marketers connect. Using a camera phone and image recognition technology, users can access the information they need, allowing marketers to create high-impact campaigns. SnapTell was acquired in 2009 by A9.com, a subsidiary of Amazon.com, with the aim of integrating its mobile image search technology into the Amazon shopping experience.

The service offered by SnapTell provides a mobile marketing solution which marketers can use to deploy mobile marketing campaigns. It enables consumers to access marketing content and information on the go. For instance, SnapTell can be used to produce a list of vendors that sell the product of interest, including prices and links to the corresponding web sites. It also includes links to several other sites with related content. The application can also take advantage of the location information offered by smartphones to look for nearby stores that sell the product. The SnapTell service is motivated by the fact that consumers will not tolerate mobile marketing spam, and it offers an opt-in mobile marketing solution that allows consumers to define their own mobile marketing experience.

The technology adopted by SnapTell treats image recognition as a problem of matching a query image against a database of images. To do so, SnapTell employs an image matching algorithm called "Accumulated Signed Gradient" (ASG) that is able to work with databases containing millions of images. This technology works effectively on pictures taken with any camera phone (e.g., VGA or low resolution (320x240) cameras) and is able to handle pictures taken in real-life conditions that may exhibit lighting artifacts, focus/motion blur, perspective distortion and partial coverage. The technology works in a wide variety of real-life scenarios including print advertisements, outdoor billboards, brand logos, product packaging, branded cans, bottles and wine labels. To facilitate scalability, SnapTell uses indexing and search techniques to organize hundreds of millions of images. The system also uses distributed computing to achieve large scale. Finally, a novel feature of the provided functionality is the ability to automatically extract text embedded in images.

The business model adopted by SnapTell consists in partnering with marketers to create high-impact campaigns and to drive brand awareness, loyalty and revenues. By giving marketers the ability to reach consumers and create a brand relationship with them, SnapTell generates its revenue stream by offering its mobile image search technology to support the advertising activities of brands, agencies, content owners, publishers, retailers, operators, etc. Following a similar route, and by joining forces with A9.com, SnapTell is expected to bring its experience to Amazon's shopping experience in the field of on-line and visual shopping.

5.6    Point & Find (Nokia)

Point & Find (pointandfind.nokia.com) is a service offered by Nokia that uses visual search technology to let users find more information about surrounding objects, places, etc. in real time. Point & Find started as a pilot service that allowed users to tag their physical world. After this first experience, Nokia plans to integrate Point & Find's underlying visual search and augmented reality technology so that consumers can access it as part of their core device and service experience.

The goal of the initial Point & Find service was to let users discover useful and contextually relevant information by pointing the camera phone at objects. For instance, the user can point his camera phone at a movie poster on the street to read reviews, glance at ratings, look up show times, and even find the closest theatre and purchase tickets if his phone has a built-in GPS. It is a new way to easily find information and services on the go by simply pointing the phone at real-life objects.

Nokia Point & Find is based upon real-time image processing and automated object recognition technology. When a Point & Find enabled phone is pointed at an object, the system uses a variety of the phone's sensors (including the camera and GPS positioning) to identify the depicted object. Then, by searching through a database of previously tagged items, the system identifies the object and returns a set of links to associated content and services. In addition, Nokia Point & Find offers a range of capabilities for creating worlds of enhanced objects, such as: a) discover content by pointing at objects, entering keywords or navigating a directory; b) tag physical real-world objects using a Point & Find enabled camera phone; c) automatically upload tags into the Management Portal for further editing; d) define labels and links for tags to web URLs or phone numbers. Using these capabilities, a user may tag real-world physical objects just by taking photos. The tagged images are then automatically uploaded into the Point & Find system, and the Point & Find Management Portal is used to develop and manage customized Point & Find worlds.

Such worlds are intended to facilitate a brand experience, offering contextually relevant content that mobile users can access by pointing their camera phone at real-life objects. The creation of such worlds by marketers and their deployment by Nokia is the business model initially adopted by Nokia. However, Nokia has recently decided to integrate Point & Find's underlying visual search and augmented reality technology in a way that lets consumers access it as part of their core device and service experience. In this case, the revenue stream will not be generated by partnering with marketers in the context of advertising campaign contracts, but by using mobile visual search to attract more users into the Nokia community.

5.7    Gazopa

GazoPa (www.gazopa.com) is a similar-image search service developed by Hitachi. Users can search images from the web based on their own photos, drawings and keywords. GazoPa enables users to search for a similar image based on characteristics such as a color or a shape extracted from the image itself. Since GazoPa uses image features to search for other similar images, a vast range of images can be retrieved from the web. There are currently 90,000,000 web images indexed by GazoPa.

By leveraging this image search service, GazoPa has developed an iPhone application (www.gazopa.com/iphone_app) that allows the user to look for images similar to the world around him. The query images can be snapped using the camera phone or uploaded from an existing photo album. GazoPa offers multiple search options such as search by color, layout or shape, as well as the ability to filter by size, video thumbnail, etc. It provides an intuitive and easy way to perform image search by incorporating features like: a) search by camera, b) search by drawing, c) search by photo from an album, d) shuffle images, e) filter video thumbnails, f) filter by size, g) search option settings (color, standard, layout, shape, face). One of the most interesting features of GazoPa is that it allows the user to draw a quick sketch of what he wants his picture to look like and submit it to get back similar photographs or artwork.

As for the adopted technology, the results are mainly filtered by analyzing the color and shape of the depicted object or person. More specifically, the technology works by analyzing the angles and lines of the source image and finding matches online. Once an image has been found, it can be viewed on the phone (from within GazoPa) or in a mobile browser. GazoPa offers good customization options that allow the user to control search criteria such as color, layout, shape, and face, which all determine the kinds of returned results.

One interesting application that is based on the aforementioned technology is GazoPa Style Visual Fashion Search (http://style.gazopa.com/). It is a visual product search site that enables users to browse and search for similar fashion items easily. Currently, women's clothing shown on eBay, Amazon and Etsy is searchable at GazoPa Style Visual Fashion Search. The revenue stream for GazoPa is most probably generated by charging referral fees when a user is redirected from GazoPa Style Visual Fashion Search to the respective on-line store and makes a purchase. Another interesting application developed by GazoPa is GazoPa Answers (http://answers.gazopa.com/), a visual Question & Answer (Q&A) site that enables users to ask questions visually using photos and drawings. Since GazoPa Answers allows users to upload multiple photos, users can ask multiple-choice questions such as "Which handbag is most appropriate for a formal meeting?" or "Which dress is prettier?". Finally, users can also search for answers using GazoPa's similar image search engine.

5.8    Google Goggles

Google Goggles (www.google.com/mobile/goggles/) is an image recognition application created by Google Inc. which can currently be found on the Mobile Apps page of Google Mobile. Google Goggles was developed for use on Google's Android operating system for mobile devices. While initially only available for Android phones, in October 2010 Google announced the availability of Google Goggles for iPhone devices running iOS 4.0.

Google Goggles is a mobile application that lets users search the web using pictures taken with their mobile phones. For example, taking a picture of a famous landmark searches for information about it, and taking a picture of a product's barcode searches for information on the product. It can be used for things that aren't easy to describe in words, since there is no need to type or speak the query. Google Goggles works better with certain types of queries, such as pictures of books and DVDs, landmarks, barcodes and QR codes, logos, contact info, artwork, business cards, products, or text. However, it is not as good with pictures of animals, plants, cars, furniture, or apparel.

The technology used by Google Goggles can be described as follows. After capturing an image, Google breaks it down into object-based signatures. It then compares those signatures against every item it can find in its image database. The results are returned, ordered by rank. Some results are returned before even snapping a photo, using the GPS and compass functionality of the smartphone. The features embedded in Google Goggles are: a) content-based image retrieval, b) OCR, and c) augmented reality. Based on these features, Goggles is capable of identifying products, famous landmarks, storefronts, artwork and popular images found online, translating words in English, French, Italian, German and Spanish, and extracting contact information from business cards.

It is particularly interesting to inquire into Google's plans for generating a revenue stream from this service. In this direction we have seen Google taking small steps to turn mobile image search into a core element of Google's search functionality. During 2010 we saw Google's acquisition of PlinkArt, an application for identifying, discovering and sharing art. Similarly, like.com, an automated cross-matching system for clothing, was also acquired by Google. Both acquisitions are strong indications of the company's intention to extend the recognition capabilities of Google Goggles and assist users in various aspects of their everyday activities such as shopping, sightseeing, navigation, etc. The most likely scenario is that Google will employ the successful business model of Google Ads to display advertisements relevant to the user's visual queries and collect referral fees.

5.9    CLIC2C

CLIC2C (www.clic2c.com) is an interactive service provided by aquaMobile (http://www.aquamobile.es/). aquaMobile started in 2006, with a focus on the deployment of technology solutions that bridge the gap between the physical and digital worlds, thus unfolding a set of experiences in entertainment, access to information and multimedia content. CLIC2C is a service that enables mobile phone users to interact with their traditional physical environment, emulating the experience of online content access, search and discovery. It can transform the information printed on paper (newspapers, magazines, catalogs, posters and packaging) into dynamic multimedia content to be displayed on the mobile phone.

The CLIC2C application works by holding the phone towards the interactive item. The detection of the embedded mathematical code can be performed in either streaming or snapshot mode. In streaming mode, the user needs to point the phone's camera at 4-6 inches (10-15 cm), parallel to the image, and the application will connect him to the associated content. In snapshot mode, the user also needs to press the "capture" button before the application connects him to the associated content. The outcome of this procedure is to turn a single-page advertisement, poster, or point-of-sale item into a link pointing to a vast amount of content, video, or even to a place allowing secure transactions.

CLIC2C technology makes use of digital watermarks powered by Digimarc (https://www.digimarc.com/). These watermarks are embedded inside the images before they are made public. Then, if a printed image has been digitally watermarked, the CLIC2C watermark detection mechanism can read the embedded mathematical code and deliver relevant content via the handset's data connection. A printed image that has been watermarked with the CLIC2C technology can be recognized by a characteristic logo beside the image.
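To illustrate the general principle of watermark-based identification (Digimarc's actual scheme is proprietary and far more robust, surviving printing and re-photographing), the sketch below hides a 32-bit content identifier in the least significant bits of an image and reads it back.

    # Generic LSB watermark sketch, for illustration only.
    from PIL import Image

    def embed_id(src, dst, content_id):
        img = Image.open(src).convert("L")
        px = list(img.getdata())
        for i in range(32):                     # one identifier bit per pixel LSB
            px[i] = (px[i] & ~1) | ((content_id >> i) & 1)
        img.putdata(px)
        img.save(dst, "PNG")                    # lossless, so the bits survive

    def read_id(path):
        px = list(Image.open(path).convert("L").getdata())
        return sum((px[i] & 1) << i for i in range(32))  # identifier -> database lookup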

The information delivered is directly related to, and of a quality controlled by, whoever embeds the watermark. This makes CLIC2C a useful tool for enterprises/advertisers (allowing interactivity for press advertisements, brochures, catalogs or packaging), for printed media (allowing the transformation of printed media into interactive multimedia content) and for organizations (increasing the interactivity of documents and promotional material). Thus, the business model adopted by CLIC2C is a contract-based model where revenue streams are generated by partnering with marketers in the context of an advertising campaign.

5.10    WeKnowIt Image Recognizer

The WeKnowIt image recognizer (www.weknowit.eu/wkiimagerecognizer/) allows mobile phone users to quickly discover the location and name of photographed objects. This application was developed by Software Mind S.A., the National Technical University of Athens and Yahoo! Research, Barcelona, in the context of the WeKnowIt (http://www.weknowit.eu) Integrated Project (IP) (ICT-215453) funded by the European Union's 7th Framework Programme: Information and Communication Technologies (ICT).

The aim of the WeKnowIt Image Recognizer is to provide the user with detailed information on the location and name of a POI (Point of Interest) that he has just photographed. The core services used by the application are: a) recognize an object (POI) in a picture, b) determine its geolocation, and c) determine tags associated with this POI. It works by simply snapping a photo of the location and uploading the image to the WeKnowIt system. The description of the location, retrieved from Wikipedia, is then displayed on the user's mobile phone.

More specifically, the image recognition technology is supported by the VIRaL (Visual Image Retrieval and Localization) application, a content-based image search engine developed by NTUA (National Technical University of Athens) that currently contains more than 1 million Flickr images from 30 European cities. The purpose of the visual analysis is to determine the geolocation of the query image, suggest tags for the image and identify visually similar photos without any text input. The POI functionality, developed by Yahoo! Research, Barcelona, returns a list of nearby POIs for a given geolocation. These POIs are used to query Wikipedia so as to gather and present to the user information about the POI recognized in his image. Finally, the Google Maps API is used to localize the POI on the map.
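The nearby-POI step can be sketched as a simple great-circle ranking, assuming the geolocation of the query image has already been estimated by the visual analysis; the POI list and coordinates below are made up for illustration.

    # Rank candidate POIs by great-circle (haversine) distance from the
    # estimated geolocation of the photo.
    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

    pois = [("Acropolis", 37.9715, 23.7257), ("Syntagma Square", 37.9755, 23.7348)]
    query = (37.9718, 23.7260)  # estimated geolocation of the photo
    nearest = min(pois, key=lambda p: haversine_km(query[0], query[1], p[1], p[2]))
    print(nearest[0])  # the POI whose Wikipedia description would be fetched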

Although primarily oriented towards tourism, the automatic geolocalization service offered by the WeKnowIt Image Recognizer can be used to facilitate various promotional actions. For instance, a revenue stream could be generated by helping tourist agencies or governmental organizations make their tourism campaigns more attractive, or even by allowing visitors themselves to obtain a radically different sightseeing experience for a small fee.

5.11    Wizup

Wizup (http://www.wizup.mobi/) is a mobile audio-visual search application that was developed as part of the Windows Phone 7 developer challenge. It is currently being marketed in France, but has the potential to expand into many countries. One important characteristic of Wizup is the combination of audio with visual recognition, which offers the end user an additional channel for collecting information about the world around him.

Wizup is able to listen to radio stations, understand images from magazines, and recognize TV channels. By snapping a picture or letting the mobile phone listen to the broadcast audio, the Wizup application is able to deliver relevant content to the mobile's screen. It is a particularly useful tool for marketers, but also for end users, since digging for more information becomes easier. Currently, Wizup is able to analyze and recognize in real time the content of a) 150 radio stations, b) 1,200 press titles, c) 100 TV channels, d) 8,500,000 titles and e) 4 major national display networks. The information offered to the user ranges from additional product information (e.g., promotions, sales) to ratings from social sites, etc.

Little information is disclosed about the technology behind Wizup. Based on the demonstrated functionality, we may assume that Wizup is based on a content-based indexing scheme for both audio and visual signals that is constantly extended with new content. As for the envisaged business model, it seems that Wizup aims at generating revenue streams by allowing marketers to enrich their advertisements with additional, more targeted information. The digital interactivity of media can be used to facilitate advertising through music, TV, radio, magazines, posters, packaging and consumer products, by using Wizup to deliver information about a sound and/or visual element. The enrichment of all kinds of media with a multitude of services is what Wizup foresees as its main revenue source.

5.12    TinEye Mobile

TinEye Mobile (http://ideeinc.com/products/tineyemobile/) is an API-based, automated image recognition service developed by Idée Inc. (http://ideeinc.com/), a company that develops advanced image identification and visual search software. The employed technology looks at the patterns and pixels of images and videos to make each image or frame searchable by colour, similarity or exact duplicate.

TinEye is a reverse image search engine that can be used to find out where a query image came from, how it is being used, whether modified versions of the image exist, or to find higher-resolution versions. Based on this search engine, TinEye Mobile allows users to search a product catalog using their mobile phone camera. Given a query image, it locates identical or modified images within or between large-scale image collections. One example where the TinEye Mobile image recognition technology serves the wine industry is Snooth (http://www.snooth.com/iphone-app/), an iPhone application that allows the user to take a photograph of a wine label and find the closest store that stocks the selected wine, as well as the prices in each store it finds. The premium version of Snooth also provides reviews about the photographed wine and finds similar bottles. Snooth has integrated TinEye Mobile and made close to one million wines searchable via image recognition.

The technology developed by TinEye does not use file metadata or keywords to detect image matches. It uses specialized digital image fingerprinting techniques to identify image matches despite resizing, cropping, rotation, flips or occlusion. The image collections are processed in order to generate a digital fingerprint for each image. These fingerprints are indexed to facilitate fast image retrieval and are later compared against the digital fingerprint of the query image. Sensitivity can be tuned so that even drastic transformations are identified: resizing, rotation, skew, close-cropping, flips, colour changes, match area.
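TinEye's fingerprinting is proprietary, but a simple difference hash conveys the flavor of the idea: reduce each image to a short signature that is stable under resizing and mild edits, then compare signatures by Hamming distance.

    # Difference-hash fingerprint sketch (not TinEye's actual algorithm).
    from PIL import Image

    def dhash(path, size=8):
        img = Image.open(path).convert("L").resize((size + 1, size))
        px = list(img.getdata())
        bits = 0
        for row in range(size):
            for col in range(size):
                left = px[row * (size + 1) + col]
                right = px[row * (size + 1) + col + 1]
                bits = (bits << 1) | (left > right)
        return bits  # 64-bit fingerprint

    def hamming(a, b):
        return bin(a ^ b).count("1")  # small distance => likely the same image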

Concerning the adopted business model, the TinEye Commercial API allows users to issue queries on the TinEye database after purchasing a search bundle. A search bundle costs $300 USD for 5,000 searches or $1,500 USD for 30,000 searches, and expires after one year of use. The business model adopted by Snooth, on the other hand, is heavily based on advertising. Given that more than one million users visit Snooth every month to browse wines and prices, the revenue stream of Snooth is generated by the wine companies that want to become part of the Snooth database. The Snooth Wine App is free, but the premium version, Snooth Wine Pro App, which includes image-based wine search and no advertisements, costs $4.99.

6.    Service comparison and important trends

After studying the mobile image search market and examining the existing services, we can derive Table 1, which compares these services in terms of the employed technology, the target domain and the adopted business model. It is clear that image recognition is the dominant technology employed by the majority of the existing services. The companies providing a mobile image search service usually maintain a large database of images that are used to recognize the content of the query image. An alternative approach is based on watermarking, which brings the additional requirement of having to watermark the images before they are made public. Finally, crowdsourcing is another interesting approach for understanding image content, but little information is disclosed about the details of the mechanism.

The most popular domain targeted by the majority of the existing mobile image search services is shopping. Marketers use these services to increase brand awareness, launch clever advertising campaigns and discover new channels for delivering focused information to the user, while the consumer is looking for new ways to make informed purchases and improve his shopping experience. Apart from shopping, mobile image search is also encountered in other sectors such as entertainment, art, fashion, tourism and medicine. In entertainment, mobile users are offered an additional means of obtaining information about their favorite program, movie or artist, and can even participate in on-line games set up for advertising purposes. In tourism, mobile image search can help visitors obtain more information about an object of interest, while in medicine it can convey complex scenarios much more quickly and easily than free text. One example of mobile image search in the medical domain is the prototype application presented in [12], developed in the context of the EU-funded project Chorus+. Finally, in highly specialized sectors like art, fashion and wine, mobile image search acts as an exploratory tool that helps users discover new content that may be of interest to them.

As for the adopted business models, we have identified three ways in which mobile image search companies generate revenue streams. The first is based on partnering with marketers in the context of an advertising campaign that makes use of mobile image search; in this case the company's income comes from contracts signed with advertising agencies to provide the technological expertise supporting the campaign. Moving one step beyond this contract-based model, some mobile image search companies have adopted a Software as a Service (SaaS) model: instead of signing explicit contracts with advertising companies, the revenue stream is generated by charging for the use of an Application Programming Interface (API) that exposes the mobile image search functionality, as the sketch below illustrates. This is a highly flexible business model, since there is no restriction on who may use the API as long as the corresponding fee is paid. Finally, there is the advertising-based business model, which works in a similar way to Google's text advertising. Although currently adopted by only a small portion of mobile image search companies, it is expected that sooner or later it will dominate this market, just as it has for text-based search.
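To make the SaaS model concrete, the sketch below shows what a client of such a metered image search API might look like. The endpoint URL, field names and authentication scheme are invented for illustration and do not correspond to any actual provider's API; the point is simply that each call consumes one pre-paid search from the purchased bundle.

import requests

API_KEY = "your-api-key"   # issued to the customer when a search bundle is bought
ENDPOINT = "https://api.example-visual-search.com/v1/search"   # hypothetical URL

def image_search(image_path):
    """Submit one query image; the provider debits one search from the bundle."""
    with open(image_path, "rb") as f:
        response = requests.post(
            ENDPOINT,
            headers={"Authorization": "Bearer " + API_KEY},
            files={"image": f},
        )
    response.raise_for_status()
    return response.json()   # e.g. a list of matched products with scores

# Usage: results = image_search("shelf_photo.jpg")

The appeal of this model for the provider is that metering happens per request, so pricing scales with actual usage and no bespoke contract is needed for each new customer.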

Table 1: Comparison table for mobile image search services

SERVICE | TECHNOLOGY | TARGET USERS | BUSINESS MODEL
Kooaba | Image recognition | Entertainment | SaaS-based
IQ Engines (oMoby) | Image recognition, crowdsourcing | Shopping | SaaS-based
Mobile Acuity | Image recognition | Consumers, marketers | Contracting with marketers
LinkMe Mobile | Image and audio recognition | Consumers, marketers | Contracting with marketers
Snaptell | Image recognition | Consumers, marketers | Contracting with marketers
Point&Find (NOKIA) | Image recognition, augmented reality | Mobile users, marketers | Engaging more users in the NOKIA experience
Gazopa | Image recognition | Shopping, mobile users | Improving the on-line shopping experience
Google Goggles | Image recognition | Mobile users | Advertising-based
Clic2c | Watermarking | Entertainment | Contracting with marketers
WeKnowIt IMG REC | Image recognition | Tourism | Touristic promotion actions
Wizup | Image and audio recognition | Consumers, marketers | Contracting with marketers
TinEye Mobile (Snooth) | Image recognition | Wine industry | Advertising-based

Finally, we should also mention the growing interest that has developed lately around mobile photo-sharing services like Instagram (http://instagr.am/), Path (http://www.path.com/), Picplz (http://picplz.com/) and Color (http://www.color.com/). The widespread use of these services has dramatically increased the pace at which new image content is generated and shared, paving the way for new services and functionalities. Although these services currently lack a search facility, it is safe to predict that sooner or later the available image recognition technology will become an integral part of photo-sharing, allowing users to enhance the search and sharing experience with image-based search.

Another booming sector related to mobile image search is Augmented Reality (AR). As people point their phone cameras to find the nearest tube station or to get reviews about a restaurant across the street, they are effectively "searching" for information; AR applications are thus simply a different means of providing faster, richer or more immediate access to information and content. There are now at least a dozen applications that offer varying degrees of augmented reality through the camera lens. For instance, we have been witnessing the rapid growth of applications like Nearest Tube (http://www.acrossair.com/acrossair_app_augmented_reality_nearesttube_london_for_iPhone_3GS.htm), which tells Londoners where their nearest tube station is via the iPhone's video function; Sekai Camera (http://sekaicamera.com/), which uses positional data to determine the user's location and overlays meta-information about products, places or even friends' notes on top of the live camera view; and TwittARound (http://thenextweb.com/2009/07/13/twittaround-augmented-reality-twitter-app/), which uses the mobile phone's video camera to show live Tweets popping up based on the user's location. In addition, substantial effort has been devoted to developing general-purpose augmented reality browsers capable of augmenting our view of the world with diverse information channels: Junaio (http://www.junaio.com/) augments the camera view captured by the smartphone with instant information about places, events, bargains or surrounding objects; Wikitude (http://www.wikitude.org) combines GPS and compass data with Wikipedia entries and overlays the information on the real-time camera view of a smartphone; and Layar (http://www.layar.com/) uses the same registration mechanism (GPS and compass) and incorporates it into an open client-server platform. Although current AR applications rely primarily on GPS and compass information to register points of interest (a minimal sketch of this registration mechanism is given below), image recognition technology is expected to take a central role in their functionality, as indicated by Raimo van der Klein, CEO of Layar (http://site.layar.com/company/blog/letter-from-the-ceo-exciting-news-for-layar/). It is evident that as image recognition technology becomes more and more robust, mobile image search will soon turn into a vital element of many smartphone applications.
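As a concrete illustration of GPS-and-compass registration, the sketch below computes the bearing from the user's position to a point of interest (POI) and checks whether it falls within the camera's horizontal field of view; an AR browser would overlay the POI's label only while this test succeeds. The function names, the flat field-of-view model and the threshold are our own simplifications, not taken from any of the applications above.

import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing, in degrees, from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360

def in_camera_view(user_lat, user_lon, heading, poi_lat, poi_lon, fov=60.0):
    """True if the point of interest lies inside the camera's horizontal FOV."""
    b = bearing_deg(user_lat, user_lon, poi_lat, poi_lon)
    offset = (b - heading + 180) % 360 - 180   # signed angle in [-180, 180)
    return abs(offset) <= fov / 2

# An AR browser continuously polls the GPS (user_lat, user_lon) and the compass
# (heading) and draws a POI's overlay only while in_camera_view(...) is True.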

7.    Conclusions

In concluding this survey, we should highlight that image recognition technologies are the key element supporting mobile image search. Around this key element, applications with different purposes have been developed, focusing on shopping, entertainment, tourism, advertising, finding the cheapest or most convenient option, and so on. Concerning the employed business models, the majority of companies doing business in this market follow a direct monetization approach, where revenue streams are generated either by contracts signed in the context of a particular advertising campaign, or by charging fees under a SaaS model, or both. However, this is expected to change as image recognition technology evolves, favoring a business model where profit is derived not from directly "selling" the image search functionality but from capitalizing on the loyalty and trust users establish with the search application (e.g. the advertising-based model). Moreover, we should highlight the importance of the client interface in attracting new users, and the observed tendency to incorporate augmented reality technologies as an integral part of mobile image search services. Finally, we can predict that the constantly growing interest in mobile photo-sharing and AR applications will soon motivate mobile image search companies to enhance these applications with image-based search functionality.

8.    References

… IEEE Conference on Computer Vision and Pattern Recognition, 2007.
… on Mixed and Augmented Reality (Sept. 15-18, 2008), pp. 125-134.
[7] Gómez-Barroso, J.-L., Compañó, R., Feijóo, C., Bacigalupo, M., Westlund, O., Ramos, S., et al. (2010). Prospects of mobile search, EUR 24148 EN. Seville: Institute for Prospective Technological Studies, European Commission.
[8] Notes of the "Exploring the Future of Mobile Search" Expert Workshop, 9 June 2010, Het Pand, Ghent, Belgium (http://www.ist-chorus.org/index.php?article_id=107&page=116&action=article&)
[11] Xin Yang, Sydney Pang, and Tim Cheng, "Mobile Image Search With Multimodal Context-Aware Queries", IEEE International Workshop on Mobile Vision, 2010.