Social Multimedia Analysis

Multimedia content is being produced and shared through the Internet at an unprecedented rate. For example, more than a million images are shared every day and 100 million hours of video are shared each year. With this onslaught of data, the ability to automatically understand the contents of images and videos is critical for enabling applications such as content-based retrieval, similar item search, personalized content search, privacy protection, and modeling the flow of multimedia content on social networks. Such capabilities can provide cost-efficient solutions for collecting information about viral content (e.g., memes), customer feedback on new products, and geo-political or military events around the world, a task that has not previously been possible without dedicated research and intelligence groups.

Kitware is developing a suite of large-scale multimedia analysis tools that focus on visual content understanding, content-based search, online privacy protection, and network modeling. These software tools incorporate state-of-the-art techniques in multimedia analysis to detect objects, scenes, activities, in-scene text, and audio signals embedded in unconstrained images and videos. These techniques are used jointly to analyze and detect patterns of interest in data. One example of Kitware’s ongoing projects is the development of a privacy advisor, which alerts users when images with potentially privacy-sensitive material are about to be inadvertently shared on the web. Our tools have demonstrated high accuracy on large-scale, real-world data and can be adapted to diverse application domains. In addition, Kitware tools integrate advanced visualization and interaction features that provide a seamless search experience in web browsers and improve search accuracy by incorporating users’ relevance feedback.
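As a rough illustration of how relevance feedback can refine a content-based search, the sketch below uses the classic Rocchio update: the query vector is moved toward items the user marks as relevant and away from items marked as non-relevant, and the archive is then re-ranked by cosine similarity. This is a generic textbook technique, not a description of Kitware's implementation. The three-dimensional feature vectors, the clip names, and the weights alpha, beta, and gamma are all hypothetical values chosen for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean(vectors, dim):
    """Component-wise mean; a zero vector if no examples were given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query toward the mean of the relevant
    examples and away from the mean of the non-relevant ones.
    The default weights are illustrative, not tuned values."""
    dim = len(query)
    pos = mean(relevant, dim)
    neg = mean(non_relevant, dim)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]

def rank(query, archive):
    """Return item names ordered from most to least similar to the query."""
    return sorted(archive,
                  key=lambda name: cosine(query, archive[name]),
                  reverse=True)

# Hypothetical archive: each item is described by a made-up feature vector.
archive = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.2, 0.8, 0.1],
    "clip_c": [0.1, 0.7, 0.6],
}

query = [0.6, 0.5, 0.1]
initial = rank(query, archive)

# The user marks clip_c as relevant and clip_a as not relevant.
refined_query = rocchio_update(query,
                               relevant=[archive["clip_c"]],
                               non_relevant=[archive["clip_a"]])
refined = rank(refined_query, archive)

print("initial ranking:", initial)
print("refined ranking:", refined)
```

In this toy example, a single round of feedback moves clip_c from the bottom of the ranking to the top. A real system would compute the feature vectors from the visual and audio content and could repeat the update over several feedback rounds.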

To learn more about our efforts in social multimedia search and understanding, please see our papers at ACM MM 2013, Machine Vision and Applications 2014, and ICPR 2014.


System architecture for search of large multimedia archives through extraction of visual and audio features. Search results are refined through iterative user feedback.

Refined search results for "flash mob."

Search results based on visually similar styles.