Eran Swears, Anthony Hoogs, Matt Leotta, and Sangmin Oh attended the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), which took place from June 23 to June 28, 2014, in Columbus, OH. CVPR is the premier annual conference for computer vision research, with more than 2,000 attendees and a paper acceptance rate below 30%. Kitware participated in the main conference as well as several co-located workshops, presentations, and short courses, including:
A paper in the main conference, on “Complex Activity Recognition using Granger Constrained DBN in Sports and Surveillance Video”
An invited talk on “Video Scene Segmentation and Recognition by Location-Independent Activity Classes” at the Workshop on Perceptual Organization in Computer Vision
A demonstration of Complex Activity Recognition and Functional Scene Element Recognition
A poster presentation on Collaborative Computer Vision Research in the Vision Entrepreneurs Workshop
A half-day tutorial on Emerging Topics in Human Activity Recognition
A poster presentation on “Pyramid Coding for Functional Scene Element Recognition in Video Scenes” in the Scene Understanding Workshop
A thesis research summary presentation at the Doctoral Consortium
At the main conference, Eran and Anthony presented a poster covering their CVPR paper “Complex Activity Recognition using Granger Constrained DBN in Sports and Surveillance Video,” which was also featured as a video spotlight. The poster presentation was a success: the crowd around the poster remained steady throughout the two-hour session.
The conference also included video spotlights of the posters, which played before and during the poster sessions. These videos are available online. The spotlights were effective in helping people decide which posters to visit before the poster sessions began.
Activity Detection and Scene Understanding Demos
Eran, Matt, Anthony, and Sangmin gave a live demonstration in the Activity Detection and Scene Understanding demo session. The demo showed a user defining a complex activity as a graphical model using the tools that Rusty and the team developed in vpView; automatic, live detection of the activity on a video dataset; and user examination of the results in vpView. The team displayed the actual graphical user interface (GUI) and how the user interacted with it, while running detection algorithms a couple of times!
The demo was reasonably well attended and worthwhile. The crowd around the demo ebbed and flowed because of a parallel poster session. The combination of PowerPoint presentations, demo software, and multiple computers for simultaneous display helped convey this capability clearly.
Vision Industry & Entrepreneur Workshop (VIEW)
This workshop included several invited talks and posters about research and entrepreneurship in the computer vision industry. Kitware displayed a poster that provided an overview of its computer vision work. The poster, created by Arslan Basharat and Brad Davis, was presented by Matt, Anthony, and Sangmin. It sparked interesting discussions about what it means to participate in collaborative research in the industry.
Emerging Topics in Human Activity Recognition Tutorial
Sangmin was a co-organizer of this tutorial with Michael Ryoo (JPL), Ivan Laptev (INRIA), and Greg Mori (Simon Fraser University). The tutorial covered a wide variety of topics, including visual features, group activity recognition, first-person videos (e.g., those taken using Google Glass), and applications of activity recognition. The tutorial was a success, with around 50 attendees. Sangmin showcased Kitware’s work on complex activity recognition, WAMI tracking, and sports activity analysis. The slides for this tutorial can be found online.
Workshop on Perceptual Organization in Computer Vision
This workshop was very worthwhile, with a strong program of invited speakers including Anthony, who gave a talk titled “Video Scene Segmentation and Recognition by Location-Independent Activity Classes.” The talk discussed action detection and matching in VIRAT, with an emphasis on finding short-duration events in long, continuous video archives. (Most action recognition datasets work with short clips containing single actions.)
Scene Understanding Workshop
This workshop was enlightening and informative, consisting of invited talks and a poster session, during which several presenters discussed trends in predicting information in a scene. Such trends include predicting information outside of an image’s bounds and creating a list of possible futures for a vehicle’s or person’s trajectory. In addition, much of the work discussed concerned scene understanding from single images, although work in this area has started to move into video over the past several years. Many of the ideas and observations from single-image analysis can be applied to video. The workshop also highlighted using 3D modeling to improve segmentation and recognition of objects. In particular, incorporating 3D CAD models of objects typically produced very large improvements in performance.
Additionally, Eran presented a 30-second poster spotlight and spent a couple of hours presenting Kitware’s previous ICCV paper, “Pyramid Coding for Functional Scene Element Recognition in Video Scenes.” In general, the other scene understanding posters at the workshop were applied to high-resolution images with well-defined objects and used pixel features. Our applications pose a more challenging problem: they involve many, sometimes ill-defined, objects in low-resolution imagery, and scene understanding is performed using motion behaviors rather than pixel features. As a result, there was not as much interest in our work as we would have liked. In any case, Eran did meet a few people who were interested in doing internships and possibly full-time work at Kitware.
This year, there were 1807 total paper submissions, 540 of which were accepted for the conference and 104 of which were presented at one of the oral presentation sessions. This makes the acceptance rate for the conference 29.88 percent. The share of submitted papers selected for oral presentation was 5.76 percent. The papers can be downloaded online.
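As a quick check of these percentages, here is a minimal sketch in Python. The counts are the ones quoted above; the variable and function names are illustrative only and are not from any conference tooling.

```python
# Counts quoted in the text.
submissions = 1807
accepted = 540
orals = 104


def rate_percent(part, total):
    """Return part/total as a percentage, rounded to two decimals."""
    return round(100.0 * part / total, 2)


# Both rates are measured against the total number of submissions.
acceptance_rate = rate_percent(accepted, submissions)
oral_rate = rate_percent(orals, submissions)

print(f"acceptance rate: {acceptance_rate}%")
print(f"oral rate: {oral_rate}%")
```

Note that both figures use total submissions as the denominator, so the oral rate is a share of all submitted papers rather than of accepted ones.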
It was very satisfying to see that many of the CVPR 2014 papers used the VIRAT public dataset (from Kitware’s CVPR 2011 paper, “A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video”) for activity prediction and classification. Kitware’s team compiled the dataset with university collaborators and made it available to the wider computer vision community for research. It is one of the largest collections of surveillance footage to be made public.
So far, the CVPR 2011 paper has been cited over 90 times in conference proceedings and journals, and this number is growing!
Visual SLAM & State of the Art 3D Reconstruction Techniques Tutorials
Matt attended two closely-related tutorials on Visual SLAM (Simultaneous Localization and Mapping) and Structure from Motion (SfM). SLAM comes from the robotics community and has a focus on real-time performance on video. SfM, on the other hand, often operates in batch on collections of unordered images. Both tutorials gave good overviews of key contributions to the field. Both discussed sparse 3D reconstruction (point clouds) and dense 3D reconstruction (surfaces). The lines between SfM and SLAM are becoming increasingly blurred. Kitware’s recently released MAP-Tk is intended to eventually span both of these topics.
In general, the conference was very productive and comparable to ICCV. There was a lot of interest in Kitware’s work, as well as in potential job and internship opportunities. As expected from a top-tier conference, the team found that the state of the art continues to make significant progress in tracking, scene understanding, 3D reconstruction, and deep learning, to name a few. Of particular note, the state of the art in tracking, even on WAMI data, is being pushed to new limits, providing us with some useful tools to integrate into our tracker!