
Efficient Affine Image Matching for Building and Maintaining 3D Models

Date

2010-11-02

Authors

Fleck, Daniel

Abstract

3D models of buildings are used in many applications such as location recognition, augmented reality, virtual training and entertainment. Creating models of buildings automatically is a longstanding goal in computer vision research. Many current applications rely on manual creation of models using images and a 3D authoring tool. While more automated approaches exist, they are typically inefficient, or require dense imagery, other sensor data, or frequent manual intervention. The focus of this thesis is to automate and increase the efficiency of 3D model creation from image collections. Matching sets of images to each other is a frequent step in 3D model building. In many applications image matching must be done hundreds or thousands of times, so any increase in matching efficiency is multiplied hundreds or thousands of times in these applications. This dissertation presents a new image matching method that achieves greater efficiency by exploiting the fact that images taken from similar viewing angles are approximately related by an affine transformation. An affine transformation models translation, rotation and non-isotropic scaling between image pairs. When images are related by an affine transformation, ratios of areas of corresponding shapes are invariant. The method uses this invariant to fit an affine transformation model to a set of putative matches and to detect incorrect matches. Methods assuming global and local affine transformation models were created. The first assumes a single global affine transformation between each image pair. The second imposes a structure on the feature points to cluster features into local regions, then fits a different affine model to each cluster. Both methods were evaluated using sets of synthetic matches with varying percentages of incorrect matches, localization error and rotation.
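The area-ratio invariant underlying the matching method can be sketched briefly. Under an affine map with matrix A, every area is scaled by |det A|, so the ratio between the area of a shape in one image and the area of its correspondence in the other image is the same constant for all correctly matched shapes; mismatches stand out as deviations from the dominant ratio. The sketch below is illustrative only and is not the dissertation's implementation; the function names and the use of triangles formed from triples of putative matches are assumptions.

```python
import numpy as np

def triangle_area(p1, p2, p3):
    # Unsigned triangle area via the 2D cross product of two edge vectors.
    return 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p3[0] - p1[0]) * (p2[1] - p1[1]))

def area_ratios(matches, triples):
    """For each triple of putative matches, compute the ratio of the
    triangle's area in image 1 to its area in image 2. Under a single
    global affine transform A, every ratio equals 1/|det A|, so
    incorrect matches show up as outlying ratios."""
    ratios = []
    for i, j, k in triples:
        (a1, a2), (b1, b2), (c1, c2) = matches[i], matches[j], matches[k]
        s1 = triangle_area(a1, b1, c1)   # area in image 1
        s2 = triangle_area(a2, b2, c2)   # area in image 2
        if s2 > 1e-9:                    # skip degenerate triangles
            ratios.append(s1 / s2)
    return ratios
```

For example, if the second image is related to the first by A = diag(2, 3), then det A = 6 and every correct triple yields the ratio 1/6.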
Additionally, the methods were applied to a large publicly available image database and the results were compared to several recent model fitting methods. The results show that the best affine method, using local regions, maintains equivalent accuracy and is consistently more efficient than current state-of-the-art methods. When creating and using 3D models, it is often important to predict whether images taken from specific locations will match existing images in the model. Image matching prediction is used to evaluate image sets for vision-based location recognition and augmented reality applications. This dissertation presents a new way to predict whether images will match by measuring affine distortion. Distortion is measured by projecting features into a second image and computing the affine transformation between the corresponding feature regions. Feature distortion is computed from the skew, stretch and shear of the transformed region. Using the distortion measures for all features in an image pair, a distortion vector is created describing the pair. From the distortion vectors and the actual number of matches, a classifier is trained to predict the confidence that images will match. Results are presented that compare this method to other published approaches, and they demonstrate that the affine distortion-based classifier predicts matching confidence more accurately than other published techniques. The classifier is also used to create a spatial model of locations around a building. The spatial model shows the confidence that a new image taken from a specific location and pose will match an existing set of images. Using this model, location recognition applications can determine how well they will work throughout the scene. The approach presented uses the classifier described above and more realistic location sampling to create a spatial map that is more accurate than other published approaches.
Additionally, as part of this goal, the minimum set of images needed to cover the space around the building is computed. The approach uses structure from motion to create 3D information about the scene. Synthetic cameras are then created using approximate locations and directions from which people commonly take pictures. The affine distortion-based classifier is applied to compute the confidence that images from the synthetic cameras will match the existing set of images. Results are presented on a spatial map showing the confidence that new images captured at specific locations and poses will match the existing image set. Additionally, the minimal set of images needed to maintain the matching coverage is computed using a greedy set cover algorithm. The minimal set can be used to increase efficiency in applications that need to match new images to an existing set of images (e.g., location recognition, augmented reality and 3D modeling). Finally, a process is presented to validate the 3D information computed using structure from motion. Validation ensures that the data is precise and accurate enough to provide a realistic 3D model of the scene structure. Results from the process show that the Bundler structure-from-motion software generates 3D information accurately enough to calculate distortion and generate the spatial coverage map.
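The greedy set cover step mentioned above follows the textbook approximation: repeatedly keep the image whose predicted-match region covers the most still-uncovered locations until everything is covered. A minimal sketch, assuming locations are modeled as a set and each image's coverage as a subset (the data representation is an assumption, not the dissertation's):

```python
def greedy_set_cover(universe, subsets):
    """Greedy approximation to minimum set cover.
    universe: set of locations to cover.
    subsets: list of sets; subsets[i] is the coverage of image i.
    Returns indices of the chosen images, in selection order."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the image covering the most still-uncovered locations.
        best = max(range(len(subsets)),
                   key=lambda i: len(uncovered & subsets[i]))
        gain = uncovered & subsets[best]
        if not gain:
            break  # remaining locations cannot be covered by any image
        chosen.append(best)
        uncovered -= gain
    return chosen
```

The greedy rule gives the classical ln(n) approximation guarantee for set cover, which is why it is a common choice for coverage problems like this one.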

Keywords

Image matching, Affine models, Computer vision, Model fitting
