Generating 3D Spatial Descriptions from Stereo Vision Using SIFT Keypoint Clouds

Abstract

To facilitate more natural interaction with robots, we have been investigating an approach for generating descriptions of objects in a scene using point cloud models built with the Scale-Invariant Feature Transform (SIFT). The 3D models are constructed from 24 images taken from different viewing angles. For each recognized object, the corresponding model is placed into an internal representation of the environment at its recognized location. The object keypoints are then projected onto horizontal and vertical planes. The convex hulls of the projected points are computed in each plane and used as boundary representations for computing the Histograms of Forces (HoF). Features from the HoF then drive a system of fuzzy rules that generates descriptions using spatial referencing language, e.g., "the cup is on the table to the right of the lamp." The supported relations are right, left, front, behind, on top, above, below, inside, contains, and near.
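
As a rough sketch of the projection-and-hull step, consider the snippet below. The paper does not publish an implementation, so the array layout, the choice of x-y and x-z as the horizontal and vertical projection planes, and the use of scipy.spatial.ConvexHull are illustrative assumptions, not the authors' code.

    # A minimal sketch, assuming each recognized object is available as an
    # (N, 3) array of 3D SIFT keypoint coordinates. Names are hypothetical.
    import numpy as np
    from scipy.spatial import ConvexHull

    def planar_hulls(keypoints_3d):
        """Project 3D keypoints onto a horizontal (x-y) and a vertical (x-z)
        plane, and return the convex hull vertices in each plane."""
        horizontal = keypoints_3d[:, [0, 1]]  # drop z: top-down view
        vertical = keypoints_3d[:, [0, 2]]    # drop y: side view
        hulls = {}
        for name, pts in (("horizontal", horizontal), ("vertical", vertical)):
            hull = ConvexHull(pts)
            # hull.vertices indexes the boundary points in counterclockwise
            # order, giving a polygon usable as a boundary representation.
            hulls[name] = pts[hull.vertices]
        return hulls

    # Example with a synthetic cloud of 100 keypoints for one object.
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(100, 3))
    boundaries = planar_hulls(cloud)
    print(boundaries["horizontal"].shape)  # (k, 2): hull polygon vertices

Each resulting polygon would then serve as the boundary representation compared between a pair of objects when the force histograms are evaluated.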

