MATIS3D : a Neural Network Based Stereo Vision Machine


More details are given in:
Garcia C., "Matis3D: An Adaptive Stereo Vision Machine", Proceedings of the 10th Scandinavian Conference on Image Analysis (SCIA'97), June 1997, Lappeenranta, Finland.

This project aims to build a stereo vision machine for use in an industrial environment, where it performs tasks such as 3D measurement, quality control and vehicle guidance. To support these different kinds of industrial applications, we have designed a kernel that depends neither on the image features used nor on the problem constraints. We describe this kernel, which is based on a neural network solution and implemented on a torus of Transputers.

The Stereo Matching Problem

The goal of stereo vision is to obtain 3D information from two images taken from different viewpoints. Depth information is essential in many applications such as robotics, remote sensing and medical imaging. It is obtained by measuring the disparity between two images taken by two calibrated cameras. The difficult part of obtaining the disparity is finding the correspondence of features between the two images; once the correspondence is known, the depth of the objects in the scene can be obtained easily. The key step in stereo vision is therefore the matching process, which finds the corresponding features between the left and right images. To resolve the matching ambiguity in the correspondence problem, various techniques have been applied to reduce the search space. One of the most effective approaches is to apply constraints that restrict the possible feature matches. The matching process can be viewed as a complex optimization problem in which image features are selected, extracted and matched using a set of constraints that must be satisfied simultaneously. A further difficulty is that the image features that can be extracted and the constraints that the stereo system can use vary with the application.
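To illustrate why depth follows easily once a correspondence is known, here is a minimal sketch for the special case of a rectified pair (parallel optical axes, horizontal epipolar lines), where depth is Z = f * B / d. The function name and the numerical values are hypothetical and chosen only for illustration; a system in general position would need a full triangulation from the calibration data instead.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth of a matched point for a rectified stereo pair: Z = f * B / d.

    disparity_px: horizontal shift of the feature between the two images (pixels)
    focal_px:     focal length expressed in pixels (assumed known from calibration)
    baseline_m:   distance between the two camera centers (meters)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical values: 800 px focal length, 10 cm baseline, 20 px disparity.
z = depth_from_disparity(disparity_px=20.0, focal_px=800.0, baseline_m=0.10)
print(f"depth = {z:.2f} m")
```

Because depth is inversely proportional to disparity, a matching error of one pixel matters far more for distant points than for near ones.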

The matching problem can be formulated as:
"For a point p1 (projection of P) in image 1, find the point p2 of image 2 which corresponds to the same physical point P of the observed scene."



The stereo Matching Problem

It is clearly a combinatorial problem. We therefore have to choose higher-level features (edgels, chains of contour points, regions) and use constraints (uniqueness, ordering, figural continuity, epipolar geometry) to reduce the complexity of the matching process.
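As a rough illustration of how constraints shrink the search space, the sketch below enumerates candidate matches between two small sets of edge points and keeps only those compatible with the epipolar geometry of a rectified pair (same image row) and with a bounded disparity. The function name, the tolerance and the disparity limit are assumptions made for this example, not parameters of Matis3D.

```python
def candidate_matches(left, right, max_row_gap=1.0, max_disparity=64.0):
    """Return the (left_index, right_index) pairs allowed by two simple constraints.

    left, right: lists of (x, y) pixel coordinates of extracted features.
    Epipolar constraint (rectified case): both features lie on the same row.
    Disparity limit: 0 <= x_left - x_right <= max_disparity.
    """
    hypotheses = []
    for i, (xl, yl) in enumerate(left):
        for j, (xr, yr) in enumerate(right):
            on_epipolar_line = abs(yl - yr) <= max_row_gap
            disparity = xl - xr
            if on_epipolar_line and 0.0 <= disparity <= max_disparity:
                hypotheses.append((i, j))
    return hypotheses


# Hypothetical feature positions: 2 x 3 = 6 possible pairs before filtering.
left_points = [(100.0, 10.0), (120.0, 50.0)]
right_points = [(90.0, 10.0), (300.0, 10.0), (110.0, 50.0)]
hypotheses = candidate_matches(left_points, right_points)
print(hypotheses)  # [(0, 0), (1, 2)]
```

The surviving pairs are the hypotheses among which the remaining constraints (uniqueness, ordering, figural continuity) must then arbitrate.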

A Neural Network Model

We can formulate this problem as a Constraint Satisfaction Problem (CSP) (Mohan 1989). This method consists of finding a subset of verified hypotheses under a set of constraints. These hypotheses can be supported or rejected by other hypotheses.

We note the analogy between the CSP and a binary recurrent neural network. We therefore suggest that the stereo matching problem can be cast as a global optimization problem of the type mentioned above and consequently mapped onto a binary recurrent neural network whose Lyapunov energy function is the same as the cost function. We have chosen the Hopfield neural network to solve this optimization problem (Hopfield and Tank 1985) for two main reasons: this neural model corresponds naturally to the CSP formulation, and its neural units are very simple and can be implemented in hardware (Skubski 1992).
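For reference, the energy function of a binary Hopfield network is usually written in the following standard form. The symbols V_i (state of neuron i), W_ij (synaptic weight) and I_i (input bias) follow the textbook convention; the exact cost function used in Matis3D is not given here, so this is the generic form rather than the project's own.

```latex
E = -\frac{1}{2} \sum_{i} \sum_{j \neq i} W_{ij} \, V_i \, V_j \; - \; \sum_{i} I_i \, V_i ,
\qquad V_i \in \{0, 1\}, \quad W_{ij} = W_{ji}
```

With symmetric weights and asynchronous updates, each state change leaves E unchanged or lowers it, so the network settles into a local minimum of the cost function.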

The Hopfield neural network (Hopfield 1982) is fully connected: each neuron encodes one hypothesis of the stereo problem (i.e., feature P1 corresponds to feature P2) and communicates with the other neurons through synapses. It receives an external excitation Ii, called the input bias, which encodes a local constraint based on a similarity value between features P1 and P2. Each neuron takes part in the global competition between hypotheses via weighted synapses Wij that encode global constraints such as uniqueness, ordering, figural continuity and epipolar geometry. The network states evolve according to the network updating rules and converge towards a final configuration in which the active neurons (green) correspond to validated hypotheses (the results of matching).
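The following minimal sketch shows the mechanism on a toy problem: one binary neuron per hypothesis, the similarity value as input bias, and inhibitory weights between hypotheses that violate uniqueness. It is an illustration under stated assumptions, not the Matis3D implementation: only the uniqueness constraint is encoded, the inhibition strength of 2.0 and the similarity values are hypothetical, and neurons are updated sequentially in index order for reproducibility.

```python
def build_network(hypotheses, similarity, inhibition=2.0):
    """Build symmetric weights W and biases I for a list of match hypotheses.

    hypotheses: list of (left_feature, right_feature) index pairs.
    similarity: one local similarity score per hypothesis (used as input bias).
    Two hypotheses sharing a left or a right feature inhibit each other.
    """
    n = len(hypotheses)
    weights = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            left_a, right_a = hypotheses[a]
            left_b, right_b = hypotheses[b]
            if left_a == left_b or right_a == right_b:
                weights[a][b] = weights[b][a] = -inhibition
    return weights, list(similarity)


def energy(weights, biases, states):
    """Standard Hopfield energy: -1/2 * sum W_ij V_i V_j - sum I_i V_i."""
    n = len(states)
    pairwise = sum(weights[i][j] * states[i] * states[j]
                   for i in range(n) for j in range(n) if i != j)
    return -0.5 * pairwise - sum(b * s for b, s in zip(biases, states))


def relax(weights, biases, max_sweeps=100):
    """Update neurons one at a time until no state changes (a stable configuration)."""
    n = len(biases)
    states = [0] * n
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = biases[i] + sum(weights[i][j] * states[j] for j in range(n) if j != i)
            new_state = 1 if net > 0 else 0
            if new_state != states[i]:
                states[i] = new_state
                changed = True
        if not changed:
            break
    return states


# Toy problem: left feature 0 could match right 0 or right 1; left 1 could match right 1.
hypotheses = [(0, 0), (0, 1), (1, 1)]
similarity = [0.9, 0.4, 0.8]
weights, biases = build_network(hypotheses, similarity)
final_states = relax(weights, biases)
matches = [h for h, s in zip(hypotheses, final_states) if s == 1]
print(final_states, matches)  # [1, 0, 1] [(0, 0), (1, 1)]
```

The weak hypothesis (0, 1) conflicts with both of the others and is switched off. A Hopfield network only guarantees a local minimum of the energy, so in general the result can depend on the update order and the initial states.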


The Neural Network Model

A Parallel Stereo Vision Machine

We developed Matis3D, an auto-calibrated general stereoscopic system, in the Vision automation Group at the IBM plant in Montpellier.

Components (1), (2) and (9) are the support card rack and the power supply. The first-generation Matis3D kernel system (6) is based on a torus of 16 T800 Transputers, each of which simulates the behavior of a set of binary neurons in the neural model.

The four links of each Transputer permit the exchange of data between the groups of neurons. Each camera (3) is connected to a PCIS frame grabber (4 and 5) from IBM Japan. Each frame grabber contains a Transputer that can communicate both with the kernel system, to send image features, and with other machines such as a PC (8) or a robot digital controller (7 and 10).
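To make the torus topology concrete, the sketch below computes the four neighbors of each node and distributes neurons over the 16 processors. It assumes a 4 x 4 arrangement with row-major node numbering and a round-robin assignment of neurons; the text states only that 16 Transputers form a torus, so both choices are hypothetical.

```python
def torus_neighbors(node, rows=4, cols=4):
    """Return the four neighbors (north, south, west, east) of a node on a 2D torus.

    Nodes are numbered row by row; indices wrap around at the edges,
    so every node uses exactly four links.
    """
    r, c = divmod(node, cols)
    return [
        ((r - 1) % rows) * cols + c,  # north
        ((r + 1) % rows) * cols + c,  # south
        r * cols + (c - 1) % cols,    # west
        r * cols + (c + 1) % cols,    # east
    ]


def assign_neurons(n_neurons, n_nodes=16):
    """Distribute neuron indices over the nodes in round-robin order."""
    groups = [[] for _ in range(n_nodes)]
    for i in range(n_neurons):
        groups[i % n_nodes].append(i)
    return groups


print(torus_neighbors(0))   # [12, 4, 3, 1]
print(torus_neighbors(15))  # [11, 3, 14, 12]
groups = assign_neurons(100)
print([len(g) for g in groups])
```

Since the Hopfield network is fully connected, each processor needs the states of neurons held by all the others, and the wrap-around links keep the longest path between any two of the 16 nodes to four hops in this arrangement.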


Overview of Matis3D



Hardware Kernel of Matis3D

The local and global constraints of the problem, as well as the feature extractors, are downloaded from a PC into the two PCIS Transputers. This allows engineers to solve a specific problem by studying the available constraints, choosing the features to be matched and sending this configuration. Image features are extracted from each image, and the synaptic weights and input biases are then computed and dispatched to the Troot Transputer of the kernel system before each of the two steps of the algorithm starts.

The kernel Transputer torus solves the matching problem by running the algorithm described above. Troot then receives the final configuration of the neural states and sends it back to the PCIS modules. The PCIS modules send the results to the interface module of the machine (8), which computes the reconstructed 3D features, and to the robot controllers (7), which perform a specific task.





A view of MATIS3D performing a 3D reconstruction task

Some 3D Reconstructed scenes

The following stereo images are taken from the Teleos Research Project Database and are used as test images. The stereo matching process is based on edge points, and the 3D results presented are not interpolated. For each image pair, we present three different perspective views of the reconstructed scene. Some erroneous points can be seen; they are due to image borders and occlusions. The tests were performed on image pairs with parallel epipolar lines as well as on image pairs taken by a stereoscopic system in a general position.

Image pair: Fruits: 6256 reconstructed points


Image pair: Rocks: 5549 reconstructed points


Image pair: Ground: 5205 reconstructed points

Some stereo pairs and 3D reconstruction results
Copyright © C. Garcia - 2001