WiMi Hologram Cloud Inc., a leading global Hologram Augmented Reality (AR) Technology provider, announced the development of a 3D object recognition system based on multi-view feature fusion. The system uses convolutional neural networks (CNNs) to analyze a 3D object from different viewpoints and fuses the features extracted from multiple views to infer global information about the object, which is then fed into a fully connected network that classifies the object and infers its label.
WiMi’s 3D object recognition system based on multi-view feature fusion consists of three main parts: viewpoint information selection, feature extraction, and feature fusion.
The viewpoint information selection module projects 3D objects onto the 2D plane from multiple perspectives. Different viewpoints capture different object orientations and structural information. A graph structure can be built over the multiple views, which are then clustered into groups based on their spatial distribution. A well-designed viewpoint selection strategy improves the quality of the network's training data.
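The projection-and-grouping idea above can be illustrated with a minimal numpy sketch. The camera count, elevation, group count, and the toy point cloud are illustrative placeholders, not WiMi's actual parameters; the clustering here is a simple azimuth-sector grouping standing in for a learned graph clustering.

```python
import numpy as np

def camera_viewpoints(n_views=12, elevation=0.5):
    """Place n_views virtual cameras evenly around the object on an azimuth circle."""
    az = np.linspace(0.0, 2 * np.pi, n_views, endpoint=False)
    dirs = np.stack([np.cos(az) * np.cos(elevation),
                     np.sin(az) * np.cos(elevation),
                     np.full(n_views, np.sin(elevation))], axis=1)
    return az, dirs

def project_to_plane(points, view_dir):
    """Orthographic projection of a 3D point cloud onto the plane normal to view_dir."""
    view_dir = view_dir / np.linalg.norm(view_dir)
    up = np.array([0.0, 0.0, 1.0])
    right = np.cross(up, view_dir)
    if np.linalg.norm(right) < 1e-8:          # viewing straight down z: pick x axis
        right = np.array([1.0, 0.0, 0.0])
    right /= np.linalg.norm(right)
    down = np.cross(view_dir, right)
    return points @ np.stack([right, down], axis=1)   # (N, 2) image-plane coordinates

def group_views(azimuths, n_groups=4):
    """Cluster viewpoints into spatial groups by azimuth sector."""
    sector = (2 * np.pi) / n_groups
    return np.floor(azimuths / sector + 1e-9).astype(int)

# toy object: a random point cloud standing in for a real 3D model
rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))
az, dirs = camera_viewpoints()
views = [project_to_plane(cloud, d) for d in dirs]    # 12 projected 2D views
groups = group_views(az)                              # view -> group assignment
```

Each 2D view would then be rendered into an image and passed to the CNN; the group labels drive the group-level pooling used later in the pipeline.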
The feature extraction module extracts features using convolutional neural networks. After the convolutional layers, a feature mapping module acts on each view's feature response map. Multiple mapping matrices are learned with a multilayer perceptron, and these matrices map the corresponding views into an approximate common feature space. The mapping matrices generalize the viewpoint transformation relationships between views and map the feature maps to a group-level feature that describes the region.
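The mapping step can be sketched in numpy as follows. The view count, feature dimensions, and random matrices are hypothetical placeholders: in the real system the per-view features come from the CNN and the mapping matrices are learned by the multilayer perceptron rather than randomly initialized.

```python
import numpy as np

rng = np.random.default_rng(1)
N_VIEWS, FEAT_DIM, COMMON_DIM = 12, 64, 32   # illustrative sizes only

# stand-in for the per-view CNN feature vectors
view_feats = rng.normal(size=(N_VIEWS, FEAT_DIM))

# one mapping matrix per view (randomly initialized here; learned in practice)
map_mats = rng.normal(size=(N_VIEWS, FEAT_DIM, COMMON_DIM)) / np.sqrt(FEAT_DIM)

# map every view feature into the shared (approximate) feature space
mapped = np.einsum('vf,vfc->vc', view_feats, map_mats)   # (N_VIEWS, COMMON_DIM)

# pool the views of each group into a single group-level descriptor
groups = np.repeat(np.arange(4), 3)                      # 4 groups of 3 views
group_feats = np.stack([mapped[groups == g].mean(axis=0) for g in range(4)])
```

Averaging within a group is one simple pooling choice; the grouped descriptors then feed the fusion stage.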
The feature fusion module focuses on fusing the multiple features with a reasonable and effective strategy, achieving multilayer fusion based on the clustering. A convolution operation weighs the high-dimensional view features and encodes the weight information between different views; the CNN treats the feature response maps as spatial data. Max pooling after the convolutional layers keeps the maximum response on each feature map. By learning the correlation between adjacent views, the system generates global features with more explanatory power and fuses them into the feature maps.
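A minimal numpy sketch of this fusion idea: a small 1-D convolution along the view axis lets each output mix information from adjacent views (encoding inter-view weights), and max pooling across views then keeps the strongest response per feature channel. The kernel values, view count, and feature size are illustrative assumptions, not WiMi's actual configuration.

```python
import numpy as np

def conv1d_over_views(feats, kernel):
    """Slide a 1-D kernel along the view axis with circular padding
    (the viewpoints form a ring around the object)."""
    k = len(kernel)
    pad = k // 2
    padded = np.concatenate([feats[-pad:], feats, feats[:pad]], axis=0)
    out = np.zeros_like(feats)
    for i in range(feats.shape[0]):
        # weighted sum over a window of neighbouring views
        out[i] = np.tensordot(kernel, padded[i:i + k], axes=1)
    return out

rng = np.random.default_rng(2)
view_feats = rng.normal(size=(12, 32))        # 12 views, 32-dim features

# a small kernel correlating each view with its two neighbours (learned in practice)
kernel = np.array([0.25, 0.5, 0.25])
correlated = conv1d_over_views(view_feats, kernel)

# max pooling across views keeps the strongest response per feature channel
global_feat = correlated.max(axis=0)          # (32,) fused global feature
```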
After all view features are fused into a global feature, the system feeds it into the fully connected layer, mines the high-dimensional spatial information contained in the fused feature, and completes the classification, outputting the result.
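The final classification step can be sketched as a single fully connected layer followed by a softmax. The class count, feature size, and random weights are placeholders; in the real system the weights are learned end to end.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(3)
N_CLASSES, FEAT = 10, 32                      # illustrative sizes only

# fused global feature (output of the fusion stage; mocked here)
global_feat = rng.normal(size=FEAT)

# fully connected layer: weights and bias would be trained in practice
W = rng.normal(size=(N_CLASSES, FEAT)) / np.sqrt(FEAT)
b = np.zeros(N_CLASSES)

probs = softmax(W @ global_feat + b)          # class probabilities
label = int(np.argmax(probs))                 # predicted object label
```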
3D object recognition technology is one of the core technologies of computer vision and a critical technology for 3D scene understanding. WiMi will continue to expand the applications of its multi-view feature fusion-based 3D object recognition algorithm.