We previewed some of our 3D generation capabilities in our CommonSim-1 preview blog post. We are now making our images-to-3D generation capabilities widely available through front-end apps, and will also be offering APIs shortly after for even greater accessibility. This should enable creators and developers to rapidly populate 3D worlds for AI training, games and content.
Our iOS app and web app make it easy for anyone to convert images to NeRFs (neural radiance fields) and textured meshes. These assets can be blended and dropped into existing simulation engines. We will be releasing a set of plugins and open-source code to make it easy to integrate the NeRF asset types into existing simulation and content creation workflows.
As we gather more user feedback and scale up our cloud resources, we will keep inviting new people from the waitlist. A set of gallery assets and the web portal can be viewed here.
Unlike existing 3D scanning and modeling software, our product is aimed at creating realistic, complete object models. This is essential for any serious gaming, AI training, or content workflow beyond visualization. It often requires capturing objects from all sides, which breaks several assumptions in classical structure-from-motion and NeRF pipelines. To handle this, we support three capture protocols covering different modeling scenarios:
Detailed instructions can be viewed here (capture and generate). A summary of the capture protocols is given below:
To capture a static object, users walk around it (left video), covering the upper hemisphere of viewpoints as much as possible. When capturing with the iOS app, there is an option to place a 3D bounding box to indicate the region of interest. In all other cases on iOS or the web app, we deploy an interactive video segmentation model that lets users quickly isolate the object of interest from all occluders. This enables an extremely flexible workflow for rigid objects that are occluded or in motion.
The second protocol (middle video) has users walk around the object as in the static protocol, then flip it and repeat the process. This captures all sides of the object, including extremely thin ones such as a playing card, which would otherwise be very difficult.
The last protocol (right video) lets users manipulate an object in their hands. Our video segmenter automatically masks out human hands. We will shortly update the segmenter to isolate the foreground object automatically as well; for now, the user places a few points of interest on the image frame(s) and the rest is automated.
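To make the point-prompted step concrete, here is a minimal sketch of region growing from user-clicked seed points. This is illustrative only: our product uses a learned interactive video segmentation model, not this classical flood fill, and all names below are hypothetical.

```python
from collections import deque

def segment_from_points(image, seeds, tol=10):
    """Grow a binary mask outward from user-clicked seed points.

    image: 2D list of grayscale intensities (0-255)
    seeds: list of (row, col) points the user clicked on the object
    tol:   max intensity difference from the seed to stay in the region
    """
    h, w = len(image), len(image[0])
    mask = [[False] * w for _ in range(h)]
    queue = deque()
    for r, c in seeds:
        mask[r][c] = True
        queue.append((r, c, image[r][c]))
    # Breadth-first expansion over 4-connected neighbors.
    while queue:
        r, c, ref = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr][nc] \
                    and abs(image[nr][nc] - ref) <= tol:
                mask[nr][nc] = True
                queue.append((nr, nc, ref))
    return mask
```

A learned model generalizes this idea: the clicks become prompts, and the mask is predicted per frame and propagated through the video rather than grown by intensity similarity.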
All asset types, including segmentation masks, standardized NeRF formats, and textured meshes, can be downloaded from the UI by clicking on an object session. Textured meshes can be imported into off-the-shelf game and rendering engines. The NeRF formats can currently only be loaded in Omniverse or Blender (instructions and GitHub repo coming).
Comparisons to Apple Object Capture and Photogrammetry
We support two NeRF variants: a fast NeRF and HDNeRF. HDNeRF takes longer to converge but often yields higher-resolution and smoother geometry for manufactured objects. We combine this approach with a machine-learning-driven texturing pipeline to generate high-resolution meshes with UV-unwrapped texture maps. This gives the best of both worlds: NeRFs model photometric detail and precise geometry, while the texturing pipeline is robust to errors caused by noise in camera and geometry estimation. In many cases where HDNeRF works well, we observed that it far exceeds the geometric realism of classical photogrammetry.
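A UV-unwrapped texture map stores surface color in a 2D image indexed by per-vertex (u, v) coordinates in [0, 1]. The lookup every renderer performs at shading time can be sketched as bilinear sampling (pure Python, illustrative; real engines also handle wrap modes, flipped V, and mipmapping, all omitted here):

```python
def sample_texture(texture, u, v):
    """Bilinearly sample a texture at UV coordinates in [0, 1].

    texture: 2D list of texel intensities; v = 0 maps to row 0.
    """
    h, w = len(texture), len(texture[0])
    # Map UV to continuous texel space, clamped to the valid range.
    x = min(max(u * (w - 1), 0.0), w - 1)
    y = min(max(v * (h - 1), 0.0), h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend the four surrounding texels by their fractional distances.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bot = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bot * fy
```

Because the texture is decoupled from the mesh in this way, the texturing pipeline can correct photometric detail even when the recovered geometry or camera poses are slightly noisy.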