State of the art Image to 3D in seconds

2024/03/21

Last month we announced a new AI foundation model as part of Cube 2.0, bringing you image-to-3D in seconds via a unique blend of state-of-the-art AI technologies. Today, we are thrilled to announce a new version of our foundation model that brings us closer to our vision of real-time HD quality. Our latest model uses more powerful neural network machinery and is trained on higher-resolution visual data using a large cluster of NVIDIA H100 GPUs. As a result, the new model captures intricate details of the input image and predicts high-quality 3D outputs in just seconds. This new model is now active in all Cube turbo sessions.

To demonstrate the capacity of our new foundation model, we have created a new image-to-3D benchmark called the CSM Leaderboard. We evaluated our model on the full benchmark of over 100 examples and compared it against three publicly available baselines from the recent literature: TripoSR, LGM, and OpenLRM. A preview of the results is provided in the table below, and the complete set of results will be shared with an upcoming public release of the CSM Leaderboard benchmark dataset.

For more results, check out our full report here.
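
As an illustrative aside (not necessarily the metric used in our report), a common automatic proxy for image-to-3D quality is CLIP similarity between the input image and rendered views of the generated asset. Below is a minimal sketch using the open-source transformers library; the model name and file paths are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_view_similarity(input_image_path, rendered_view_paths):
    """Mean CLIP cosine similarity between the input image and rendered views."""
    images = [Image.open(p).convert("RGB")
              for p in [input_image_path] + rendered_view_paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    # Compare the input image (index 0) against every rendered view.
    return (feats[0] @ feats[1:].T).mean().item()

# Example usage with placeholder paths for one benchmark item.
score = clip_view_similarity("input.png", ["view_0.png", "view_1.png", "view_2.png"])
print(f"CLIP similarity: {score:.3f}")
```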

To help quantify the improvement that our new foundation model provides, we recruited three human evaluators and asked them to rate the 3D results from all 100+ benchmark examples. For each example, evaluators were asked to select which of the four spin results (one per model) they believed to be the best. Models and examples were randomly shuffled to minimize any potential bias. The results from this experiment are provided in the chart below. Overall, human judges selected the result from CSM Cube as their favorite 40.3% of the time, considerably outperforming the next-best model.
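
For readers curious how such preference numbers are tallied, the aggregation is a simple vote count. The sketch below assumes a hypothetical CSV log of evaluator choices; the file name and column names are illustrative, not our internal format.

```python
import csv
from collections import Counter

# Hypothetical vote log: one row per (example, evaluator) pair,
# with the model that evaluator picked as best for that example.
with open("human_eval_votes.csv", newline="") as f:
    votes = [row["chosen_model"] for row in csv.DictReader(f)]

counts = Counter(votes)
total = sum(counts.values())
for model_name, n in counts.most_common():
    print(f"{model_name}: {100 * n / total:.1f}% of votes")
```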

Next-level HD quality with Cube refine

The second phase of 3D generation in Cube, Cube refine, is tailored to artists and professionals seeking high-quality, production-ready 3D assets. While the initial predictions from our foundation model offer quick results, the refine phase takes these outputs and significantly enhances their definition, texture, and realism. This process, although more time-intensive, ensures that the final 3D models are of superior quality, suitable for applications demanding high-resolution detail, from animation to simulation.

The refine process produces a high-quality 3D model with enhanced textures, lighting effects, and overall structural accuracy. The final product better reflects the complexities of the original image. In the table below, we demonstrate how our refinement process is superior to the best techniques available in the field today. This capability positions Cube 2.0 as a valuable tool for those in need of detailed 3D models, bridging the gap between rapid prototyping and high-quality final production.

More results are available here.


Re-texture 3D assets with AI

At CSM, we believe in tooling that directly involves users in the generative AI process. In addition to our existing user-guided features like Text to 3D, Image to 3D, and Sketch to 3D, last week we announced a powerful new AI Retexturing tool that allows users to quickly transform the texture of assets into exactly what they're looking for. Texturing is often a challenging task for creators without extensive 3D modeling backgrounds. With AI Retexturing, however, those users can easily paint higher-resolution, more detailed textures directly onto their assets. These textures are generated using a text-guided diffusion model, giving the user precise control over the look they apply. Retexturing is also fast: with our intuitive interface, dramatically increasing the quality of an asset takes less than five minutes.
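
To give a feel for the underlying idea, the sketch below shows text-guided texture generation with off-the-shelf open-source tools (the diffusers and trimesh libraries). This is not our production pipeline; the model name, prompt, and mesh path are placeholders, and the mesh is assumed to already have UV coordinates.

```python
import torch
import trimesh
from diffusers import StableDiffusionPipeline

# Generate a texture image from a text prompt with an off-the-shelf diffusion model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
texture = pipe(
    "seamless weathered bronze armor texture, highly detailed, tileable"
).images[0]

# Apply the generated image as the asset's texture.
# Assumes the mesh already has UV coordinates (typical for game-ready assets).
mesh = trimesh.load("asset.glb", force="mesh")
mesh.visual = trimesh.visual.TextureVisuals(uv=mesh.visual.uv, image=texture)
mesh.export("asset_retextured.glb")
```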

The demo video below shows how the new retexturing tool works. Small edits that previously required restarting the 3D generation process or spending valuable time in legacy modeling software can now be achieved quickly and with production-level quality using our new AI Retexturing tool.