Discover The Amazing Pipelines of SORDI.ai!
The largest and most comprehensive synthetic dataset
Published by BMW Group, SORDI.ai helps developers and researchers streamline and accelerate the training of artificial intelligence in production. The dataset offers over 1.2 million photorealistic images and thousands of detailed point clouds, capturing even the most intricate aspects of real factory environments. Together with Google and NVIDIA, SORDI.ai has been made available as open source, in an effort to build the world’s largest reference dataset for artificial intelligence in the broad field of manufacturing.
120+ unique object classes
SORDI.ai offers a diverse and extensive object catalog with over 120 unique classes, enabling deployment across a wide range of regions, industries, and use cases. The dataset features high-fidelity digital twins of logistics and industrial assets, such as KLT boxes, stillages, and dollies, carefully modeled to VDA standards. Realism begins at the smallest details and extends to the dynamic structure of each scene: objects appear in randomized combinations of states, behaviors, and visual variations. This controlled blend of realism and randomness is our secret ingredient for building high-performance industrial AI models!
Accurately Annotated Dataset with Pixel-Level Synthetic Labels
SORDI.ai provides extremely accurate synthetic annotations that greatly reduce the time and cost of manual labeling without compromising quality. These labels offer pixel-perfect precision and consistency across 2D, 3D, and time-series data, supporting a wide range of industrial use cases such as object detection, segmentation, classification, and forecasting. This rich and precise annotation ecosystem fuels cutting-edge solutions across industries, powering smarter, faster innovation in today’s most demanding manufacturing environments.
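To illustrate how pixel-level labels can feed an object detection task, the sketch below derives a tight 2D bounding box for each instance from an instance mask. The mask layout (a grid of integers, with 0 as background) and the function name `boxes_from_instance_mask` are assumptions made for this example and do not describe the actual SORDI.ai annotation format.

```python
from typing import Dict, List, Tuple

# Hypothetical instance mask: a 2D grid where 0 is background and any
# positive integer is an instance ID. This layout is an assumption for
# illustration, not the SORDI.ai annotation format.
Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def boxes_from_instance_mask(mask: Mask) -> Dict[int, Box]:
    """Derive one tight 2D bounding box per instance ID from a pixel mask."""
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue  # skip background pixels
            if instance_id not in boxes:
                # First pixel seen for this instance: box is a single pixel.
                boxes[instance_id] = (x, y, x, y)
            else:
                # Grow the existing box so it also covers this pixel.
                x0, y0, x1, y1 = boxes[instance_id]
                boxes[instance_id] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


example_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
example_boxes = boxes_from_instance_mask(example_mask)
```

Because the boxes are computed directly from the mask, they stay consistent with the segmentation labels by construction, which is the kind of cross-annotation consistency the text describes.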
The SORDI.ai Synthetic Data Generation Pipeline
1. Asset Preparation
2. Scene Construction
3. Data Capture
4. Quality Assessment
Digital First Approach.
The SORDI.ai dataset is made up of synthetic images generated using NVIDIA Omniverse. By leveraging USD and MDL workflows and connecting DCC tools from BMW Group's workflow, combined with Google Cloud's generative AI technologies, SORDI.ai is constantly expanding to include new models and classes.
120+ Versatile Object Classes
- Logistics: Containers, stillages, storage boxes, disposable and reusable handling materials, packaging, boxes.
- Transportation: Manual and powered vehicles, pushbikes, scooters, production equipment.
- Office: Office furniture, boards, displays, accessories.
- Signage: Emergency signs, information signs, pictographs, text signs.
- Tools: Safety tools, mechanical and electrical tools, screws, bolts.
The SORDI.ai Asset Tree
The SORDI.ai asset tree is the structured hierarchy that organizes all SORDI.ai digital assets. It forms the foundation of how the dataset is built, managed, and expanded. Its descriptive levels break down asset configurations by deployment environment, task, type, variant, and behavior. This scalable tree is fundamental for maintaining large-scale digital twins and configuring realistic randomizations.
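As a rough illustration of such a hierarchy, the sketch below stores asset configurations in a nested dictionary with one level per descriptive level named above, then enumerates every full configuration. Only the five level names come from the text; the helper names (`insert_asset`, `iter_configurations`) and all concrete values such as "factory" or "klt_box" are hypothetical placeholders, not the real SORDI.ai tree.

```python
from typing import Dict, Iterator, Tuple

# The five level names come from the text above; every concrete value
# used below (e.g. "factory", "klt_box") is a made-up placeholder.
LEVELS = ("environment", "task", "type", "variant", "behavior")


def insert_asset(tree: Dict, path: Tuple[str, ...]) -> None:
    """Insert one asset configuration, given as one value per level."""
    if len(path) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} levels, got {len(path)}")
    node = tree
    for key in path:
        # Descend one level, creating the branch if it does not exist yet.
        node = node.setdefault(key, {})


def iter_configurations(tree: Dict, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield every full root-to-leaf path in the tree, in insertion order."""
    if not tree:
        if prefix:
            yield prefix
        return
    for key, child in tree.items():
        yield from iter_configurations(child, prefix + (key,))


asset_tree: Dict = {}
insert_asset(asset_tree, ("factory", "storage", "klt_box", "small", "stacked"))
insert_asset(asset_tree, ("factory", "storage", "klt_box", "small", "open"))
insert_asset(asset_tree, ("factory", "transport", "dolly", "standard", "loaded"))
configurations = list(iter_configurations(asset_tree))
```

A randomization step could then sample from `configurations`, or from any subtree, to pick which asset states appear in a generated scene.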
Dynamic Industrial Assets
Our digital twins are not just visually rich; they are built on deep metadata that captures their operational and behavioral logic. This architecture gives each twin predictive intelligence, allowing it to track an asset's full history and forecast its future state.
Comprehensive Metadata
SORDI.ai assets are intelligent, operational objects, not just visual models. Our comprehensive metadata defines physical properties, operational state, and behavioral logic. This depth of data enables the generation of logically consistent scenarios, giving AI models the situational understanding required for real-world reliability.
Generative Textures
We leverage state-of-the-art generative AI to dynamically create photorealistic and procedural textures on demand. This pipeline generates a virtually infinite variety of surface conditions, including authentic signs of wear and tear and environmental effects. This extreme variability is crucial for dramatically boosting the robustness and generalization capabilities of vision-based models.
The SORDI.ai Scalable and Modular USD-based Scene Construction Pipeline
This pipeline is the core of SORDI.ai's scalability, supporting extremely large-scale digital twins. It consists of nine layers based on Omniverse's USD pipeline.
Automated Procedural Creation of Digital Stages
- Layout Generation
- Layout Tree for Scene Creation
- Scene Construction & Data Generation
Dynamic Action-Ready Scenes ft. Comprehensive Human Worker Activities
These highly realistic human models simulate the performance and movements of complex tasks with high accuracy and speed, enabling action-based real-world scenarios with dynamic events. With virtual humans, SORDI.ai is pushing the boundaries of what's possible in the industrial metaverse and paving the way for more efficient and sustainable applications.
Real-World Inspired Scene Capturing Settings
The diverse SORDI.ai camera capture settings provide fine-grained control over the complexity and realism of the synthetic environment. This enables the targeted creation of training datasets optimized for specific vision challenges, which improves the accuracy and reliability of the resulting computer vision systems.
- Static Camera Capture
- Fully-Randomized Capture
- Constrained-Randomized Capture
- Sequential (Path-Oriented) Capture
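One way to picture the four capture settings is as four strategies for choosing camera positions. The sketch below is an interpretation under stated assumptions, not the SORDI.ai implementation: "static" keeps one fixed position, "fully randomized" samples anywhere in a bounding volume, "constrained randomized" samples on a ring around a target within radius and height limits, and "sequential" interpolates along a straight path. All function names and parameters are hypothetical.

```python
import math
import random
from typing import List, Tuple

Position = Tuple[float, float, float]  # (x, y, z), with z as camera height


def static_capture(position: Position, n_frames: int) -> List[Position]:
    """Static: the camera stays at one fixed position for every frame."""
    return [position] * n_frames


def fully_randomized_capture(
    lower: Position, upper: Position, n_frames: int, rng: random.Random
) -> List[Position]:
    """Fully randomized: sample uniformly inside an axis-aligned box."""
    return [
        (
            rng.uniform(lower[0], upper[0]),
            rng.uniform(lower[1], upper[1]),
            rng.uniform(lower[2], upper[2]),
        )
        for _ in range(n_frames)
    ]


def constrained_randomized_capture(
    target_xy: Tuple[float, float],
    radius_range: Tuple[float, float],
    height_range: Tuple[float, float],
    n_frames: int,
    rng: random.Random,
) -> List[Position]:
    """Constrained randomized: sample around a target within set limits."""
    positions: List[Position] = []
    for _ in range(n_frames):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.uniform(radius_range[0], radius_range[1])
        height = rng.uniform(height_range[0], height_range[1])
        positions.append(
            (
                target_xy[0] + radius * math.cos(angle),
                target_xy[1] + radius * math.sin(angle),
                height,
            )
        )
    return positions


def sequential_capture(start: Position, end: Position, n_frames: int) -> List[Position]:
    """Sequential: evenly spaced positions on a straight path from start to end."""
    if n_frames < 2:
        raise ValueError("a path needs at least two frames")
    return [
        (
            start[0] + (end[0] - start[0]) * i / (n_frames - 1),
            start[1] + (end[1] - start[1]) * i / (n_frames - 1),
            start[2] + (end[2] - start[2]) * i / (n_frames - 1),
        )
        for i in range(n_frames)
    ]


rng = random.Random(0)  # fixed seed so a capture run can be reproduced
static_frames = static_capture((0.0, 0.0, 2.0), 3)
random_frames = fully_randomized_capture((0.0, 0.0, 1.0), (10.0, 8.0, 3.0), 5, rng)
ring_frames = constrained_randomized_capture((5.0, 4.0), (1.0, 2.0), (1.2, 1.8), 5, rng)
path_frames = sequential_capture((0.0, 0.0, 1.5), (4.0, 0.0, 1.5), 5)
```

A real pipeline would also control camera orientation, lens parameters, and lighting; positions alone are enough here to show how the modes differ in how much variation they allow.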
Point Cloud 3D Annotations
Real-world 3D sensor data is often sparse, noisy, and notoriously difficult to label. We eliminate these challenges by providing pristine, synthetic point clouds that serve as the definitive 3D ground truth. Our data captures the geometric intricacies of industrial environments with perfect accuracy. Each point is meticulously labeled with its object class and instance, empowering developers to accelerate their training cycles and build AI models that can reliably understand and interact with complex physical spaces.
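To show what per-point class and instance labels make possible, the sketch below groups labeled points by instance and computes an axis-aligned 3D bounding box for each one. The `LabeledPoint` record and the sample values are assumptions for illustration; the actual SORDI.ai point cloud format may differ.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Vec3 = Tuple[float, float, float]


class LabeledPoint(NamedTuple):
    """One point with its labels. Field names are hypothetical."""
    x: float
    y: float
    z: float
    class_name: str
    instance_id: int


def instance_bounds(points: List[LabeledPoint]) -> Dict[Tuple[str, int], Tuple[Vec3, Vec3]]:
    """Return (min_corner, max_corner) of the 3D box around each instance."""
    grouped: Dict[Tuple[str, int], List[Vec3]] = defaultdict(list)
    for p in points:
        grouped[(p.class_name, p.instance_id)].append((p.x, p.y, p.z))

    bounds: Dict[Tuple[str, int], Tuple[Vec3, Vec3]] = {}
    for key, coords in grouped.items():
        xs, ys, zs = zip(*coords)  # split into per-axis coordinate tuples
        bounds[key] = ((min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs)))
    return bounds


sample_points = [
    LabeledPoint(0.0, 0.0, 0.0, "klt_box", 1),
    LabeledPoint(0.6, 0.4, 0.3, "klt_box", 1),
    LabeledPoint(0.3, 0.2, 0.1, "klt_box", 1),
    LabeledPoint(2.0, 1.0, 0.0, "dolly", 2),
    LabeledPoint(2.8, 1.6, 0.2, "dolly", 2),
]
sample_bounds = instance_bounds(sample_points)
```

With noisy real sensor data, boxes like these would need outlier filtering first; with clean synthetic labels they can be read off directly.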
Industrial Use Cases
SORDI Generative .ai Pipelines!
- AI-Generated SORDI 3D Assets and Textures
- AI-Generated Scene Creation and Update
- AI-Generated Photorealistic Datasets
Success Story
SORDI.ai for Visual Inspection
SORDI.ai can be used to detect missing or wrongly colored stitches in leather products and to automate visual inspection for valid combinations of leather and stitching colors. To showcase this, we added a large number of images of leather products with different stitching patterns, textures, and colors, making it possible to train machine learning algorithms that identify missing or incorrect stitches automatically.
As a result, SORDI.ai enables manufacturers to ensure that their products are of high quality and meet the required standards.
