Semantic Segmentation with Dense Prediction Transformers via Grasshopper
The Ambrosinus-Toolkit v1.2.9 introduces Semantic Segmentation, a new DPT tool inspired by René Ranftl's research at Intel. It is another AI tool that brings artificial-intelligence power into the Grasshopper platform and completes (for now) my original idea of including DPT technology in ATk.
Dense Prediction Transformers (DPT) are a deep-learning architecture used in computer-vision tasks such as semantic segmentation, object detection, and instance segmentation. The fundamental concept behind DPT is generating dense image labels by incorporating both global and local context information. An extensive explanation of the Ambrosinus Toolkit and the integration of AI diffusion models into AEC creative design processes can be found in the official research document within the Coding Architecture book chapter 😉. DPT represents an advanced architecture in the field of visual processing: it breaks away from traditional convolutional neural networks (CNNs) to embrace transformers, known for their effectiveness at processing sequential data. These models are particularly well suited to dense prediction tasks, such as monocular depth estimation and semantic segmentation, thanks to their ability to handle high-resolution representations and capture global context within images. By combining token representations at different resolutions and using a convolutional decoder to synthesize full-resolution predictions, DPTs deliver more precise and detailed results. This technology has already been shown to significantly improve performance in various computer-vision tasks, setting new benchmarks in the field. Imagine you need to label objects in an image. Here's how the two approaches work:
Convolutional Neural Networks (CNNs): CNNs are like artists who specialize in analyzing specific parts of an image. They scan the image with filters to detect edges, textures, and shapes. Example: if they see a car, they might say, "There's a car!"
Dense Prediction Transformers (DPTs): DPTs are like poets who understand the image as a whole. They look at the image and create a detailed map of what they see. Example: if they see a road, they might say, "This is an asphalt road, with trees on the sides and a parked car."
In short, CNNs focus on specific parts, while DPTs understand the image as a whole.
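For readers who want the mechanics behind the metaphor, here is a toy PyTorch sketch of the reassemble-and-fuse flow described above (a simplification distilled from the DPT paper, not Intel's actual implementation; all names and dimensions are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDPTDecoder(nn.Module):
    """Toy illustration of the DPT idea: transformer tokens taken from
    several encoder depths are 'reassembled' into image-like feature maps
    at different resolutions, then fused by a convolutional decoder into
    a full-resolution dense prediction."""

    def __init__(self, dim: int = 256, n_classes: int = 150):  # 150 = ADE20k classes
        super().__init__()
        self.fuse = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.head = nn.Conv2d(dim, n_classes, kernel_size=1)

    def forward(self, reassembled):  # list of feature maps, coarse -> fine
        x = reassembled[0]
        for feat in reassembled[1:]:
            # upsample the running estimate and blend in finer-scale features
            x = F.interpolate(x, size=feat.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = self.fuse(x + feat)
        return self.head(x)  # per-pixel class logits

# e.g. three 'reassembled' maps at increasing resolution
maps = [torch.randn(1, 256, s, s) for s in (16, 32, 64)]
logits = ToyDPTDecoder()(maps)  # -> shape (1, 150, 64, 64)
```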
Requirements
To run the DPTSemSeg component (subcategory AI), some Python libraries are necessary, as for the other AI tools. For this reason, I have shared a "requirements.txt" file that lets the designer complete this step with a single command line from cmd.exe (Windows OS side). After downloading the file to a custom folder (I suggest C:/CustomFolder or something like that), open cmd.exe, navigate into the "CustomFolder", run pip install -r requirements.txt, and wait until you see the start prompt string again (see the image below on the right).
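In full, the sequence in cmd.exe looks like this (assuming the file was saved to the example folder above):

```
cd C:\CustomFolder
pip install -r requirements.txt
```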
If you have already used/installed the DPTto3D component, follow these instructions:
From your CMD window, you can simply launch this command: pip install atoolkitdpt (I recommend this option); in this way, all the necessary (MiDaS and DPT) libraries will be installed on your machine. For a clean installation, you can first uninstall the previous version from the CMD window in this way: pip uninstall atoolkitdpt
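Put together, a clean reinstall of the library is just two commands:

```
pip uninstall atoolkitdpt
pip install atoolkitdpt
```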
In particular, I created the atoolkitdpt Python library to run DPT estimation and Semantic Segmentation. I added all the MiDaS and Dense Prediction Transformers functions developed by the Intelligent Systems Lab Org (Intel Labs) to the atoolkitdpt 0.0.2 library. In this way, the MiDaS pre-trained models and the DPT large and hybrid models shared by Intel researchers can be exploited directly inside Grasshopper. This Python package is available directly from my PyPI repository page at this link, and all future updates will be publicly announced on my GitHub page, AToolkitDpt. It is fundamental to download the two weights models (dpt_large_ade20k, ~1.3 GB, and dpt-hybrid_ade20k, ~400 MB) shared by Intel researchers: these pre-trained models generate the segmented images (see the aforementioned GitHub page for details). Finally, GH CPython is still necessary for running the "DPTSemSeg" component properly, as for all the other AI tools coded in Python.
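As a quick post-install sanity check, a minimal Python sketch along these lines can confirm that the package imports and that the downloaded weights are in place (the folder path and file-name patterns are assumptions; adapt them to wherever you actually saved the .pt files):

```python
import importlib.util
from pathlib import Path

WEIGHTS_DIR = Path(r"C:\CustomFolder\weights")  # hypothetical location - adjust

# 1. Is the atoolkitdpt package importable from this Python environment?
if importlib.util.find_spec("atoolkitdpt") is None:
    print("atoolkitdpt not found -> run: pip install atoolkitdpt")
else:
    print("atoolkitdpt is installed")

# 2. Are the two ADE20k weights files present? (exact file names may differ)
for pattern in ("dpt_large*ade20k*.pt", "dpt*hybrid*ade20k*.pt"):
    matches = list(WEIGHTS_DIR.glob(pattern))
    print(pattern, "->", matches[0].name if matches else "MISSING")
```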
Main features
The DPTSemSeg component, built on DPT technology, can perform semantic segmentation directly from a 2D RGB image. The ADE20k dataset includes 150 classes (label IDs), and a common issue is the identical color palette assigned to two specific categories: road and skyscraper. I have implemented an internal prediction re-mapping to address this issue (hopefully successfully); a simplified sketch of the idea appears after the list below. However, the detection of the skyscraper class remains imperfect: skyscrapers are only partially recognized and often conflated with the building category (which, in my opinion, is not too problematic). Summarizing, the component can:
- Generate the segmented image overlaid on the original BaseIMG
- Generate the segmented image mask
- Correct the greyish image the model may produce, by setting the "Optimize" option to false (alternatively, the dpt-hybrid model can be used, though it is less accurate)
- Generate a TXT file containing all the data extracted from the segmented image, such as RGB colors, the percentage of each detected class, and the label IDs; this file will be saved in the "DirPath" folder
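As mentioned above, here is a simplified sketch of the road/skyscraper re-mapping idea (illustrative only; the component's actual code, label ID, and replacement color may differ):

```python
import numpy as np

# ADE20k assigns road and skyscraper the same palette color, so after
# prediction the pixels classified as skyscraper are re-colored to keep
# the segmentation mask unambiguous. The values below are illustrative.
SKYSCRAPER_ID = 48                    # hypothetical ADE20k label ID
NEW_SKYSCRAPER_COLOR = (0, 255, 255)  # hypothetical replacement RGB color

def remap_mask(color_mask: np.ndarray, class_ids: np.ndarray) -> np.ndarray:
    """Re-color every pixel whose predicted class is the skyscraper ID.

    color_mask: (H, W, 3) uint8 RGB segmentation image
    class_ids:  (H, W) integer map of predicted label IDs
    """
    out = color_mask.copy()
    out[class_ids == SKYSCRAPER_ID] = NEW_SKYSCRAPER_COLOR
    return out
```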
Video demo
Files and extra info at the bottom of the page here!
As always, stay tuned! 😉