
Image Generation Limitation

Image generation, much like large language models, is built on deconstruction and interpretation. Given a prompt, a so-called "diffusion model" starts from an image of pure random noise and gradually removes that noise, step by step, until a picture matching the prompt emerges. The knowledge of how noise resolves into which image is gathered by training the model on millions of images. Some models are trained to generate a specific type of image and are therefore fed large quantities of exactly that type.
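The iterative denoising described above can be sketched in a few lines. This is a toy illustration, not a real diffusion model: the `toy_denoiser` function stands in for the trained neural network, and simply nudges pixels toward a fixed target to mimic learned noise prediction.

```python
import numpy as np

def toy_denoiser(noisy, t):
    # Stand-in for a trained network: predict the "noise" as the
    # deviation from a fixed target image (here, uniform grey 0.5).
    target = np.full_like(noisy, 0.5)
    return (noisy - target) * 0.1  # small predicted noise component

def sample(steps=50, shape=(8, 8), seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)       # start from pure random noise
    for t in reversed(range(steps)):
        predicted_noise = toy_denoiser(x, t)
        x = x - predicted_noise          # remove a little noise each step
    return x

img = sample()
print(img.shape)  # (8, 8)
```

After enough steps the image converges toward the denoiser's learned target; in a real model that target is shaped by the prompt and by millions of training images rather than hard-coded.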
01 // STATE OF THE ART

Current Proficiency

Current models such as Stable Diffusion 3, Midjourney v6, and DALL-E 3 can produce images that rival traditional rendering, compressing a process that once took hours into a task of mere minutes.

These systems encode a vast "latent space" of architectural history, allowing for the generation of hundreds of iterations of massing studies or facade concepts in moments.
Herein lies an interesting opportunity for architectural firms with a large archive of architectural drawings: such an archive can be used to train private models to generate specific types of drawings.
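A first step toward such a private model is curating the archive into per-type training sets. A minimal sketch, with hypothetical file names and type labels (in practice the labels might come from CAD layer names or project metadata):

```python
from collections import defaultdict

# Hypothetical archive records: (filename, drawing_type) pairs.
archive = [
    ("tower_a_section.dwg", "section"),
    ("tower_a_plan.dwg", "floor_plan"),
    ("villa_b_facade.dwg", "elevation"),
    ("villa_b_plan.dwg", "floor_plan"),
]

def build_training_sets(records):
    """Group archive entries by drawing type, so each subset can be
    used to fine-tune a model specialised in that type of drawing."""
    sets = defaultdict(list)
    for name, kind in records:
        sets[kind].append(name)
    return dict(sets)

training_sets = build_training_sets(archive)
print(sorted(training_sets))  # ['elevation', 'floor_plan', 'section']
```

Each resulting subset would then feed a separate fine-tuning run, producing one model per drawing type.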

02 // ADOPTION

In-Painting & Workflow

These models excel at "in-painting"—changing materials or textures within user-defined boundaries. Because everything outside the masked region stays fixed, the untouched image acts as a solid anchor that keeps the output faithful to the intent of the prompt.
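The anchoring effect comes down to a simple compositing step: only the masked region receives generated pixels, while everything else is copied from the source. A minimal sketch with stand-in arrays instead of real images and a real diffusion model:

```python
import numpy as np

def inpaint_composite(original, generated, mask):
    """Blend a generated patch into the original image.

    mask is 1.0 inside the region to repaint (e.g. one facade panel)
    and 0.0 elsewhere; pixels outside the mask are preserved exactly,
    which anchors the result to the source image."""
    return mask * generated + (1.0 - mask) * original

original = np.ones((4, 4))      # stand-in for the source render
generated = np.zeros((4, 4))    # stand-in for the model's output
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0            # repaint only the central 2x2 block

result = inpaint_composite(original, generated, mask)
```

Real in-painting pipelines also condition the generation itself on the unmasked context, but the hard boundary shown here is what guarantees the rest of the image survives untouched.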

While 66% of architects were satisfied with AI for early-stage conceptualization in 2024, satisfaction drops precipitously to below 30% for later design phases, delineating the boundary between "suggestion" and "definition" [7].

03 // THE WORLD SIMULATOR

Theoretical Horizon

Future models will not merely predict pixel color but will simulate light transport and material properties within the neural network itself, potentially replacing engines like V-Ray [8].

The goal is translating prompts directly into fabrication-ready BIM models complete with bolts, welds, and tolerances [9]. Deep research also implies "World Models" that function as latent simulators for real-time VR [10].
Yet none of these systems exists at the time of writing.

04 // OPERATIONAL FLAWS

The Precision Gap

Generative models remain "hallucinatory" engines that prioritize plausibility over truth. They frequently generate "impossible" geometry, such as stairs to nowhere or columns that don't reach the ceiling [3].

While improving, models like SD3 still struggle with fine-grained details, often rendering legible text as gibberish due to processing it as visual patterns rather than linguistic symbols.

05 // FUNDAMENTAL LIMITS

Symbol Grounding Problem

AI lacks phenomenological understanding; it processes architecture as statistical tokens rather than felt realities and cannot judge the "atmosphere" of a room [11].

Furthermore, AI cannot hold legal authorship or liability. It cannot be sued for design failure, meaning it can never fully replace the professional architect's role in ensuring public safety [12].

References
[3] "Trough of Disillusionment" & Consistency Problems in Generative AI. Ref [3] in source PDF.
[7] Survey on AI Satisfaction in Architecture (2024) — Early vs Late Stage Satisfaction. Ref [7] in source PDF.
[8] Research on Physics-Compliant Neural Rendering (2025) & Light Transport Simulation. Ref [8] in source PDF.
[9] Text-to-LOD 500: Fabrication-Ready Modeling from Semantic Prompts. Ref [9] in source PDF.
[10] World Models and Latent Simulation for Real-Time VR. Ref [10] in source PDF.
[11] Phenomenological Understanding in Artificial Intelligence. Ref [11] in source PDF.
[12] Legal Authorship and Liability in AI Design Practice. Ref [12] in source PDF.
[33] Towards 3D Consistent Video Generators — Adobe Research. https://research.adobe.com/publication/towards-3d-consistent-video-generators
[34] A Predictive Self-Healing Model for Optimizing Production Lines — MDPI. https://www.mdpi.com/2673-4591/97/1/6
[37] Programmable matter — Wikipedia. https://en.wikipedia.org/wiki/Programmable_matter