AI Image Generation
The Client is a manufacturer of roof windows and skylights, focused on bringing daylight and fresh air into indoor spaces. Their brand is built on the transformative quality of natural light — how it shapes rooms, moods, and everyday life.
We were asked to explore the current state of AI image generation and its potential for the client's visual content. The goal was to understand what's possible, what's not, and where the technology could be heading.
Two Directions
We defined two creative directions to explore, both centered on the presence of natural daylight without depicting the product directly.
Direction 1
Abstract
Evoking the presence of daylight through close-up textures, materials, and atmospheric details — without showing a window or room. Think light cones falling on linen, ceramic, fruit.
Direction 2
Indicative
Full room scenes where the light source is implied but never shown. Hinting at the product through distinctive angled light patterns and shadows cast from above.
Starting Point
I started with a basic Flux Dev workflow in ComfyUI — plain text-to-image generation. The initial outputs had reasonable composition but didn't feel right. The textures were plastic. The lighting was generic. Nothing matched the editorial, photographic quality of the client's existing imagery.
More importantly, the light patterns were wrong. Every image showed light from a vertical window — sharp horizontal beams across a wall. But roof windows cast fundamentally different light: angled rectangular patches on the floor, coming from above. This distinction is central to the brand, and the AI couldn't produce it.
3D simulations comparing vertical vs. roof window light cone geometry.
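The geometry difference is easy to verify in a few lines of code. This is a minimal sketch — not the simulation pictured above, and with made-up coordinates — that projects each corner of a tilted ceiling aperture along the sun direction onto the floor plane, tracing the angled patch a roof window produces instead of a band on a wall.

```python
# Toy geometry check: follow a sun ray from each window corner down to the
# floor plane z = 0. A roof window sits high up and tilted, so its projected
# corners land on the floor as a skewed quadrilateral -- the characteristic
# angled patch. All coordinates are illustrative.

def project_to_floor(corner, sun_dir):
    """Intersect the ray corner + t * sun_dir with the floor plane z = 0."""
    x, y, z = corner
    dx, dy, dz = sun_dir          # dz < 0: light travels downward
    t = -z / dz                   # ray parameter where the ray reaches z = 0
    return (x + t * dx, y + t * dy)

# Four corners of a tilted rectangle set into the roof plane.
roof_window = [(0.0, 0.0, 2.5), (1.0, 0.0, 2.5),
               (1.0, 0.8, 3.0), (0.0, 0.8, 3.0)]
sun = (0.3, 0.5, -1.0)

patch = [project_to_floor(c, sun) for c in roof_window]
# `patch` is a quadrilateral on the floor, offset and skewed by the sun angle.
```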
Prompt Architecture
Before resorting to model training, I spent significant time on prompt engineering — testing over 500 variations. I developed a structured prompt format with five layers: subject, camera position, light source, LoRA activators, and style prompts. Each layer controlled a different aspect of the output.
The style prompts referenced specific camera systems (Hasselblad X2D 100C), photographic qualities (ultra-high resolution, photorealistic textures), and editorial approaches (soft but directional lighting, editorial food photography). This pushed the aesthetic quality significantly, but couldn't solve the light cone geometry problem.
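As a rough illustration of that five-layer structure — the layer names come from the text, but the separator, ordering, and every value below are placeholders, not the actual prompts used:

```python
# Hypothetical sketch of the five-layer prompt format: subject, camera
# position, light source, LoRA activators, style prompts. Joining with
# commas is an assumption about the format, not the documented one.

def build_prompt(subject, camera, light, activators, style):
    """Assemble the five layers into a single comma-separated prompt."""
    layers = [subject, camera, light, *activators, *style]
    return ", ".join(layer for layer in layers if layer)

prompt = build_prompt(
    subject="sunlit linen cloth draped over a ceramic bowl",
    camera="close-up, shot from slightly above",
    light="warm directional daylight falling from above",
    activators=["BRANDSTYLE"],  # placeholder trigger word, not the real one
    style=["Hasselblad X2D 100C", "ultra-high resolution",
           "photorealistic textures", "soft but directional lighting"],
)
```

Keeping each concern in its own layer made it possible to vary one aspect — say, the camera position — while holding the rest of the prompt constant.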
No amount of prompt engineering could make the model generate light from a roof window. The concept simply didn't exist well enough in the training data.
Training Custom LoRAs
When prompting hit its ceiling, I trained custom LoRA adapters using the client's existing image library from their DAM. The process involved multiple iterations — varying dataset size, training epochs, LoRA rank, and learning rate to find the right balance between style adherence and flexibility.
| Parameter | Value |
| --- | --- |
| Dataset | 48 curated images from client DAM |
| Training | 4 repeats × 7 epochs (1,344 steps) |
| LoRA rank | 4 |
| Learning rate | 1e-4 |
| Resolution | 1024 px |
| Training time | ~3 hours |
| Trigger word | Case-sensitive brand style activator |
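The step count in the table follows directly from the schedule: at batch size 1, each epoch visits every image once per repeat.

```python
# Optimization steps implied by the training table:
# images x repeats x epochs, at batch size 1.

images, repeats, epochs = 48, 4, 7
steps = images * repeats * epochs
print(steps)  # 1344, matching the table
```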
I also experimented with CLIP model selection. Switching the text encoder from CLIP-L to Long-CLIP ViT-L made a noticeable difference — fewer artifacts, cleaner compositions, and better adherence to complex multi-layered prompts.
Results — Abstract
The LoRA transformed what the model could produce. Abstract outputs now matched the client's DAM imagery — photorealistic textures, correct directional daylight, and an editorial quality that felt intentionally photographed rather than generated.
Left: reference from the client's existing asset library. Right: AI-generated with custom LoRA + structured prompts.
Results — Indicative
The first LoRA iteration still struggled with the indicative direction — many outputs showed roof windows in the frame, which wasn't desired. The model needed to imply the light source without depicting it.
After refining the training dataset and adjusting the approach, the second LoRA iteration cracked it. The model could generate convincing attic and loft spaces — bedrooms, bathrooms, kitchens, home offices, dining rooms — all with characteristic roof-window light patterns, without showing the product.
Left: early LoRA showing window in frame (not desired). Right: refined LoRA with correct light but no visible product.
Inpainting & Refinement
Even fine-tuned outputs contain artifacts — garbled text on book spines, impossible object geometry, inconsistent shadows. Single-shot generation isn't production-ready. I built two additional ComfyUI workflows to close the gap.
Workflow 1
Object Inpainting
Mask and regenerate specific objects — fix broken geometry, replace garbled text, add or remove elements while preserving composition and lighting.
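Conceptually, the inpainting step reduces to a per-pixel blend: only masked regions take regenerated content, which is why composition and lighting outside the mask survive untouched. A toy sketch with stand-in pixel values (real pipelines blend noised latents, not final pixels):

```python
# Mask-guided compositing: where mask == 1, take the regenerated pixel;
# everywhere else, keep the original. The "generated" image here is a
# stand-in constant, not real model output.

def composite(original, generated, mask):
    """Per-pixel blend of two images under a binary mask."""
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]

original  = [[10, 10], [10, 10]]
generated = [[99, 99], [99, 99]]
mask      = [[0, 1], [0, 0]]      # regenerate only the top-right pixel
result = composite(original, generated, mask)
print(result)  # [[10, 99], [10, 10]]
```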
Workflow 2
Detail Pass
A second-pass workflow using differential diffusion to upscale and refine textures, sharpen edges, and add photographic detail without altering the base composition.
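A simplified sketch of the differential-diffusion idea behind the detail pass: a per-pixel change map decides how long each pixel keeps following the base image during denoising. Low values lock pixels to the base composition; high values free them to be re-synthesized with extra detail. Real implementations re-inject noised latents at each sampling step; this toy version operates on plain pixel values.

```python
# At timestep t (running 1.0 -> 0.0), pixels whose change value is still
# below the current threshold are pulled back to the original image, so
# low-change regions stay anchored while high-change regions are refined.

def differential_step(latent, original, change_map, t):
    """One denoising step's re-injection under a per-pixel change map."""
    return [
        [orig if change < t else lat
         for lat, orig, change in zip(lrow, orow, crow)]
        for lrow, orow, crow in zip(latent, original, change_map)
    ]

refined = differential_step([[5, 5]], [[1, 2]], [[0.2, 0.9]], t=0.5)
# The low-change pixel (0.2) snaps back to the original; the high-change
# pixel (0.9) keeps the newly generated value.
```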
The Generator Tool
Beyond the R&D outputs, I built a web-based image generation tool for the client's team. The interface abstracts away the complexity of ComfyUI and prompt engineering, enabling creatives to generate brand-consistent imagery directly.
The tool supports three modes matching the creative categories (Abstract, Indicative, Edit), includes an AI-enhanced prompt toggle, and gives power users control over aspect ratio, composition locking, and CFG scale. A beta launched with a select group within the client organization.
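A hypothetical request payload for the tool, to make the controls concrete — the field names and validation ranges below are illustrative, inferred from the controls described above, not the tool's actual schema:

```python
# Illustrative payload shape for the generator tool. All keys and the CFG
# range are assumptions drawn from the controls listed in the text.

VALID_MODES = {"abstract", "indicative", "edit"}  # the three creative modes

def validate(req):
    """Reject payloads the (assumed) backend could not handle."""
    assert req["mode"] in VALID_MODES, "unknown mode"
    assert 1.0 <= req["cfg_scale"] <= 10.0, "CFG scale out of assumed range"
    return req

request = validate({
    "mode": "indicative",               # abstract | indicative | edit
    "prompt": "attic home office at midday",
    "enhance_prompt": True,             # the AI-enhanced prompt toggle
    "aspect_ratio": "4:5",
    "lock_composition": False,          # power-user composition locking
    "cfg_scale": 3.5,
})
```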
What I Learned
Prompt engineering has a ceiling. When a concept doesn't exist well in training data, no amount of prompt craft will overcome it. Fine-tuning is the answer — and surprisingly accessible. A rank-4 LoRA trained on just 48 images for three hours fundamentally changed what the model could produce.
CLIP model selection matters more than expected. Switching encoders reduced artifacts and improved complex prompt adherence with no additional training. And the real production workflow isn't generation — it's generation plus inpainting plus a detail pass. The multi-step pipeline is what makes the output actually usable.
The LoRA didn't just learn a style — it learned the physics of roof-window light.