Recent advances in image generation models have demonstrated remarkable capabilities in creating photorealistic and imaginative visuals. However, a persistent challenge remains: accurately rendering reflections in mirrors. We anecdotally evaluate five image generation models on five prompts featuring both humans and objects, and four video generation models on the first of those prompts. Our findings reveal that AI models frequently struggle with reflections, often generating distorted, inconsistent, or entirely incorrect images.
Generative image models, particularly those based on deep learning, have achieved impressive results in synthesizing realistic images of various scenes and objects. From human faces to fantastical landscapes, these models have shown a remarkable ability to learn complex data distributions and produce novel content. Despite this progress, a seemingly simple element, the mirror, remains difficult. Reflections are governed by the precise laws of optics, yet in generated images they often appear distorted, misplaced, or entirely absent. This article explores how mirrors expose a blind spot in generative models and suggests that addressing it is crucial for more realistic and physically plausible image synthesis.
We selected a range of popular, publicly available image and video generation models to assess how accurately they synthesize content containing mirror reflections.
Image generation models
We evaluated five image generation models:
- Gemini, which uses Imagen 3 as its generation backbone
- Adobe Firefly
- Bing Image Creator, which uses DALL-E 3
- Ideogram
- Freepik.com
Each model was evaluated on the following five prompts, some featuring humans and others containing only animals or objects.
- An image of a young lady holding a pen in front of a mirror
- An image of two cats playing in front of a mirror
- An image of a chair in front of a mirror
- An image of a group of people in a room with a mirror in it
- An image of a kitchen with a mirror in it
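The evaluation grid described above (every image model paired with every prompt) can be sketched as follows. This is only an illustration of the bookkeeping, not a script used for this article: the article does not describe any tooling, and the names `IMAGE_MODELS`, `PROMPTS`, and `build_image_grid` are hypothetical.

```python
from itertools import product

# Model and prompt names are copied from the lists above.
# The variable and function names are hypothetical.
IMAGE_MODELS = [
    "Gemini (Imagen 3)",
    "Adobe Firefly",
    "Bing (DALL-E 3)",
    "Ideogram",
    "Freepik.com",
]

PROMPTS = [
    "An image of a young lady holding a pen in front of a mirror",
    "An image of two cats playing in front of a mirror",
    "An image of a chair in front of a mirror",
    "An image of a group of people in a room with a mirror in it",
    "An image of a kitchen with a mirror in it",
]


def build_image_grid(models, prompts):
    """Return every (model, prompt) pair to generate and inspect by hand."""
    return list(product(models, prompts))


grid = build_image_grid(IMAGE_MODELS, PROMPTS)
print(len(grid))  # prints 25: five models times five prompts
```

Listing the pairs explicitly makes it easy to confirm that every model saw every prompt before comparing failure patterns across models.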
The results across models (some examples are shown below) reveal consistent patterns of reflection and perspective errors. Gemini often produces incorrect or missing reflections and misjudges object placement, particularly in the cat, chair, and kitchen scenes. Some errors are subtle but noticeable.
Ideogram generally produces higher-fidelity images but also shows recurring issues. Hand reflections are often incorrect, and objects can be reflected inconsistently. It struggles most with group images, where it makes significant errors in reflections and overall coherence, and the quality of faces is poor.
Adobe Firefly has more severe errors, such as objects extending unnaturally outside mirrors and misaligned or missing reflections, leading to reduced realism.
Bing Image Creator often produces cartoonish images with significant reflection issues, misplacing or distorting elements.
Freepik-generated cat images show high visual quality but still suffer from similar reflection errors, highlighting a common challenge across models.
High-resolution versions of the generated images are available on the GitHub page associated with this article for further examination.
Video generation models
Additionally, we evaluated the following text-to-video generation models using only the first prompt from the previous subsection.
- veed.io
- pollo.ai (Pollo 1.5)
- ltx.studio
- vidnoz.com
These models exhibit issues similar to those observed in the image generation models. Beyond errors in appearance and consistency, they also struggle to generate accurate motion in reflections: reflected elements often move incorrectly or fail to follow the real-world physics of mirrored motion. As a result, their handling of reflections is particularly poor, and the generated videos are noticeably flawed.
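To make "the physics of mirrored motion" concrete, the sketch below computes where a point and its velocity should appear in an ideal flat mirror. It assumes a perfectly planar mirror described by a point on its surface and a unit normal. The function names `reflect_point` and `reflect_vector` and the example coordinates are hypothetical, chosen only for illustration; this was not part of the evaluation itself.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def reflect_point(p, mirror_point, normal):
    """Mirror image of point p across the plane through mirror_point.

    `normal` must be a unit vector perpendicular to the mirror.
    """
    # Signed distance from p to the mirror plane along the normal.
    d = dot([pi - qi for pi, qi in zip(p, mirror_point)], normal)
    return [pi - 2 * d * ni for pi, ni in zip(p, normal)]


def reflect_vector(v, normal):
    """Mirror image of a direction or velocity: flip its normal component."""
    d = dot(v, normal)
    return [vi - 2 * d * ni for vi, ni in zip(v, normal)]


# Hypothetical scene: the mirror is the plane x = 0, facing +x.
mirror_point = (0.0, 0.0, 0.0)
normal = (1.0, 0.0, 0.0)

hand = (2.0, 1.0, 0.5)        # a hand two units in front of the mirror
velocity = (-1.0, 0.0, 0.5)   # moving toward the mirror and upward

print(reflect_point(hand, mirror_point, normal))  # [-2.0, 1.0, 0.5]
print(reflect_vector(velocity, normal))           # [1.0, 0.0, 0.5]
```

In this example the reflected hand sits the same distance behind the mirror as the real hand sits in front of it, and it approaches the mirror surface from the other side while rising at the same rate. A generated video in which the reflection moves away from the surface as the hand approaches, or rises at a different rate, violates this relationship.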