![]() |
Figure 1. Generated by Leonardo flow state. |
![]() |
Leonardo wardrobe malfunction. |
According to this article, Leonardo has some arrangement with Google that supports Leonardo's "flow state" image generation, which rapidly spews out hundreds of variations on an image. Of course, speed of image generation is not everything, as illustrated in Figure 1. You might ask: what happened to her head? The answer is below, in Figure 2.
![]() |
Figure 2. Generated by Leonardo flow state. Detachable head. |
![]() |
Figure 3. Generated by Leonardo. |
Here is how Claude explained this "detachable head" (shown above in Figure 2): "Let me explain how diffusion models can produce 'non-cohesive' images like the disconnected head in your example.
Diffusion models work by gradually denoising random noise into an image, guided by the text prompt. While this process is mathematically sophisticated, it doesn't inherently guarantee physical coherence or anatomical correctness. Here's why:
1. Local vs. Global Coherence:
- Diffusion models often excel at generating locally coherent details (like realistic skin texture or facial features)
- However, they can struggle with global coherence (ensuring all parts of the image work together anatomically)
- This is because the model processes different parts of the image somewhat independently during the denoising process
2. Training Data Patterns:
- The model learns patterns from its training data, but these patterns are statistical rather than based on understanding of physical rules
- It might learn 'faces go with bodies' as a general pattern, but this isn't a hard constraint in the generation process
- When combining multiple concepts from a complex prompt (futuristic restaurant + green skin + specific clothing), the model might prioritize local details over global anatomical correctness
![]() |
Floating. Generated by Leonardo. |
- In processing your prompt, the model's attention mechanisms might focus strongly on certain elements ('pretty face,' 'beautiful hair,' 'pink halter top')
- This can sometimes lead to these elements being generated semi-independently, especially when the prompt has many specific details
4. Cross-Frame Consistency:
- During the diffusion process, different parts of the image are refined through multiple steps
- Without strong enough constraints between different regions, anatomical connections (like between head and body) can become disconnected."
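To make the "gradually denoising random noise" idea a bit more concrete, here is a minimal toy sketch of a deterministic (DDIM-style) reverse loop in plain Python. It is not how Leonardo or any production model is implemented. The names `make_alpha_bars`, `oracle_noise_predictor` and `denoise` are hypothetical, the linear noise schedule is an assumption, and the "predictor" cheats by knowing the target values, standing in for the trained neural network that a real model would use.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def oracle_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a trained network.

    It recovers the noise exactly because it is handed the target,
    which a real model never sees.
    """
    return [
        (x - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)
        for x, x0 in zip(x_t, target)
    ]


def denoise(target, num_steps=50, seed=0):
    """Run the reverse loop from random noise down to step 0."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    # Start from Gaussian noise, one value per "pixel".
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for t in reversed(range(num_steps)):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = oracle_noise_predictor(x, a_t, target)
        # Estimate the clean signal implied by the predicted noise.
        x0_pred = [
            (xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
            for xi, e in zip(x, eps)
        ]
        # Step to the previous (less noisy) level.
        x = [
            math.sqrt(a_prev) * p + math.sqrt(1.0 - a_prev) * e
            for p, e in zip(x0_pred, eps)
        ]
    return x


if __name__ == "__main__":
    print(denoise([0.5, -1.0, 0.25, 2.0]))
```

Nothing in this loop checks how one element relates to another. In a real model, any agreement between distant parts of the image (a head and the body under it, say) has to come from what the learned predictor has picked up from its training data.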
![]() |
Long neck. |
![]() |
detached head; poor Pallis |
The top panel in Figure 3 shows one of the Leonardo-generated images in which no people are seated at the tables. If you zoom in on some of the other images, you can see what appear to be people seated at a table with a naked torso. In other images, people are sitting on a "table" that looks like a bed.
![]() |
Figure 4. Generated by Leonardo. |
Only rarely was there any text in these Leonardo-generated images. Figure 5, below, shows an example of one such image.
![]() |
Figure 5. Generated by Leonardo. |
Claude's Analysis. Upon "looking" at Figure 5, Claude suggested: "This appears to be a photo taken inside an aquarium. The image shows what looks like an information or display panel with what appears to be 'Cotopea Alig' visible in the text."
![]() |
Figure 6a. Generated by Leonardo. Where is everyone? Waiting for |
![]() |
Figure 6b. Generated by Leonardo. A right-leaning blond and a left-leaning brunette. |
![]() |
Figure 7. Maybe some guests at the Nautilus are aquans. |
Some of the Leonardo-generated images appeared to place people in the water (see Figure 7, to the right). The upper panel in Figure 7 also did not have sensible escalators.
The middle panel in Figure 7 has both an up and a down escalator (satisfying the fire code) but also seems to have a mirror on the ceiling creating a reflection of the escalators. The version of Pallis Atwrode in the middle panel shows that Leonardo will occasionally generate an image of a redhead.
Maybe the version of Pallis Atwrode in the lower panel of Figure 7 has some green tint around her eyes, but for many of these Leonardo-generated images there was little or no effort to match the text prompt directive for "green-toned skin".
![]() |
Figure 8. Generated by Leonardo. |
Shown in Figures 6 and 8 are some interesting asymmetries that were generated by Leonardo. In the top panel of Figure 8 one side of the room has a large glass panel while the other side seems to have an escalator going to the side. Maybe we finally found the kitchen of the Nautilus there towards the right.
In the middle panel of Figure 8 (to the left) there is an interesting light fixture and an opaque wall. Maybe this is special seating for people who prefer not to be constantly reminded that they are under water.
![]() |
Figure 9. Generated by Leonardo. |
The images in Figure 9 (to the right) show some of the variation in lighting that was generated by Leonardo. The top panel in Figure 9 almost seems to show a fish that is below the glass roof. This was one of the first images generated by Leonardo flow state for this text prompt and it has an interesting type of lighting from above.
I like the style of light fixtures in the second panel from the top in Figure 9 and also what looks like candles on the tables in the third panel.
The second panel from the bottom in Figure 9 seems to show bright Rigel light above the Nautilus. Maybe this is a lunchtime scene inside the Nautilus with Rigel still high in the sky above Avante.
![]() |
Figure 10. Generated by Leonardo. |
The images in Figure 10 (to the left) all center on Pallis Atwrode and often the escalator is off to one side, although the text prompt specified that the "escalator is in the middle of the room". The upper panel in Figure 10 has interesting red lighting and could have been included in Figure 9. Once in a while Leonardo allows a hint of a nipple to protrude through the fabric of Pallis' halter top, but usually Leonardo is content to show off her cleavage.
![]() |
Figure 11. Generated by Leonardo. |
In some of the panels of Figure 11 it is clear that Leonardo was uncertain which side of Pallis was her front side and which was her back side.
All of these Leonardo-generated images in Figures 1 - 11 were generated by the "flow state" software. I also experimented with the Phoenix software and the use of Figure 15 from my previous blog post as a reference image.
The first of the Phoenix-generated images shown below was generated without any reference image (see Figure 12, below). With no reference image, Leonardo preferred to put a flat see-through panel on the roof of the dining room.
With 10% dependence on the reference image, Leonardo seemed to struggle with the escalators (see Figure 13, below).
At 30% dependence on the reference image, Leonardo generated some long hair for Pallis (see Figure 15, below).
At 50% dependence on the reference image, Leonardo finally began to generate some of the smaller details from the reference image such as the "two copies of Tyhry" (see Figure 16, below).
At 75% dependence on the reference image, Leonardo finally captured the idea that the copy of Tyhry wearing the blue dress in the reference image should be looking at Pallis (see Figure 17, below).
I also tried the Leonardo "Flux Dev" image generator. In the Flux Dev-generated image shown below in Figure 18, several of the male diners have no shirt.
![]() |
Figure 12. Phoenix-generated with no reference image. |
![]() |
Figure 13. Phoenix-generated with 10% weight on the reference image. |
![]() |
Figure 14. Phoenix-generated with 20% weight on the reference image. |
![]() |
Figure 15. Phoenix-generated with 30% weight on the reference image. |
![]() |
Figure 16. Phoenix-generated with 50% weight on the reference image. |
![]() |
Figure 17. Phoenix-generated with 75% weight on the reference image. Tyhry watching Pallis. |
![]() |
Figure 18. "Flux Dev"-generated with no reference image. "We only serve naked men." |
![]() |
Figure 19. "Portrait Perfect"-generated with "normal" dependence on the reference image (Figure 15 from my previous blog post). Pallis is across the table from Gersen. Flying fish (left side windows). |
![]() |
Figure 20. Whisk-generated with the default "storyboard" description of the image. The shark is inside! |
![]() |
Figure 21. Whisk-generated; using my edited "storyboard" description of the image. Manual face painting. |
![]() |
Portrait Perfect model. |
I tried one more Leonardo image generator ("Portrait Perfect", see the image to the right) for this scene with Pallis Atwrode (see Figure 19, above). The Leonardo-generated green skin tones for her face and arms in Figure 19 are interesting. Sadly, Figure 19 was ruined by what looks like flying fish set against clouds and the surface of the ocean.
![]() |
Figure 22. A new Whisk "subject" image. |
![]() |
Green face by Whisk |
Next: alien anti-gravity technology