Feb 9, 2025

Leonardo in Avanté

Figure 1. Generated by Leonardo flow state.
I began experimenting with the Leonardo image generation software last month. In my previous blog post, I used Google's Whisk to make some images depicting Pallis Atwrode in the city Avante on the planet Alphanor. Here, I share the related images that Leonardo was able to generate using this text prompt: "A stunningly detailed, full color, photo-realistic rendering of a futuristic restaurant that is below the surface of the ocean. A dinning room with glass walls and sea life beyond those transparent glass walls. Tables with white tablecloths are set, and diners are seated. A young woman with pale green-toned skin, and a pink halter top is gazing into the camera. The woman's pretty face and beautiful hair is rendered photo-realistically. An escalator is in the middle of the room, behind the woman with green skin".

Leonardo wardrobe malfunction.
The image shown to the left illustrates one of the ways Leonardo can get distracted (or cause a user like me to become distracted) while generating an image that depicts a pretty young woman (call her Pallis Atwrode). Sometimes Leonardo overlays a newly-generated image with a warning that it might be NSFW or somehow "indecent". However, you can click right through these warnings and view the image.

According to this article, Leonardo has some arrangement with Google that supports Leonardo's "flow state" image generation, which rapidly spews out hundreds of variations on an image. Of course, speed of image generation is not everything, as illustrated in Figure 1. You might ask: what happened to her head? The answer is below, in Figure 2.

Figure 2. Generated by Leonardo flow state. Detachable head.
Figure 3. Generated by Leonardo.

Here is how Claude explained this "detachable head" (shown above in Figure 2): "Let me explain how diffusion models can produce 'non-cohesive' images like the disconnected head in your example.

Diffusion models work by gradually denoising random noise into an image, guided by the text prompt. While this process is mathematically sophisticated, it doesn't inherently guarantee physical coherence or anatomical correctness. Here's why:

1. Local vs. Global Coherence:
- Diffusion models often excel at generating locally coherent details (like realistic skin texture or facial features)
- However, they can struggle with global coherence (ensuring all parts of the image work together anatomically)
- This is because the model processes different parts of the image somewhat independently during the denoising process

2. Training Data Patterns:
- The model learns patterns from its training data, but these patterns are statistical rather than based on understanding of physical rules
- It might learn 'faces go with bodies' as a general pattern, but this isn't a hard constraint in the generation process
- When combining multiple concepts from a complex prompt (futuristic restaurant + green skin + specific clothing), the model might prioritize local details over global anatomical correctness

Floating. Generated by Leonardo.
3. Attention Mechanisms:
- In processing your prompt, the model's attention mechanisms might focus strongly on certain elements ('pretty face,' 'beautiful hair,' 'pink halter top')
- This can sometimes lead to these elements being generated semi-independently, especially when the prompt has many specific details

4. Cross-Frame Consistency:
- During the diffusion process, different parts of the image are refined through multiple steps
- Without strong enough constraints between different regions, anatomical connections (like between head and body) can become disconnected."

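To make the "local vs. global coherence" point concrete, here is a minimal Python sketch of my own. It is a toy, not Leonardo's code and not a real diffusion model: the two "whole images", the patch templates and the function names are all hypothetical, and the denoiser is deliberately made to clean each half of the image on its own, which exaggerates how independently a real model treats different regions. Under those assumptions, it shows how step-by-step denoising can end with two perfectly clean patches that do not belong to the same image (a head and a body that do not connect).

```python
# Two hypothetical "valid" whole images, each made of a left and a right patch.
WHOLE_IMAGES = {
    "A": ([1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 1.0, 0.0]),
    "B": ([-1.0, -1.0, -1.0, -1.0], [0.0, -1.0, 0.0, -1.0]),
}
LEFT_TEMPLATES = [left for left, _ in WHOLE_IMAGES.values()]
RIGHT_TEMPLATES = [right for _, right in WHOLE_IMAGES.values()]


def nearest(patch, templates):
    """Return the template closest to the patch (squared Euclidean distance)."""
    return min(templates, key=lambda t: sum((p - q) ** 2 for p, q in zip(patch, t)))


def patchwise_denoiser(x):
    """Toy stand-in for a trained network: it cleans each half independently."""
    half = len(x) // 2
    return nearest(x[:half], LEFT_TEMPLATES) + nearest(x[half:], RIGHT_TEMPLATES)


def denoise(start, steps=20):
    """Move step by step from a noisy start toward the denoiser's prediction."""
    x = list(start)
    for t in range(steps, 0, -1):
        predicted = patchwise_denoiser(x)
        w = 1.0 / t  # small corrections early, full commitment at the last step
        x = [(1.0 - w) * xi + w * pi for xi, pi in zip(x, predicted)]
    return x


def is_globally_consistent(x):
    """True only if both halves come from the same whole image."""
    half = len(x) // 2
    return any(x[:half] == left and x[half:] == right
               for left, right in WHOLE_IMAGES.values())


# A noisy start whose left half resembles image A and right half resembles image B.
mixed_start = [0.9, 1.1, 0.8, 1.2, 0.1, -0.9, -0.1, -1.1]
mixed_result = denoise(mixed_start)
```

In this toy, `mixed_result` ends up with a clean left half from image A and a clean right half from image B, so `is_globally_consistent(mixed_result)` is `False` even though each half looks fine by itself. Nothing in the loop ever checks that the two halves agree, which is the toy version of a head that never gets attached to a body.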
Long neck. 
Claude suggested a new text prompt: "High-fashion editorial photo of a futuristic underwater restaurant. Glass walls reveal vibrant sea life. White tablecloths are on the tables. A hyper-realistic floating head with green skin, dramatic makeup and styled blonde hair is above a mannequin wearing a pink halter top. Deep shadows, strong contrasts, film-like composition," which allowed Leonardo to make images of mannequins with either very long necks (see the image to the right) or completely detached heads. Usually Leonardo wanted to provide a way to hold up the head, such as a chain (see the image below).

detached head; poor Pallis
I have no idea how Leonardo decides on the length of hair for a woman in an AI-generated image when the text prompt only says "beautiful hair". As seen in Figure 2, above, and the series of images that are shown above in Figure 3, Leonardo often gave Pallis Atwrode short hair. Sometimes very short. 

The top panel in Figure 3 shows one of the Leonardo-generated images for which there were no people seated at the tables. If you zoom in, then you can see that in some of the images there appear to be people with naked torsos seated at a table. In other images, people are sitting on a "table" that looks like a bed.

Figure 4. Generated by Leonardo.
Leonardo can also change the "camera angle". In these Leonardo-generated images, most of the time the camera is close to being at the same level as Pallis. The first panel in the image to the right shows one of the rare images for which the camera was slightly lower than Pallis. Other panels in that image show that in some cases the camera was positioned so as to be looking down on Pallis from above. In only one of these images with an altered camera angle did Pallis get a significant amount of the requested green color for her skin tone (the second panel from the bottom in Figure 4). That image was also unusual because of the "clouds" above her head.

Only rarely was there any text in these Leonardo-generated images. Figure 5, below, shows an example of one such image.

Figure 5. Generated by Leonardo.

 Claude's Analysis. Upon "looking" at Figure 5, Claude suggested: "This appears to be a photo taken inside an aquarium. The image shows what looks like an information or display panel with what appears to be 'Cotopea Alig' visible in the text."

Figure 6a. Generated by Leonardo. Where is everyone? Waiting for Godot Gersen.

Figure 6b. Generated by Leonardo. A right-leaning blond and a left-leaning brunette.
Figure 7. Maybe some guests at the Nautilus are aquans.

Some of the Leonardo-generated images appeared to place people in the water (see Figure 7, to the right). The upper panel in Figure 7 also did not have sensible escalators. 

The middle panel in Figure 7 has both an up and a down escalator (satisfying the fire code) but also seems to have a mirror on the ceiling creating a reflection of the escalators. The version of Pallis Atwrode in the middle panel shows that Leonardo will occasionally generate an image of a redhead.

Maybe the version of Pallis Atwrode in the lower panel of Figure 7 has some green tint around her eyes, but for many of these Leonardo-generated images there was little or no effort to match the text prompt directive for "green-toned skin".

Figure 8. Generated by Leonardo.

Shown in Figures 6 and 8 are some interesting asymmetries that were generated by Leonardo. In the top panel of Figure 8 one side of the room has a large glass panel while the other side seems to have an escalator going to the side. Maybe we finally found the kitchen of the Nautilus there towards the right.

In the middle panel of Figure 8 (to the left) there is an interesting light fixture and an opaque wall. Maybe this is special seating for people who prefer not to be constantly reminded that they are under water.

Figure 9. Generated by Leonardo.
The lower panel of Figure 8 has some unique pink light fixtures, including one very large one that might function as a load-bearing element and help support the roof against collapse.

The images in Figure 9 (to the right) show some of the variations in lighting that were generated by Leonardo. The top panel in Figure 9 almost seems to show a fish that is below the glass roof. This was one of the first images generated by Leonardo flow state for this text prompt and it has an interesting type of lighting from above.

I like the style of light fixtures in the second panel from the top in Figure 9 and also what looks like candles on the tables in the third panel.

The second panel from the bottom in Figure 9 seems to show bright Rigel light above the Nautilus. Maybe this is a lunchtime scene inside the Nautilus with Rigel still high in the sky above Avante.

Figure 10. Generated by Leonardo.
More Tables Please. The lower panel in Figure 9 has an interesting vertically-oriented light fixture and a different kind of wall on the right side of the dining room, another asymmetry that I could have put into Figure 8.

The images in Figure 10 (to the left) all center on Pallis Atwrode, and the escalator is often off to one side even though the text prompt specified that the "escalator is in the middle of the room". The upper panel in Figure 10 has interesting red lighting and could have been included in Figure 9. Once in a while Leonardo allows a hint of a nipple to protrude through the fabric of Pallis' halter top, but usually Leonardo is content to show off her cleavage.

Figure 11. Generated by Leonardo.
Many of the Leonardo-generated images had Pallis Atwrode positioned to one side of the image. However, as seen in Figure 11, Leonardo had trouble deciding if Pallis should be turned to the right or to the left. Sometimes Pallis was seriously turning her head and twisting her neck.

In some of the panels of Figure 11 it is clear that Leonardo was uncertain which side of Pallis was her front side and which was her back side.

All of these Leonardo-generated images in Figures 1 - 11 were generated by the "flow state" software. I also experimented with the Phoenix software and the use of Figure 15 from my previous blog post as a reference image.

The first of the Phoenix-generated images shown below was generated without any reference image (see Figure 12, below). With no reference image, Leonardo preferred to put a flat see-through panel on the roof of the dining room.

With 10% dependence on the reference image, Leonardo seemed to struggle with the escalators (see Figure 13, below). 

At 30% dependence on the reference image, Leonardo generated some long hair for Pallis (see Figure 15, below).

At 50% dependence on the reference image, Leonardo finally began to generate some of the smaller details from the reference image such as the "two copies of Tyhry" (see Figure 16, below).

At 75% dependence on the reference image, Leonardo finally captured the idea that the copy of Tyhry wearing the blue dress in the reference image should be looking at Pallis (see Figure 17, below).  

I also tried the Leonardo "Flux Dev" image generator. In the Flux Dev-generated image shown below in Figure 18, several of the male diners have no shirt.

Figure 12. Phoenix-generated with no reference image.

Figure 13. Phoenix-generated with 10% weight on the reference image.

Figure 14. Phoenix-generated with 20% weight on the reference image.

Figure 15. Phoenix-generated with 30% weight on the reference image.
Figure 16. Phoenix-generated with 50% weight on the reference image.

Figure 17. Phoenix-generated with 75% weight on the reference image. Tyhry watching Pallis.

Figure 18. "Flux Dev"-generated with no reference image. "We only serve naked men."
Figure 19. "Portrait Perfect"-generated with "normal" dependence on the reference image (Figure 15 from my previous blog post). Pallis is across the table from Gersen. Flying fish (left side windows).

Figure 20. Whisk-generated with the default "storyboard" description of the image. The shark is inside!

Figure 21. Whisk-generated; using my edited "storyboard" description of the image. Manual face painting.
 
Portrait Perfect model.

 I tried one more Leonardo image generator ("Portrait Perfect", see the image to the right) for this scene with Pallis Atwrode (see Figure 19, above). The Leonardo-generated green skin tones for her face and arms in Figure 19 are interesting. Sadly, Figure 19 was ruined by what looks like flying fish set against clouds and the surface of the ocean. 

Figure 22. A new Whisk "subject" image.
One of the Leonardo "Portrait Perfect"-generated images (not shown) had two women sitting together at a table. I used that image as a new "scene" image with Whisk, along with a Leonardo-generated image of a woman with dark hair. With those new inputs, Whisk then generated Figure 20.

Green face by Whisk
Face Painting. I modified the original Whisk-generated "storyboard" description from Figure 20 (specifying that Pallis have "long, red hair") and provided a new Whisk "subject" with a painted green face (the woman from Figure 19, see the image in Figure 22). Then Whisk generated Figure 21. In the Whisk-generated image, there was no green at all in the face of Pallis Atwrode. I had to manually make her face green. At the bottom of this blog post is one more Whisk-generated image similar to the one in Figure 21, with the same "green faced Pallis" from Figure 21 manually pasted in so as to cover up a bizarre-looking Pallis with green patches on her face that had been generated by Whisk (see the image to the right).

Next: alien anti-gravity technology

Visit the Gallery of Movies, Book and Magazine Covers