Figure 1. Whisk-generated. Sam's General Store.
In my previous blog post, I showed some Whisk-generated images similar to the one that is shown to the right in Figure 1. In the scene depicted in Figure 1, people on an asteroid have to wear spacesuits in order to walk from their parked spaceships to the store. What if there was a dome covering both the store and the parking lot?
I wanted Whisk to create an image in which the entire store and the surrounding parking lots were covered by a transparent glass dome. In another blog post, Whisk had been able to make glass domes such as the one shown in Figure 2, below.
Figure 2. Whisk-generated. A glass dome.
SSS asteroid
However, when I provided Whisk with the "subject", "scene" and "style" images shown to the right, Whisk generated the image shown in Figure 3. Is she walking on the surface of an asteroid, on her way to Sam's store?
The "style" image shown to the right was generated by Whisk from this text prompt: "The overall style has a high degree of photo-realism, with a focus on rich colors and intricate details. The colors are vibrant and saturated. The image is sharp and clear, with a high level of detail visible in the subjects and the scope of the scene. The overall mood is futuristic evoking a spirit of adventure and wonder. The subjects are depicted photo-realistically. The level of detail is consistent throughout, creating a cohesive and immersive scene." I've used this exact same text prompt many times previously; only on this occasion did Whisk generate a desert scene (see the lower panel in the tall image to the right).
I have no idea what algorithm Whisk used to generate "style" images from text prompts. Since starting to use Whisk, I've always been appalled that Whisk generates a military scene for this "style" text prompt. Maybe in this case, Whisk was not able to see anything interesting like a person in my "subject" image and so it generated a desert scene under the influence of the flat parking area on the asteroid.
Figure 3. Whisk-generated "storyboard". Shopping on an asteroid?
Suspect AI. I suspected that maybe Whisk could not really "see" the "subject" image and appreciate the large parking area around the store. Here is how Gemini described the "subject" image: "The image depicts a large, sprawling general store named "Sam's General Store" situated on a celestial body, possibly a planet or moon. The store is surrounded by a vast parking lot filled with various vehicles, including cars, trucks, and even flying vehicles. The scene is filled with celestial bodies like planets and asteroids, suggesting an interplanetary or interstellar setting. The overall style of the image is reminiscent of retro-futuristic artwork with a touch of humor."
Here is the description of the "subject" image that was generated by Whisk: "A long, rectangular building sits on a flat, light-grey surface, surrounded by numerous vehicles. The building has a grey roof and dark windows. Two tall, thin towers rise from either side of the building. The vehicles are predominantly white or light-grey, with a few yellow ones visible. They appear to be futuristic in design, with rounded shapes and sleek lines. The surface the building and vehicles rest on is surrounded by a dark, seemingly rocky edge, which is itself surrounded by a dark void speckled with stars. Several large, dark asteroids are visible in the void, along with a few smaller, lighter-colored spacecraft. The words "SAM'S GENERAL STORE" are written in large, light-blue capital letters at the top of the image. A faint, light-colored band stretches across the bottom of the image, curving around the dark void." When I asked Whisk to cover the store with a glass dome, Whisk generated Figure 4.
Figure 4. Whisk-generated general store under a glass dome.
Figure 5. Whisk-generated general store on an asteroid.
Figure 6. Whisk-generated general store on an asteroid with some covered parking.
Figure 7. Whisk-generated general store on an asteroid with a covered store.
SSS dome "subject"
Whisk really likes to get up close to the store and shrink the size of the parking area (see Figure 5). I was only able to get Whisk to cover part of the store (see Figure 7, above) or part of the parking lot (see Figure 6). I asked for spaceships to be parked near the store, but Whisk often found it difficult not to generate vehicles that look like cars (Figure 6).
I changed the "subject" and "scene" images to those that are shown to the left. Whisk now generated images such as the one in Figure 8, below. It does not look like Whisk was concerned to provide spacesuits to any of the shoppers in the parking lot.
Figure 8. Whisk-generated. Who needs air?
Ask Gemini. I had a discussion with Gemini in which I tried to gain insight as to why it is so hard for Whisk to put a dome over both the store and the parking spaces. I showed Gemini Figure 1 from this blog page. Gemini generated the text description of the image that is shown above on this page.
Figure 9. Generated by Gemini who claimed that in this image: "A large translucent glass dome covers the entire store and the adjacent parking areas."
Me: "Gemini, imagine that Sam's General Store is located on an asteroid in deep space. The parking area around the store is full of futuristic spacecraft. Gemini, please generate a similar image in which a large translucent glass dome covers the entire store and the adjacent parking areas."
Gemini generated the image shown in Figure 9.
Me: "Gemini, in that image that you generated, the glass dome does not cover the entire store and the adjacent parking areas. In fact it looks like the dome might be in the distance behind the store. I want an image in which the arching structure of a transparent glass dome covers the entire general store building and also the nearby parking area." {I uploaded the image that is shown in Figure 2, above.} "Gemini, can you see the glass dome in the image that I just uploaded? Please describe the image that I just uploaded."
Gemini: "Sure, here is an image of a large, sprawling general store named "Sam's General Store" situated on a celestial body, possibly a planet or moon. The store is surrounded by a vast parking lot filled with various vehicles, including cars, trucks, and even flying vehicles. The scene is filled with celestial bodies like planets and asteroids, suggesting an interplanetary or interstellar setting. The overall style of the image is reminiscent of retro-futuristic artwork with a touch of humor. A large translucent glass dome covers the entire store and the adjacent parking areas: The image you uploaded shows a large, sprawling general store named "Sam's General Store" situated on a celestial body, possibly a planet or moon. The store is surrounded by a vast parking lot filled with various vehicles, including cars, trucks, and even flying vehicles. The scene is filled with celestial bodies like planets and asteroids, suggesting an interplanetary or interstellar setting. The overall style of the image is reminiscent of retro-futuristic artwork with a touch of humor. A large translucent glass dome covers the entire store and the adjacent parking areas. However, the dome appears to be in the distance behind the store, rather than covering the entire store and parking area."
Me: "Gemini, do you realize that I have uploaded two different images during this current conversation?"
Gemini: "Yes, I do.
You first uploaded an image of a retro-futuristic illustration of a general store on a celestial body, surrounded by spaceships.
Then, you uploaded a different image of a futuristic city with a large glass dome covering it.
I have responded to both images and tried to fulfill your request for an image with a dome covering the store and parking lot.
Is there anything else I can help you with regarding these images or anything else?"
Figure 10. Generated by Gemini.
Me: "The image that you generated during this conversation does NOT show a dome covering the entire store and the surrounding parking spaces. I wonder why Imagen 3 has so much trouble generating an image in which the dome covers the entire store and also the parking spaces."
Gemini: "Sure, here you go!" {At this point, Gemini generated the image that is shown in Figure 10.}
Me: "The edge of the dome should extend almost to the bottom of the rendered image. The parked spacecraft and the store should be visible through the transparent glass of the dome. In the image that you just generated, the dome fails to cover most of the parked spaceships."
Figure 11. Generated by Gemini.
Gemini: "Sure, here it is!" {At this point, Gemini generated the image that is shown in Figure 11.}
It was not clear in Figure 10 that the dome extended past the front of the store. In Figure 11 it is clear that the dome covers the entire store and spaceships that are parked to the sides of the store and in front of the store.
I then used the image from Figure 11 as a reference image for Whisk, which allowed Whisk to generate the image shown in Figure 12. Whisk was still struggling to leave out cars and park spaceships near the store instead.
Figure 12. Whisk-generated glass dome over the store and part of the parking area.
Figure 13. A larger dome?
When I complained that I wanted a larger dome covering a larger store and larger parking area, Gemini generated the image shown in Figure 13.
SSS red Sam's
Going through another iteration, I took the Whisk-generated image from Figure 12 and used it as a new "subject" image (see the image to the left). With these new inputs, Whisk generated images such as the one shown in Figure 14.
Figure 14. Whisk-generated small store. 18 parked spaceships.
Exits. I wanted there to be an obvious way for spaceships to go into and out of the dome, so I included in the image description, "There are two openings in the glass dome that allow spaceships to enter and exit the dome." Whisk then generated images like the one shown in Figure 15. I wanted the store and the parking area to be larger, so I specified, "The dome is ten kilometers across and covers nine hundred parked spaceships." Whisk then made a slightly larger parking area as shown in Figure 16, below. Still not satisfied with the size of the glass dome, I tried specifying, "The dome is twenty kilometers across and covers nine thousand parked spaceships," but Whisk seemed unwilling to make a significantly larger parking area (see Figure 17).
Figure 15. Lower right; an opening for spaceships to enter the dome. 19 parked spaceships.
Figure 16. A larger dome with 50 parked spaceships.
Figure 17. Sixty parked spaceships.
Here is my final text prompt: "
A photorealistic rendering of science fiction setting featuring "Sam's General Store" located on an large asteroid in deep space. An extremely large translucent glass dome covers the entire general store and the all of adjacent vast parking areas. The store is depicted as a very large, low-slung building with a dark roof and several antenna or communication towers protruding from it. The wide parking areas that surround the store are filled with many futuristic spacecraft, including a few yellow and red vehicles that stand out among the predominantly white and silver ones, all rendered in great detail and in a hyperrealistic fashion. The glass dome is twenty kilometers across and covers the entire general store and a hundred thousand parked spaceships that are in the vast parking areas around the store. The parked spacecraft can be seen through the clear glass of the covering dome.
The scene is set against a dark space background filled with stars, and there are multiple asteroids visible in the background. Some small spacecraft can be seen flying in the space around the store. The entire image has a distinctive sci-fi aesthetic that reflects a vision of a future with advanced technologies, with a futuristic general store in an outer space setting. The seemingly endless rows of parked spaceships surrounding the store are rendered in great detail and white photorealistic quality. All of the parked spaceships fit in the protected space under the glass dome. The edge of the glass dome extends to the bottom of the rendered image. There are two sophisticated air-lock openings that are shaped like arches located in the sides of the glass dome that look like short tunnels and allow spaceships to enter and exit the dome without allowing the air inside the dome to escape into outer space. A flying spaceship can be seen passing through the air-lock opening in the wall of the dome." Whisk finally generated the image that is shown in
Figure 18. I did some manual modifications of the image in Figure 18 so as to enhance the appearance of the "tunnels" that penetrate the dome.
Figure 18. Too many spaceships to count.
I tried using Leonardo and got images such as the one shown in Figure 19 (below) using their "legacy mode". I started with their "new" image generating mode and the first Leonardo-generated image that I got is shown in Figure 20. Figure 20 used their default setting for how much weight was given to my reference image. For Figure 21, I increased the weight given to my reference image. As seen in Figure 21, Leonardo made all of the same mistakes that Whisk did, initially failing to make a glass dome that would cover all of the parking area.
As shown in the image to the right, Leonardo sometimes seemed to ignore the idea that it should generate images on the surface of an asteroid. Also, Leonardo seems to prefer circular buildings rather than rectangular buildings.
Figure 19. Generated by "Classic" Leonardo.
Figure 20. Generated by Leonardo "Flux". Less weight to my reference image.
Figure 21. Generated by Leonardo "Flux". More weight to my reference image.