The pink pest in the jungle on an exoplanet.
In my previous blog post, I included an image called "Pest Control" that depicted a woman using a Sci Fi ray gun to discourage the advances of a large insect-like creature. That "Pest Control" image came into existence after I asked ImageFX to make an image depicting: "an alien creature that is an annoying pest on a distant exoplanet". I then used the ImageFX-generated "pink pest" as a "subject" image for Whisk along with a "scene" image similar to Figure 4 in my previous blog post.
Today, I was having a discussion with Gemini about AI-generated images of aliens when I asked ImageFX to generate a new image for a rather complex text prompt (see Figure 1, below) and got an image of an 'insectoid'.
The pink pest, close-up.
The image to the left shows one of the "pests" that ImageFX generated. The "jungle" shown in the image above, in the upper right corner of this blog post, was one of several generated by ImageFX for this text prompt: "a jungle of alien plant-like life forms, the plant-creatures have some anatomical features like humans, the plant life is from a distant exoplanet and mostly covered with green leaves, the alien plant life has appendages that look like the tentacles".
Figure 1. Generated by ImageFX.
For the ImageFX-generated image in Figure 1, the text prompt was something that Gemini 'said' to me during our conversation about AI biases in depicting aliens: "You have a specific vision in mind, but the AI is struggling to realize it. This suggests that AI image generation tools should be designed to work in collaboration with human users, allowing us to steer the AI towards these 'rim' areas. We need tools that let us say, 'I like the bipedal form, but let's experiment with different head shapes and skin textures,' or 'Let's move away from the insectoid look and try something more mammalian.'" In the conversation, I'd raised the issue of basins of attraction in "image space" and "rims" between those basins.
SSS insectoid
Using one of the other "insectoid" images that was generated along with the one shown in Figure 1, I provided Whisk with the inputs that are shown to the left. The "subject" image shown to the left was also generated by ImageFX. Whisk then generated images such as the one shown in Figure 2, below.
I uploaded the image shown in Figure 1 to Gemini and asked, "Can you guess the text prompt that was used when ImageFX generated the image that I just uploaded?"
Gemini: "Visual Analysis.
Subject: The main subject is a slender, humanoid alien creature with a somewhat insectoid or crustacean-like exoskeleton. It's crouching on a rock, seemingly in a forest or jungle environment.
Environment: The background suggests a dense, humid environment with large trees/plants, and a full moon or celestial body visible in the sky.
Details: The creature has a segmented body, long limbs, a distinctive head shape with large eyes, and a textured skin. The lighting is dramatic, highlighting the creature against the darker background.
Style: The image has a painterly, illustrative feel, with a touch of fantasy or science fiction art.
Possible Prompts: Here are some potential prompts, keeping in mind my previous errors and trying to be more specific:
Direct and Descriptive. 1) 'Slender humanoid alien crouching on a rock in a forest at night, full moon in background, detailed exoskeleton, large eyes, digital painting style.' 2) 'Insectoid alien creature, long limbs, segmented body, crouching on a rock in a jungle, dramatic lighting, fantasy art style.' 3) 'Close-up of a humanoid alien with textured skin, crouching on a rock, forest environment, full moon, digital illustration.'
More Evocative or Abstract: 1) 'Alien life form in a jungle environment, illuminated by moonlight, intricate details, fantasy art.' 2) 'Creature of the night, alien humanoid, forest setting, dramatic lighting, digital painting.' 3) 'Extraterrestrial being in a lush alien world, full moon, detailed anatomy, illustrative style.'
Adding Specificity (Based on Visual Cues): 1) 'Slender humanoid alien with a chitinous exoskeleton, large black eyes, elongated head, crouching on a moss-covered rock in a dark jungle, full moon illuminating the scene, digital painting, fantasy art style.' 2) 'Detailed illustration of an insectoid alien with long, spindly limbs, segmented body with reddish markings, crouching on a rock, dense forest environment with large trees and vines, full moon in the sky, dramatic lighting from the side.'
Gemini's "refined guess". By ImageFX.
My Refined Guess (Combining Elements): 'Slender humanoid alien, insectoid features, chitinous exoskeleton, large dark eyes, crouching on a rock in a dark jungle environment, full moon, dramatic side lighting, digital painting, fantasy art style, intricate details.' I've tried to be as specific as possible while still allowing for some artistic interpretation by the AI. Let me know the actual prompt when you're ready to reveal it. 😊"
At that point, I told Gemini the text prompt that had been used to generate the image shown in Figure 1.
Gemini: "That's... fascinating! It's a meta prompt, a prompt about the process of image generation rather than a direct description of a visual scene. I'm both surprised and impressed by ImageFX's interpretation. Let's analyze why this prompt, focused on the interaction between human vision and AI struggle, might have resulted in the image we see:
'You have a specific vision in mind, but the AI is struggling to realize it.': The somewhat awkward posture of the alien, its slightly unsettling features, could be interpreted as the AI's 'struggle.' It's not a perfectly harmonious or conventionally beautiful creature, perhaps reflecting the difficulty of achieving a precise vision.
'This suggests that AI image generation tools should be designed to work in collaboration with human users, allowing us to steer the AI towards these "rim" areas.': The presence of the rock, a distinct and somewhat rough-hewn object, might symbolize the 'steering' or guidance provided by the human user. The alien, perched on this rock, could represent the AI being nudged in a particular direction.
'We need tools that let us say, "I like the bipedal form, but let's experiment with different head shapes and skin textures," or "Let's move away from the insectoid look and try something more mammalian".': This is the most intriguing part. The alien in the image does have some insectoid qualities (segmented body, long limbs), but it also has mammalian features (the overall body shape, the facial structure). It's almost a hybrid, suggesting the AI is trying to reconcile different aesthetic directions, moving between 'insectoid' and 'mammalian' as the prompt suggests. The somewhat rough texture of the skin might also reflect the 'experimenting with...skin textures' part of the prompt.
Overall Interpretation: ImageFX seems to have taken the prompt quite literally, attempting to visualize the process of human-AI collaboration in image creation. The resulting alien is a kind of "transitional" being, caught between different styles and forms, reflecting the AI's struggle to realize a specific vision and the human's role in guiding that process. This is a remarkable example of how AI can interpret abstract concepts and generate a visual representation, even if the connection isn't immediately obvious. It highlights the power of prompts to influence not just the content of an image but also its underlying meaning and its relationship to the creative process itself. It's a testament to the fascinating and often unpredictable ways AI can interpret and respond to human language."
Figure 2. Blast the insectoid, Whisk version 1.
Figure 3. Blast the insectoid, Whisk version 2.
When I saw the Whisk-generated image shown in Figure 2, I altered the description that Whisk was using. The original description was: "A digital painting in the style of science fiction concept art, featuring warm, earthy tones of brown, tan, and orange, accented with muted blues and greens. Soft, diffused lighting creates a warm atmosphere. A young woman with light skin and brown hair in a ponytail, wearing a dark business suit, stands defensively to the left of a large, insectoid creature. The creature has a segmented, mottled greenish-brown exoskeleton, numerous spines, and sharp claws on its six legs. The background is a dark reddish-brown. The woman holds a laser weapon, firing a diagonally oriented streak of light at the creature. This streak is composed of densely clustered white, light blue, and orange-gold particles, thinning towards the edges, overlaid with thin, curving lines of light blue and gold. Visible brushstrokes contribute to the painterly texture. The scene is detailed but not hyperrealistic, with a focus on atmospheric perspective and a slightly desaturated color scheme."
I changed it to this: "A stunningly detailed, photorealistic rendering of a science fiction setting featuring a young woman with light skin and brown hair in a ponytail, wearing a futuristic jumpsuit, stands defensively to the left of a large, alien creature. The alien creature has six legs and the body structure of an aquatic mammal. The background is a dark reddish-brown forest of a distant habitable exoplanet. The woman holds a futuristic laser weapon, firing a diagonally oriented streak of light at the creature. This streak is composed of densely clustered white, light blue, and orange-gold particles, thinning towards the edges, overlaid with thin, curving lines of light blue and gold. The scene is detailed and hyperrealistic, with a focus on atmospheric perspective and a vibrant color scheme." Whisk then generated the image shown in Figure 3.
Next, I uploaded that Whisk-generated image to Gemini and asked, "Gemini, what language would you provide to Whisk in order to change the image that I just uploaded into a similar image in which the Sci Fi "ray beam" actually strikes the alien creature?"
Gemini:
That's a great question! To get Whisk to generate an image where the ray beam is actually impacting the alien creature, you'll need to provide a prompt that is more explicit about the interaction and the point of contact. Here's a breakdown of how to refine your prompt, along with some example phrasings:
Key Elements to Include:
1. Point of Contact: Specify where the beam should strike the alien. Be as descriptive as possible: "striking the alien's chest", "hitting the alien on its head", "impacting the alien's leg", "piercing the alien's exoskeleton at the joint".
2. Reaction/Effect: Describe the visible effect of the beam's impact. This will help Whisk visualize the interaction: "causing sparks and smoke", "disintegrating part of the alien's body", "making the alien recoil in pain", "illuminating the alien with a bright glow".
3. Beam Continuity: Ensure the beam is visually connected from the weapon to the point of impact: "the beam extends from the weapon to the alien's chest", "a continuous beam of energy connects the weapon and the alien".
Example Prompts: Here are a few variations, combining these elements:
"A female soldier in futuristic armor fires a ray gun, the bright blue beam striking the large alien creature on its chest, causing sparks and smoke." (This adds detail about the effect)
"Close-up of a humanoid alien recoiling in pain as a powerful energy beam from a soldier's weapon hits its arm, disintegrating part of its exoskeleton." (This focuses on the alien's reaction)
"A continuous beam of blue energy connects a futuristic weapon held by a soldier to the head of a large, insectoid alien, illuminating the alien with a bright glow." (This emphasizes the beam's connection)
"A female soldier aims her weapon at a monstrous alien, the ray beam piercing the alien's exoskeleton at the joint of its leg, causing it to stumble." (This adds a narrative element)
Tips for Best Results:
Experiment with phrasing: Try different ways of describing the same action. Synonyms and varied sentence structure can influence the output.
Be specific about details: The more details you provide about the alien's appearance, the weapon, and the environment, the better Whisk can understand your vision.
Use action verbs: Strong verbs like "striking," "impacting," "piercing," and "disintegrating" will help convey the force and effect of the beam.
Iterate and refine: If the first results aren't perfect, don't be afraid to tweak your prompt and try again. AI image generation is often an iterative process.
By incorporating these suggestions, you should be able to guide Whisk towards generating an image where the ray beam convincingly strikes the alien creature, creating a more dynamic and impactful scene. Good luck! Let me know how the results turn out.
Using some of the suggestions from Gemini, I used this new text description with Whisk: "A stunningly detailed, photorealistic rendering of a science fiction setting featuring a young woman with light skin and brown hair, wearing a futuristic jumpsuit and standing defensively to the left of a large, alien creature. The woman is shouting in alarm as the alien creature approaches her. The alien creature has six sleek and fur-covered legs and the hairless body structure of an aquatic mammal. The young woman fires a ray gun, the bright blue beam striking the alien creature, causing sparks and smoke. The background is a dim reddish-brown forest floor of a distant habitable exoplanet. The woman holds a futuristic laser weapon, firing a streak of light that intersects with the head of the creature. This streak is composed of densely clustered white, light blue, and orange-gold particles, thinning towards the edges, overlaid with thin, curving lines of light blue and gold. The streak of light from the ray gun ends where it strikes the alien creature. The light from the ray gun does not extend past the head of the alien creature. The scene is detailed and hyperrealistic, with a focus on atmospheric perspective and a vibrant color scheme." Whisk generated the image shown in Figure 4.
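Gemini's three suggested elements (point of contact, visible effect, and beam continuity) can be sketched as a small prompt-building helper. This is only an illustration of the idea: the function name `beam_impact_prompt` and its parameters are hypothetical, it is not part of Whisk or any Google tool, and it simply assembles text that could be pasted into Whisk by hand.

```python
def beam_impact_prompt(shooter, target, body_part, effect, beam="bright blue beam"):
    """Build prompt text covering contact point, effect, and beam continuity.

    All arguments are plain descriptive phrases; nothing here talks to an
    image generator. The wording follows Gemini's example phrasings.
    """
    # Elements 1 and 2: where the beam lands and what it visibly does.
    contact = (
        f"{shooter} fires a ray gun, the {beam} striking {target} "
        f"on its {body_part}, {effect}."
    )
    # Element 3: say explicitly that the beam stops at the point of impact.
    continuity = (
        f"The beam extends from the weapon to the {body_part} of {target} "
        f"and ends there."
    )
    return f"{contact} {continuity}"


if __name__ == "__main__":
    print(
        beam_impact_prompt(
            shooter="The young woman",
            target="the alien creature",
            body_part="head",
            effect="causing sparks and smoke",
        )
    )
```

Keeping the three elements as separate arguments makes it easy to change one of them (for example, the body part) between attempts while leaving the rest of the description untouched.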
Figure 4. Blast the insectoid, Whisk version 3.
SS spider woman.
I had WOMBO Dream use the alien creature shown in Figure 4 as a reference image, and Mr. Wombo generated an image that looked like a "spider woman". I used that as a new "subject" image for Whisk, as shown in the top panel of the image to the left. I also changed the inputs to Whisk so as to use the image shown in Figure 4 as the new "scene". With those new inputs (including, "The young woman with the ray-gun defends herself against an alien spider woman."), Whisk generated the image shown below in Figure 5.
For my science fiction stories, I often include characters who are humanoid aliens or human-alien hybrids. In this case, when I asked for an "alien spider woman", Mr. Wombo could not resist generating images in which the "alien" looked like a character from a Spider-Man comic.
comics
One such comics-like image is shown to the left. A less-comic-like alien-spider-woman is shown to the right. Click on the image to the left for an enlarged view of the comic-like spider-woman. Mr. Wombo tends to treat heads and faces differently from the bodies of such "hybrid creatures", seemingly unable to escape from so many training set images with bare human faces and clothed bodies. Really... the alien needs no clothing.
Figure 5. Blast the insectoid, Whisk version 4.
Figure 6. Generated with Leonardo "flow state". Text only prompt from Whisk.
Figure 7. Generated with Leonardo Phoenix 1.0 "classic" mode; Figure 5 reference & text prompt.
Figure 8. Generated with Leonardo Lightning (40 & 60 percent weighting to reference image).
Figure 9. Generated with Leonardo Lightning (80 & 90 percent weighting to reference image).
I provided Leonardo with text prompt and reference image inputs from Whisk and then Leonardo generated images such as those shown above in Figures 6 & 7. Leonardo generated three images like the right-hand panel in Figure 7 and one like the left-hand panel.
Leonardo user interface. Percentage controls.
Percentage. Figures 8 & 9, above, show that Leonardo lets you set, as a percentage, how strongly image generation depends on the reference image. Only at 90% did Leonardo put a humanoid head on the alien spider creature.
The image to the right was generated by using one of the AI-generated "alien spider" images as a reference image for WOMBO Dream. "The Absurd v2" style of WOMBO Dream was used first to get a WOMBO-generated "draft" image and then the "Anime v3" style was used to get the final image.
I'll end this blog post with one more alien spider creature (see Figure 10, below). Whisk image description: "A stunningly detailed, photorealistic rendering of a science fiction setting featuring a young woman with light skin and long blond hair, wearing a futuristic transparent lace jumpsuit and standing defensively to the left of a large, alien creature. The woman is shouting in alarm as the alien creature approaches her. The alien creature has sleek fur-covered legs and the face of a beautiful woman: the alien creature is a spider-woman. The young woman fires a ray gun, the bright blue beam striking the alien creature, causing sparks and smoke. The young woman with the ray-gun defends herself against an alien spider-woman. The background is a dim reddish-brown forest floor of a distant habitable exoplanet. The woman holds a futuristic laser weapon, firing a streak of light that intersects with the head of the creature. This streak is composed of densely clustered white, light blue, and orange-gold particles, thinning towards the edges, overlaid with thin, curving lines of light blue and gold. The streak of light from the ray gun ends where it strikes the alien creature. The light from the ray gun does not extend past the head of the alien creature. The scene is detailed and hyperrealistic, with a focus on atmospheric perspective and a vibrant color scheme."
The image to the left also originated from an AI-generated image of an alien "spider creature" that was "re-humanized" by WOMBO Dream.
Figure 10. Blast the insectoid, Whisk version 5.