According to Gemini, the English language word "whisk" can be traced back to the Middle English word "wyske". It is possible that the plural of "wyske" was "wysken". In my previous blog post, I documented my first experience using Google's Whisk software to generate images. Here in this blog post, I describe my first efforts to use Whisk as a tool for making what I'm tempted to refer to as "wysken", my silly name for images that can illustrate my science fiction stories.
Note: the numbers used for the figures (below) continue from those used in the previous blog post (1-32).
Figure 33. Tyna as a subject.
Wysken, Take Three.
When I create images for my science fiction stories, I often have a
vision in my mind for a setting on a distant exoplanet. I continually
struggle with AI-image generating systems to find ways to produce images
as story illustrations that match what is in my imagination. For the
rest of this blog post, I'm going to try to use Whisk to generate
illustrations for two of my science fiction works in progress, 1) a story called "D*"
which I started in December 2023 and 2) The Nanites of Love, which I started in 2022 and have not yet been able to complete. I'm currently working on Part 6 of "D*",
which involves a character named Tyna who has visions of the future.
These visions arrive by way of the Sedron Time Stream and often Tyna
does not understand the relationship of the vision to her life.
Figure 34. Exoplanet alien life.
That's Tyna in Figure 33, with some blue alien creatures lurking behind her. The images in Figure 33 and in Figure 34 were both generated by Wombo Dream. To generate Figure 34, I used this text prompt: "colorful
plants on a hillside, the hill is covered by plants, a creature that
looks like an alien dinosaur on the savanna of an exoplanet, alien plant
life" and the "Monster V3" style. For the Whisk style, I provided this text prompt: "The
overall style is reminiscent of magazine cover illustrations, with a
focus on rich colors and intricate details. The colors are vibrant and
saturated. The image is sharp and clear, with a high level of detail
visible in the subjects and design of the scene. The overall mood is
futuristic evoking a spirit of adventure and wonder. The subjects are
depicted photo-realistically. There is fractal complexity in the living
plants and animals of the scene. There is photo-realistic detail in the
hair and clothing of the subjects".
Figure 35. Whisk-generated style image.
The image that Whisk automatically generated with Imagen 3 to illustrate the requested style is shown in Figure 35.
I was rather startled to see two dudes with blasters in this "style
image". I almost never have weapons or violence in my stories and this
is a good example of how AI software often forces unwanted elements on
me. I was amused that Imagen 3 included some text in the Whisk-generated
image. The two Whisk-generated images for this "subject", "scene" and
"style" are shown below in Figure 36.
Figure 36. The two similar "storyboards" that were generated for my third Whisk experiment.
SSS#3
Here is the description of these images that was also generated by Whisk: "A
digital painting in a painterly style, featuring a muted, earthy
palette dominated by greens, browns, and muted yellows. Soft, diffused
lighting creates depth and atmosphere. The style is reminiscent of
fantasy illustration, with a focus on detail and texture. A young woman
with long, wavy blonde hair and fair skin, wearing a light beige
blazer, stands looking off to the side with a serious expression. Behind
her, partially obscuring her shoulders, are two blue dragons, their
scales and textures highly detailed, but rendered in muted greens and
browns. Their eyes are a dull yellow. The dragons and woman are
positioned against a background of a vibrant, but desaturated, blue,
purple, and orange dinosaur standing on a gently sloping hill. The
dinosaur's spiky protrusions and fantastical plants are rendered in
muted purples, pinks, yellows, and oranges. Purple mountains and a pale
blue celestial body are visible under a bright, but desaturated, teal
sky. The overall mood is mysterious and slightly ominous, with a vintage
or nostalgic feel. The rendering is highly detailed, with a focus on
realism in the depiction of natural elements."
Here is the Gemini-generated description of the "subject" image that I uploaded to Whisk (see Figure 33, above): "A
painting of a young woman with long, wavy blonde hair and fair skin.
She is wearing a light beige blazer. She is looking off to the side,
with a serious expression. Behind her, partially obscuring her
shoulders, are two blue dragons with yellow eyes and sharp teeth. The
dragons are highly detailed, with scales and textures clearly visible.
The background is a plain off-white. The style is realistic, with a
focus on detail and light. The overall mood is mysterious and slightly
ominous".
I used the right-hand panel from Figure 36 as a reference image for Mr. Wombo
and got the image that is shown in Figure 37. I was amused that Mr.
Wombo seemed to show the alien creature bursting up from below the
ground.
Figure 37. A meaner alien has Tyna worried (Wombo).
And here is the Gemini-generated description of the "scene" image that I uploaded to Whisk (see Figure 34, above): "A
vibrant, colorful dinosaur, predominantly blue, purple, and orange,
stands prominently in the foreground. Its body is adorned with spiky,
brightly colored protrusions along its back and head. The dinosaur's
mouth is open, revealing sharp teeth. Its legs are powerful and clawed.
The dinosaur is positioned on a gently sloping hill, covered in a
variety of brightly colored, fantastical plants. These plants range in
color from deep purples and pinks to bright yellows and oranges, with
various textures and shapes. The background features a landscape of
purple mountains under a bright, teal sky. A pale blue celestial body,
possibly a moon, is visible in the upper left corner of the image. The
overall style is fantastical and surreal, with an emphasis on bold
colors and vibrant textures. The lighting suggests a daytime scene,
with the sun seemingly positioned behind the viewer".
I'll confess that I like the results in Figure 36
because for my story illustrations, I'm often trying to reproduce the
kinds of science fiction book cover illustrations that I grew up
enjoying back in the 1970s during my personal Golden Age of discovering
Sci Fi. At the "Google Labs" Discord server, I saw a user comment on how sensitive Whisk is to small changes in the "style". I tried to understand why Whisk created Figure 36 in a "painterly style". Gemini decided that my "subject" image (see Figure 33, above) was "A painting of a young woman". Maybe Gemini's interpretation of that image of Tyna as being a painting caused Whisk to render Figure 36 in a "painterly style". I tried editing out the "painterly style" from the text description.
My new description: "A
photo-realistic depiction of an exoplanet, featuring an earthy palette
dominated by greens, browns, and yellows. Soft, diffused lighting
creates depth and atmosphere. The style is reminiscent of science
fiction illustration, with a focus on detail and texture. A young woman
with long, wavy blonde hair and fair skin, wearing a light beige
blazer, stands looking off to the side with a serious expression. Behind
her, partially obscuring her shoulders, are two blue alien creatures,
their scales and textures highly detailed, but rendered in muted greens
and browns. Their eyes are a dull yellow. The blue creatures and the
woman are positioned against a background of a vibrant, but desaturated,
blue, purple, and orange dinosaur standing on a gently sloping hill.
The dinosaur's spiky protrusions and fantastical plants are rendered in
muted purples, pinks, yellows, and oranges. Purple mountains and a pale
blue celestial body are visible under a bright, but desaturated, teal
sky. The overall mood is mysterious and slightly ominous, with a vintage
or nostalgic feel. The rendering is highly detailed, with a focus on
realism in the depiction of natural elements".
Figure 38. The original images from Figure 36 were updated to be more photo-realistic and science fictionish.
Alien first contact.
I was particularly intrigued by the right-hand panel in Figure 38.
In my imagination, there are now two bipedal aliens in front of Tyna in
this image. There is a larger version of that image
that was slightly processed so as to enhance the colors (Figure 39, below).
Alternate science fiction cover illustration.
The image shown to the right is a related image that was generated by
WOMBO Dream and then manually turned into a book cover illustration by
me. In the original image generated by Mr. Wombo, there was what looked
like an observation tower on top of the mountain. I slightly enlarged
that tower and added a red beacon and a red laser beam aimed into outer
space from the mountain top.
Figure 39. Generated by Whisk; enlarged from Figure 38, color adjusted.
Figure 40. A talking alien.
There seems to be a mysterious darkened hemisphere towards the left side of the left hand panel in the image shown in Figure 38,
above. In my experience, AI image generators often cannot resist
placing multiple moons and planets in the sky if the word "exoplanet" is
mentioned. My guess is that the darkened hemisphere was almost another
moon.
Talking alien. I could not resist making a version
of the "First Contact" book cover illustration in which the alien has an
open mouth and I can imagine that this sentient alien is talking to
Tyna (see Figure 40, the image to the right).
Take Four. In Part 3 of my science fiction story The Nanites of Love,
there is a visit to the Erre District on the planet Tar'tron, near
the Galactic Core. I asked Gemini to generate an image depicting a "science
fiction scene on an Earth-like exoplanet called "Erre" that is only
5,000 light-years from the center of our galaxy. Imagine an outdoors
marketplace on Erre at night, with the bright stars of the galactic core
glowing above in the sky. The market has a disorganized maze of stalls
and shops where various oddments of futuristic technology are on display".
Figure 41. Image generated by Gemini.
Here is my text prompt for the new Whisk "style": "The
overall style is reminiscent of magazine cover illustrations, with a
focus on night-time colors and intricate details. The colors of the
subjects and scene are vibrant and saturated, but viewed under the dim
illumination of night.
The image is sharp and clear, with a high level of detail visible in the
subjects and design of the scene. The overall mood is futuristic evoking
a spirit of adventure and wonder. The beautiful subjects are depicted
photo-realistically. There is fractal complexity in the market place
stalls of the scene, the walls of buildings and the tiled floor. There
is photo-realistic detail in the hair and clothing of the subjects".
Figure 42. Whisk's image illustrating the requested style.
Figure 43.
Gemini's description of subject #1 (she is shown in the top panel of Figure 43, generated by Mr. Wombo): "A
digital painting of a young woman in profile view, facing left. The
woman has long, bright blue hair styled in a way that suggests a
ponytail pulled back from her face. A silver metallic band or headband
is visible in her hair, near the top of her head. Her skin tone is fair,
almost porcelain-like, and her eyes are a light blue. Her lips are
painted a light pink or rose color. She appears to be wearing a garment
with a gold and light blue patterned sleeve or shoulder piece, which has
a shimmering or sparkly texture. The background is blurry and dark,
with hints of blue and purple tones, suggesting a nighttime or
futuristic setting. The overall style is highly stylized and painterly,
with a focus on smooth gradients and soft lighting." The uploaded image for subject #2 (see the second panel in Figure 43) was also described as a "digital painting" by Gemini.
Gemini's description of the uploaded scene (the image was generated by Mr. Wombo, Figure 43): "A
digital painting depicts a cobblestone street at night, under a vibrant
Milky Way galaxy. The street is lined with stalls or shops on either
side. The stalls are dark, with some having illuminated signs or
displays. The stalls on the left side of the street appear to be lit
with a cool, bluish light, while the stalls on the right are lit with a
warmer, yellowish light. The stalls appear to sell various items,
including what looks like technological devices and artifacts. Three
figures are visible in the distance walking down the street. One figure
is in the center, wearing a long, dark robe. The other two figures are
slightly behind and to the sides of the central figure, also appearing
to wear dark clothing.
Two
additional figures are visible closer to the viewer. One is standing
near a stall on the left, appearing to be a statue-like figure with
light-colored skin and what appears to be a robe or cloak. The other
figure is also near a stall on the left, appearing to be a slender
figure with light-colored skin. The buildings lining the street are
tall and dark, with pointed tops, giving the scene a medieval or fantasy
feel. A tall, slender tower is visible in the background, also dark in
color. The Milky Way is a prominent feature in the sky, with its
bright, swirling colors contrasting with the dark buildings and street.
The stars are numerous and clearly visible against the dark sky. The
overall color palette is dark, with cool blues and purples dominating
the left side of the street and warmer oranges and yellows on the right.
The cobblestones are a mix of dark and light tones." One of the two Whisk-generated "storyboards" is shown below...
Figure 44. A Whisk-generated storyboard created using the input that is shown in Figure 43.
Here is Gemini's description of the Whisk-generated storyboard shown in Figure 44: "A
painterly digital painting in a warm, slightly desaturated fantasy art
style. The scene depicts a cobblestone street at night, illuminated by
the vibrant Milky Way. Warm oranges, reds, and yellows dominate the
lighting from the market stalls, contrasting with cooler blues and
purples in the shadows and background. Two young women stand on the
street. One has long, bright blue hair styled in a ponytail, fair skin,
light blue eyes, and rose-colored lips. She wears a shimmering gold and
light blue garment. The other has long, wavy purple hair, light skin,
blue eyes, and wears a teal robe with gold accents and blue gemstone
jewelry. Her expression is serious. The street is lined with dark
stalls selling technological artifacts, lit with cool blue and warm
yellow light. Tall, dark buildings with pointed tops line the street,
and a slender tower is visible in the distance. Three figures in dark
robes walk down the street in the distance. Two additional light-skinned
figures stand near stalls on the left. Visible brushstrokes and varied
textures create a sense of depth and atmosphere."
I tried editing the AI-generated description to make this more of a science fiction setting: "A
photo-realistic, slightly desaturated science fiction art style. The
scene depicts a futuristic tiled street at night, illuminated by the
many bright stars that surround this exoplanet near the center of the
galaxy. Warm oranges, reds, and yellows dominate the lighting from the
market stalls, contrasting with cooler blues and purples in the shadows
and background. Two beautiful young women are seen shopping for a new
digital language translation device. One cute woman has long, bright
blue hair styled in a ponytail, fair skin, light blue eyes, and
rose-colored lips. She wears a shimmering gold and light blue garment.
The other pretty girl has long, wavy purple hair, light skin, blue eyes,
and wears a teal robe with gold accents and blue gemstone jewelry. Her
expression is serious. The street is lined with dark stalls selling
technological artifacts, lit with cool blue and warm yellow light.
Tall, dark buildings with pointed tops line the street, and a slender
tower is visible in the distance. Three figures dressed in futuristic
metallic jumpsuits walk down the street in the distance. Two additional
light-skinned figures stand near stalls on the left. The
photo-realistic depiction of the two human subjects and varied textures
create a sense of depth and atmosphere". The updated storyboard image is shown below in Figure 45.
Figure 45. Whisk-generated; updated storyboard, more of a science fiction scene.
The
Whisk-generated "figures dressed in futuristic metallic jumpsuits" in Figure 45 are
rather strange and not what I was expecting. In my experience, AI image-generating software tends to trot out some pretty lame "standardized" versions of robots and aliens and insert them into images, regardless of what users actually want. It is often hard work to avoid these powerful attractors. I had Mr. Wombo generate
the alternative jumpsuits shown in Figure 46.
I asked Whisk to "Change the human figures in the background to dress them in flashy metallic jumpsuits, walking Victoria's Secret catwalk style," but Whisk then changed the entire scene, as shown in Figure 47.
Figure 47. By Whisk; flashy metallic jumpsuits, catwalk style.
From the new Gemini-generated description of Figure 47: "Three
figures in flashy metallic jumpsuits walk down the street, Victoria's
Secret catwalk style. Two additional figures stand near stalls on the
left." I had to re-edit the storyboard description in Whisk once again (as above, for Figure 45) and got the updated storyboard that is shown below in Figure 48.
Figure 48. Whisk generated two figures in metallic jumpsuits (now on the right-hand side).
This (Figure 48, above) is not too bad for an illustration of the Erre District on Tar'tron. I could not resist making some alternative versions of this scene in which there was a more unusual device being held, such as the one shown in the image to the right (and the image at the top right corner of this blog post). Some other variants on these themes by both Whisk and Mr. Wombo are shown here.
Whisk options.
After playing with Whisk for two days and working through the creation of four "storyboards" (why not call them wysken?), it is clear that Whisk has substantial advantages over working with the free version of Gemini, which still does not like to generate human images.
To Do: crack the code on the secret of using other aspect ratios for the images that are created by Whisk.
Bonus: AI-generated music made with MusicFX ..... Music text prompt: "science fiction music composed by a technologically advanced sentient humanoid alien on a distant exoplanet".