Jan 5, 2025

Making Wysken

On Tar'tron (see Making Vendela)
 According to Gemini, the English language word "whisk" can be traced back to the Middle English word "wyske". It is possible that the plural of "wyske" was "wysken". In my previous blog post, I documented my first experience using Google's Whisk software to generate images. Here in this blog post, I describe my first efforts to use Whisk as a tool for making what I'm tempted to refer to as "wysken", my silly name for images that can illustrate my science fiction stories.

Note: the numbers used for the figures (below) continue from those used in the previous blog post (1-32).

Figure 33. Tyna as a subject.
Wysken, Take Three. When I create images for my science fiction stories, I often have a vision in my mind for a setting on a distant exoplanet. I continually struggle with AI image-generating systems to find ways to produce story illustrations that match what is in my imagination. For the rest of this blog post, I'm going to try to use Whisk to generate illustrations for two of my science fiction works in progress: 1) a story called "D*", which I started in December 2023, and 2) The Nanites of Love, which I started in 2022 and have not yet been able to complete. I'm currently working on Part 6 of "D*", which involves a character named Tyna who has visions of the future. These visions arrive by way of the Sedron Time Stream, and often Tyna does not understand the relationship of the vision to her life.

Figure 34. Exoplanet alien life.
That's Tyna in Figure 33, with some blue alien creatures lurking behind her. The images in Figure 33 and in Figure 34 were both generated by Wombo Dream. To generate Figure 34, I used this text prompt: "colorful plants on a hillside, the hill is covered by plants, a creature that looks like an alien dinosaur on the savanna of an exoplanet, alien plant life" and the "Monster V3" style. For the Whisk style, I provided this text prompt: "The overall style is reminiscent of magazine cover illustrations, with a focus on rich colors and intricate details. The colors are vibrant and saturated. The image is sharp and clear, with a high level of detail visible in the subjects and design of the scene. The overall mood is futuristic evoking a spirit of adventure and wonder. The subjects are depicted photo-realistically. There is fractal complexity in the living plants and animals of the scene. There is photo-realistic detail in the hair and clothing of the subjects".

Figure 35. Whisk-generated style image.
The image that Whisk automatically generated with Imagen 3 to illustrate the requested style is shown in Figure 35. I was rather startled to see two dudes with blasters in this "style image". I almost never have weapons or violence in my stories, and this is a good example of how AI software often forces unwanted elements on me. I was amused that Imagen 3 included some text in the Whisk-generated image. The two Whisk-generated images for this "subject", "scene" and "style" are shown below in Figure 36.
Figure 36. The two similar "storyboards" that were generated for my third Whisk experiment.

SSS#3
Here is the description of these images that was also generated by Whisk: "A digital painting in a painterly style, featuring a muted, earthy palette dominated by greens, browns, and muted yellows. Soft, diffused lighting creates depth and atmosphere. The style is reminiscent of fantasy illustration, with a focus on detail and texture.  A young woman with long, wavy blonde hair and fair skin, wearing a light beige blazer, stands looking off to the side with a serious expression. Behind her, partially obscuring her shoulders, are two blue dragons, their scales and textures highly detailed, but rendered in muted greens and browns. Their eyes are a dull yellow. The dragons and woman are positioned against a background of a vibrant, but desaturated, blue, purple, and orange dinosaur standing on a gently sloping hill. The dinosaur's spiky protrusions and fantastical plants are rendered in muted purples, pinks, yellows, and oranges. Purple mountains and a pale blue celestial body are visible under a bright, but desaturated, teal sky. The overall mood is mysterious and slightly ominous, with a vintage or nostalgic feel. The rendering is highly detailed, with a focus on realism in the depiction of natural elements."

Is that a worm?
Here is the Gemini-generated description of the "subject" image that I uploaded to Whisk (see Figure 33, above): "A painting of a young woman with long, wavy blonde hair and fair skin. She is wearing a light beige blazer. She is looking off to the side, with a serious expression. Behind her, partially obscuring her shoulders, are two blue dragons with yellow eyes and sharp teeth. The dragons are highly detailed, with scales and textures clearly visible. The background is a plain off-white. The style is realistic, with a focus on detail and light. The overall mood is mysterious and slightly ominous".

I used the right hand panel from Figure 36 as a reference image for Mr. Wombo and got the image that is shown to the left. I was amused that Mr. Wombo seemed to show the alien creature bursting up from below the ground.

Figure 37. A meaner alien
has Tyna worried (Wombo).
And here is the Gemini-generated description of the "scene" image that I uploaded to Whisk (see Figure 34, above): "A vibrant, colorful dinosaur, predominantly blue, purple, and orange, stands prominently in the foreground.  Its body is adorned with spiky, brightly colored protrusions along its back and head. The dinosaur's mouth is open, revealing sharp teeth.  Its legs are powerful and clawed. The dinosaur is positioned on a gently sloping hill, covered in a variety of brightly colored, fantastical plants. These plants range in color from deep purples and pinks to bright yellows and oranges, with various textures and shapes.  The background features a landscape of purple mountains under a bright, teal sky.  A pale blue celestial body, possibly a moon, is visible in the upper left corner of the image.  The overall style is fantastical and surreal, with an emphasis on bold colors and vibrant textures.  The lighting suggests a daytime scene, with the sun seemingly positioned behind the viewer".

Generated by Mr. Wombo.
I'll confess that I like the results in Figure 36 because for my story illustrations, I'm often trying to reproduce the kinds of science fiction book cover illustrations that I grew up enjoying back in the 1970s during my personal Golden Age of discovering Sci Fi. At the "Google Labs" Discord server, I saw a user comment on how sensitive Whisk is to small changes in the "style". I tried to understand why Whisk created Figure 36 in a "painterly style". Gemini decided that my "subject" image (see Figure 33, above) was "A painting of a young woman". Maybe Gemini's interpretation of that image of Tyna as being a painting caused Whisk to render Figure 36 in a "painterly style". I tried editing out the "painterly style" from the text description. 

Generated by Mr. Wombo.
My new description: "A photo-realistic depiction of an exoplanet, featuring an earthy palette dominated by greens, browns, and yellows. Soft, diffused lighting creates depth and atmosphere. The style is reminiscent of science fiction illustration, with a focus on detail and texture.  A young woman with long, wavy blonde hair and fair skin, wearing a light beige blazer, stands looking off to the side with a serious expression. Behind her, partially obscuring her shoulders, are two blue alien creatures, their scales and textures highly detailed, but rendered in muted greens and browns. Their eyes are a dull yellow. The blue creatures and the woman are positioned against a background of a vibrant, but desaturated, blue, purple, and orange dinosaur standing on a gently sloping hill. The dinosaur's spiky protrusions and fantastical plants are rendered in muted purples, pinks, yellows, and oranges. Purple mountains and a pale blue celestial body are visible under a bright, but desaturated, teal sky. The overall mood is mysterious and slightly ominous, with a vintage or nostalgic feel. The rendering is highly detailed, with a focus on realism in the depiction of natural elements".

Figure 38. The original images from Figure 36 were updated to be more photo-realistic and science fictionish.

Alien first contact.

I was particularly intrigued by the right hand panel in Figure 38. In my imagination, there are now two bipedal aliens in front of Tyna in this image. There is a larger version of that image that was slightly processed so as to enhance the colors (Figure 39, below).

Alternate science fiction cover illustration. The image shown to the right is a related image that was generated by WOMBO Dream and then manually turned into a book cover illustration by me. In the original image generated by Mr. Wombo, there was what looked like an observation tower on top of the mountain. I slightly enlarged that tower and added a red beacon and a red laser beam aimed into outer space from the mountain top.

Figure 39. Generated by Whisk; enlarged from Figure 38, color adjusted.
Figure 40. A talking alien.

There seems to be a mysterious darkened hemisphere towards the left side of the left-hand panel in the image shown in Figure 38, above. In my experience, AI image generators often cannot resist placing multiple moons and planets in the sky if the word "exoplanet" is mentioned. My guess is that the darkened hemisphere was almost another moon.

 Talking alien. I could not resist making a version of the "First Contact" book cover illustration in which the alien has an open mouth and I can imagine that this sentient alien is talking to Tyna (see Figure 40, the image to the right). 

Figure 41. Image generated by Gemini.
Take Four. In Part 3 of my science fiction story The Nanites of Love, there is a visit to the Erre District on the planet Tar'tron, near the Galactic Core. I asked Gemini to generate an image depicting a "science fiction scene on an Earth-like exoplanet called "Erre" that is only 5,000 light-years from the center of our galaxy. Imagine an outdoors marketplace on Erre at night, with the bright stars of the galactic core glowing above in the sky. The market has a disorganized maze of stalls and shops where various oddments of futuristic technology are on display".

Figure 42. Whisk's image illustrating the requested style.
Here is my text prompt for the new Whisk "style": "The overall style is reminiscent of magazine cover illustrations, with a focus on night-time colors and intricate details. The colors of the subjects and scene are vibrant and saturated, but viewed under the dim illumination of night. The image is sharp and clear, with a high level of detail visible in the subjects and design of the scene. The overall mood is futuristic evoking a spirit of adventure and wonder. The beautiful subjects are depicted photo-realistically. There is fractal complexity in the market place stalls of the scene, the walls of buildings and the tiled floor. There is photo-realistic detail in the hair and clothing of the subjects".

Figure 43.
Gemini's description of subject #1 (she is shown in the top panel of the image to the left, generated by Mr. Wombo): "A digital painting of a young woman in profile view, facing left. The woman has long, bright blue hair styled in a way that suggests a ponytail pulled back from her face. A silver metallic band or headband is visible in her hair, near the top of her head. Her skin tone is fair, almost porcelain-like, and her eyes are a light blue. Her lips are painted a light pink or rose color. She appears to be wearing a garment with a gold and light blue patterned sleeve or shoulder piece, which has a shimmering or sparkly texture. The background is blurry and dark, with hints of blue and purple tones, suggesting a nighttime or futuristic setting. The overall style is highly stylized and painterly, with a focus on smooth gradients and soft lighting." The uploaded image for subject #2 (see the second panel in Figure 43) was also described as a "digital painting" by Gemini.

Generated by Mr. Wombo.
Gemini's description of the uploaded scene (the image was generated by Mr. Wombo, Figure 43): "A digital painting depicts a cobblestone street at night, under a vibrant Milky Way galaxy. The street is lined with stalls or shops on either side. The stalls are dark, with some having illuminated signs or displays. The stalls on the left side of the street appear to be lit with a cool, bluish light, while the stalls on the right are lit with a warmer, yellowish light. The stalls appear to sell various items, including what looks like technological devices and artifacts. Three figures are visible in the distance walking down the street. One figure is in the center, wearing a long, dark robe. The other two figures are slightly behind and to the sides of the central figure, also appearing to wear dark clothing. Two additional figures are visible closer to the viewer. One is standing near a stall on the left, appearing to be a statue-like figure with light-colored skin and what appears to be a robe or cloak. The other figure is also near a stall on the left, appearing to be a slender figure with light-colored skin. The buildings lining the street are tall and dark, with pointed tops, giving the scene a medieval or fantasy feel. A tall, slender tower is visible in the background, also dark in color. The Milky Way is a prominent feature in the sky, with its bright, swirling colors contrasting with the dark buildings and street. The stars are numerous and clearly visible against the dark sky. The overall color palette is dark, with cool blues and purples dominating the left side of the street and warmer oranges and yellows on the right. The cobblestones are a mix of dark and light tones." One of the two Whisk-generated "storyboards" is shown below...

Figure 44. A Whisk-generated storyboard created using the input that is shown in Figure 43.

Generated by Mr. Wombo.
Here is Gemini's description of the Whisk-generated storyboard shown in Figure 44: "A painterly digital painting in a warm, slightly desaturated fantasy art style.  The scene depicts a cobblestone street at night, illuminated by the vibrant Milky Way.  Warm oranges, reds, and yellows dominate the lighting from the market stalls, contrasting with cooler blues and purples in the shadows and background.  Two young women stand on the street. One has long, bright blue hair styled in a ponytail, fair skin, light blue eyes, and rose-colored lips. She wears a shimmering gold and light blue garment. The other has long, wavy purple hair, light skin, blue eyes, and wears a teal robe with gold accents and blue gemstone jewelry.  Her expression is serious.  The street is lined with dark stalls selling technological artifacts, lit with cool blue and warm yellow light.  Tall, dark buildings with pointed tops line the street, and a slender tower is visible in the distance.  Three figures in dark robes walk down the street in the distance. Two additional light-skinned figures stand near stalls on the left.  Visible brushstrokes and varied textures create a sense of depth and atmosphere."

Generated by Mr. Wombo.
I tried editing the AI-generated description to make this more of a science fiction setting: "A photo-realistic, slightly desaturated science fiction art style.  The scene depicts a futuristic tiled street at night, illuminated by the many bright stars that surround this exoplanet near the center of the galaxy.  Warm oranges, reds, and yellows dominate the lighting from the market stalls, contrasting with cooler blues and purples in the shadows and background.  Two beautiful young women are seen shopping for a new digital language translation device. One cute woman has long, bright blue hair styled in a ponytail, fair skin, light blue eyes, and rose-colored lips. She wears a shimmering gold and light blue garment. The other pretty girl has long, wavy purple hair, light skin, blue eyes, and wears a teal robe with gold accents and blue gemstone jewelry.  Her expression is serious.  The street is lined with dark stalls selling technological artifacts, lit with cool blue and warm yellow light.  Tall, dark buildings with pointed tops line the street, and a slender tower is visible in the distance.  Three figures dressed in futuristic metallic jumpsuits walk down the street in the distance. Two additional light-skinned figures stand near stalls on the left.  The photo-realistic depiction of the two human subjects and varied textures create a sense of depth and atmosphere". The updated storyboard image is shown below in Figure 45.

Figure 45. Whisk-generated; updated storyboard, more of a science fiction scene.
Figure 46. By Mr. Wombo.

The Whisk-generated "figures dressed in futuristic metallic jumpsuits" in Figure 45 are rather strange and not what I was expecting. In my experience, AI image-generating software tends to trot out some pretty lame "standardized" versions of robots and aliens and insert them into images, regardless of what users actually want. It is often hard work to avoid these powerful attractors. I had Mr. Wombo generate the alternative jumpsuits shown in Figure 46.

I tried to have Whisk "Change the human figures in the background to dress them in flashy metallic jumpsuits, walking Victoria's Secret catwalk style", but Whisk then changed the entire scene, as shown in Figure 47.

Figure 47. By Whisk; flashy metallic jumpsuits, catwalk style.



From the new Gemini-generated description of Figure 47: "Three figures in flashy metallic jumpsuits walk down the street, Victoria's Secret catwalk style.  Two additional figures stand near stalls on the left." I had to re-edit the storyboard description in Whisk once again (as above, for Figure 45) and got the updated storyboard that is shown below in Figure 48.

Figure 48. Whisk generated two figures in metallic jumpsuits (now on the right hand side).

Futuristic device by Mr. Wombo.

 This (Figure 48, above) is not too bad for an illustration of the Erre District on Tar'tron. I could not resist making some alternative versions of this scene in which there was a more unusual device being held, such as the one shown in the image to the right (and the image at the top right corner of this blog post). Some other variants on these themes by both Whisk and Mr. Wombo are shown here.

Whisk options.
After playing with Whisk for two days and working through the creation of four "storyboards" (why not call them wysken?), it is clear that Whisk has substantial advantages over working with the free version of Gemini, which still does not like to generate human images.

 To Do: crack the code on the secret of using other aspect ratios for the images that are created by Whisk.


Bonus: AI-generated music made with MusicFX...

 Music text prompt: "science fiction music composed by a technologically advanced sentient humanoid alien on a distant exoplanet".

