Jan 4, 2025

Whisk

An imaginary science fiction novel
called Pharism. image source
 Near the beginning of "Trullion" by Jack Vance, the star-drive that is used by spaceships in Alastor Cluster is casually referred to as "whisk". At labs.google/fx Google now has a generative imagery experiment called "Whisk". Here is how Google describes the Whisk image-generating software: "Behind the scenes, the Gemini model automatically writes a detailed caption of your images. It then feeds those descriptions into Google’s latest image generation model, Imagen 3." During the past year I've been occasionally using Gemini to generate images, but Whisk has a specific formalized system for uploading two reference images ("subject" and "scene") and combining them with a specific "style" (see Figure 1, below).

  Figure 1. The Whisk formula for image creation.
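
One rough way to picture the caption-then-generate flow that Google describes is the short Python sketch below. Everything in it is hypothetical: the function name `compose_prompt` and the example captions are stand-ins I made up for illustration, and this is not Google's code or any real Whisk API. It only shows the idea of merging three text descriptions (subject, scene, style) into one text prompt that could then be handed to an image model.

```python
def compose_prompt(subject: str, scene: str = "", style: str = "") -> str:
    """Merge caption text into a single prompt.

    Hypothetical stand-in for illustration only; Whisk's real
    prompt-building logic is not public.
    """
    parts = [f"Subject: {subject.strip()}"]
    # Scene and style are optional, so skip them when they are empty.
    if scene.strip():
        parts.append(f"Scene: {scene.strip()}")
    if style.strip():
        parts.append(f"Style: {style.strip()}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(compose_prompt(
        subject="a mysterious renaissance vampire",
        scene="a futuristic spaceship interior",
        style="a chibi plushie",
    ))
```

If the real system works anything like this, it would help explain why editing the generated text description can change the resulting image.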

Following Google's link to Whisk, I arrived at the "subject" upload user interface page shown in Figure 2, below.

Figure 2. The Whisk landing page.
I dragged one of my existing AI-generated images (see Figure 4, below) into the Whisk "subject" field and this is what was generated (Figure 3):

Figure 3. After uploading a "subject" image.


 Note: the first time I tried to use Whisk I got the standard "login with Google" prompt. That's my Google user icon in the upper right corner of Figure 2.

Figure 4. A "subject" reference
image by Mr. Wombo.

The "subject" image (Figure 4) that I uploaded (drag and drop) into Whisk was an old AI-generated image that was made by Mr. Wombo (WOMBO Dream) back in 2024.

 Excremental. My guess is that the hideous yellow background color that was selected for the Whisk user interface is intended to be a color that most people would not include in their own reference images. So although there might be a good reason for using such a bizarre color, this strikes me as a choice that is just as bad as using the name "Bard" for a chatbot. After releasing the Bard chatbot, Google quickly changed the name to Gemini. I won't be surprised if Google switches Whisk to a new background color that is easier on the eye.

Figure 5. Add image.
There is a mysterious "add your own image" field in the Whisk interface (Figure 5), but when I used that to upload a "subject" image, Whisk did not process it; the uploaded image was just left there doing nothing (see Figure 6).

Figure 6. After upload using the
"ADD YOUR OWN IMAGE" field.
I suppose that since this is all just an "experiment", we should not expect much from the Whisk user interface. However, I find "drag and drop" to be very annoying. Fortunately, you do not have to drag and drop where it says "DROP AN IMAGE HERE". If you click there, then you can use a standard file selection dialog window to select and upload your image.

Figure 7. Two subjects from one upload.

Here is what the Whisk FAQ says about "subject" reference  images: "That’s what the image is about! Character, objects or a combination of such. An old rotary phone! A cool chair! A cardboard movie display. A mysterious renaissance vampire." When I uploaded the image from Figure 6 into Whisk, it generated two new "plushies" (see Figure 7).

Figure 8.
Another "subject".

The image that I uploaded into the "ADD YOUR OWN IMAGE" field is shown in Figure 8. As can be seen in Figure 7, the AI-generated "plushie" came with a black shirt (that shows her belly button), blue pants and blond hair, all correctly matching the reference image in Figure 8. I'm not sure how Whisk selected the hair color/style for the "vampire" plushie (to the right in Figure 7). Maybe the hair for the vampire was "borrowed" from the lady in Figure 8. It does look like the plushie for the tall blond is slightly taller than the vampire plushie.

Figure 9. Enamel pin example.
Currently, there are only three "styles" available in the drop-down menu. An example with the "enamel pin" style is shown in Figure 9. I uploaded the plushie from the left side of Figure 7 as a new reference image and Whisk converted it into a "pin".

Figure 10. The "tool" interface.
As seen in Figure 9, as soon as Whisk generates a new image, there is a new button: "OPEN IN TOOL". Apparently that "tool" is the "real" user interface. The interface shown in Figure 2 is apparently just a simplified "starter page" for new users. When in the "tool" user interface (see Figure 10) there is a tab on the left that allows you to add a "scene", either by providing a text prompt or by uploading another image. At this point, I tried to change the hair of the vampire with this text prompt: "Change the vampire subject's hair color to black hair."

Not shown in Figure 10 is the fact that Whisk generates two versions of each image. 

Figure 11. Two edited versions of the plushies.
The image in Figure 11 shows the two Whisk-generated plushie images side-by-side (upper row) and also the two edited versions (lower row) in which the vampire was given black hair. There were also some modifications to the belly button plushie: she was allowed to stand up and she was given just the slightest hint of cleavage.

Figure 12. Whisk image buttons; upper left, REFINE. Upper right, delete.
I then uploaded a background image for the "scene", but nothing happened (see Figure 15, below). Shown in Figure 12 are the buttons that appear when you hover over an image (this Whisk-generated image originated from Figure 3, above). Clicking on the "REFINE" button zooms you in on that one image and allows you to apply a text prompt that will modify the image.

13
Clicking on the other button, just to the right of the "REFINE" button (the icon looks like a printed page and a pencil), gives you access to a Gemini-generated text description of that image.

No scene.
Gemini-generated image description example for Figure 12: "A photograph of a chibi plushie of a young woman with shoulder-length, wavy blonde hair and fair skin.  The plushie is made of soft, cuddly fabric with soft, button eyes and a friendly expression. She wears a light gray, short-sleeved crop top and a dark gray, short A-line skirt. A dark brown belt is cinched around her waist. The plushie is sitting on a table, centered and uncropped against a plain white background. The lighting is even and soft. The background is a detailed depiction of a futuristic spaceship interior, rendered in a chibi style. The walls are dark blue and metallic, with various technological elements such as control panels and screens, all appearing soft and plush. A large circular window with orange accents is visible. A tall, slender metallic structure extends from the floor to the ceiling. The floor is dark with glowing lines.  The overall aesthetic is soft, cuddly, and friendly, with a focus on the plushie's features and the spaceship's details rendered in a similarly soft and plush style." (Note: the free version of Gemini refused to generate an image when provided with this text prompt, saying: "Generating images of people is only available in early access with Gemini Advanced".)

Figure 13. Whisk's pink hotdog fingers (and buns).
That Gemini-generated description of Figure 12 is rather confusing. First it mentions "a plain white background" then says, "The background is a detailed depiction of a futuristic spaceship interior". I tried altering the text description to, "A photograph of a chibi plushie of a young woman with pink hotdog fingers jutting out from the tip of each hand". The edited images are shown in Figure 13. I was thinking of "Everything Everywhere All at Once".

Figure 14. New hotdog fingers.

I tried to edit ("refine") with this text prompt: "Change the subject's fingers to look like the hotdog fingers of Michelle Yeoh in the film 'Everything Everywhere All at Once'", but Whisk refused to make the edit. Suspecting that mention of a real-world person was blocking the edit, I tried altering the text prompt to: "Change the subject's fingers to look like the hotdog fingers in the film 'Everything Everywhere All at Once'". Whisk then generated the altered image that is shown in Figure 14. (Note: some AI-generated (WOMBO Dream) images of Michelle Yeoh with hotdogs are shown here.)

Figure 15.
Returning to the mystery of how to get the Whisk software to combine the subject and scene images according to the specified style, I eventually realized that you have to click on the arrow icon. I really despise the "modern" software user interface where there are meaningless icons, some of which are hidden until you know the magic trick to get them to even appear on the screen.

Figure 16. Alien vampire hunter "storyboard". In my imagination, the lady on the left is an alien.
Figure 17.


Before letting Whisk generate the image shown in Figure 16, I first edited the style by providing this text prompt: "A photograph of the subject as a beautiful alien creature from outer space. The alien creature is similar in appearance to a human female. The alien creature is tall and slim". The subject, scene and style that were used to generate Figure 16 are shown in Figure 17. Notice that the woman with red hair and a red dress that was in the original "scene" image got included in Figure 16. It is particularly amusing that the vampire is sunk into the ground. 

Here is Gemini's description of Figure 16: "A photograph of two alien creatures, styled as beautiful extraterrestrial beings, against a backdrop of a miniature scene. The top creature is a tall, slender, fair-skinned female with a visible abdomen, wearing a black, open jacket and low-rise jeans. The bottom creature has pale skin and black, pointed ears, wearing a wide-brimmed hat adorned with pink and red roses and a dark burgundy jacket. Dark eyeshadow and lipstick accentuate their serious expression, and small, pointed fangs are visible. Both figures are positioned in front of a dark, textured background featuring stylized, bioluminescent blue tendrils resembling wet roots or vines. A miniature figurine of a young woman with light skin and shoulder-length reddish-brown hair, wearing a rust-colored dress, stands facing away in the background. The base is textured mud with small, dark green plants and moss. The lighting emphasizes the creatures against the darker background. The overall aesthetic is a mystical, fantasy scene with an otherworldly feel".

Figure 18. The original Whisk-generated
image (from Figure 16) was modified by Mr. Wombo.

Figure 19. Wombo-generated.

Once again, there are some interesting errors in Gemini's description of the image. Sadly, while Gemini claims that "pointed fangs are visible", I don't see the fangs. Gemini also incorrectly says that the redhead is "facing away in the background". While my uploaded image did have her facing away, Figure 16 has her facing the camera. Some fangs that were generated by Mr. Wombo are visible in Figure 19, but these small fangs are hard to see where I pasted them into Figure 18. Mr. Wombo was even able to generate some nipples for Figure 18, which is rather rare for the free version of WOMBO Dream.

Figure 20.
One of the icons in the Whisk user interface provides a link to the "Google Labs" Discord server. The whole process of getting started with Discord was confusing, because as soon as I put in my email address (you also have to provide your phone number), the software indicated that I already had a Discord account, which I did not (Discord support).

Figure 21.
I've never used Discord before, so when I clicked on that button I was presented with an "invite" to the Google Labs Discord server (Figure 21). The Discord support pages tell users to go to User Settings > Privacy & Safety. However, there is no such tab in my user settings.

Figure 22. Data and Privacy.
As shown in Figure 22, there is a tab called "Data & Privacy" which has three green toggle buttons for restricting the user data that is kept by Discord.

Figure 23. Discord email spam.
Sadly, emails from Discord seem to go into my primary Gmail inbox while emails from other online sites like DeviantArt go into the "promotions" or "social" inboxes. So, I have to wonder how much email and phone spam I will be getting from Discord. I set my Discord screen name to be "JWSAISCIFI", which, after the leading initials of my name, means artificial intelligence science fiction. I will learn if the discussions on the Discord server are useful.

Figure 24. Server rules.
image by WOMBO Dream

 It is free - you get what you pay for. A word of warning about using text prompts to generate images with Whisk. This Gemini-based system will often refuse to generate an image corresponding to what you ask for. If you compose an intricate text prompt in Whisk, be sure to save it in a location outside of Whisk so that you don't lose your work. You might only need to change one word in your text prompt to get Whisk to generate the image, so there is no point in re-composing a long prompt from scratch while repeatedly trying to get Whisk to generate your image. Also, sometimes Whisk will act like it is generating an image, but it has crashed and will never complete the image no matter how long you wait. Other image generating systems will give you an error message if there is a problem, but Whisk can just hang, with no indication that anything is wrong. As a Macintosh user, I use Apple's Notes application to save copies of all of my text prompts while making AI-generated images.
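
For anyone who would rather keep prompts in a plain file than in a notes application, here is a minimal Python sketch of the same habit. It is only one possible approach, not anything that Whisk provides: the function names `save_prompt` and `load_prompts` and the file name `whisk_prompts.jsonl` are my own inventions for illustration. Each prompt is appended as one JSON line with a timestamp, so a long prompt that contains line breaks or quotation marks survives intact.

```python
import json
from datetime import datetime
from pathlib import Path


def save_prompt(prompt: str, log_file: str = "whisk_prompts.jsonl") -> None:
    """Append one prompt, with a timestamp, to a local file (hypothetical helper)."""
    entry = {
        "saved_at": datetime.now().isoformat(timespec="seconds"),
        "prompt": prompt,
    }
    with Path(log_file).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_prompts(log_file: str = "whisk_prompts.jsonl") -> list[str]:
    """Return every saved prompt, oldest first; an empty list if nothing was saved."""
    path = Path(log_file)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line)["prompt"] for line in f if line.strip()]


if __name__ == "__main__":
    save_prompt("Change the vampire subject's hair color to black hair.")
    print(load_prompts()[-1])
```

With something like this, a refused prompt can be pulled back up, changed by a word or two, and tried again without retyping the whole thing.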

Figure 25. Introductions (right side) and major discussion categories (left side).
Figure 26.
When I found and used the "start from scratch" option (lower center in Figure 2), I was able to enter new subject and scene images (see Figure 26) and there was a fourth, holiday-themed style with this description: "A glass ornament, hanging from a Christmas tree, is depicted in a close-up shot. The background is blurred, focusing attention on the ornament. The lighting is soft and warm, creating a gentle glow around the ornament. The overall style is reminiscent of traditional Christmas decorations, with a focus on rich colors and intricate details. The image has a slightly vintage feel, suggesting a handcrafted or antique aesthetic. The colors are vibrant and saturated, with a focus on reds, greens, and golds. The image is sharp and clear, with a high level of detail visible in the ornament's texture and design. The overall mood is festive and cheerful, evoking the spirit of the Christmas season.

If a location is provided, incorporate characters into a creative festive scene with their location AS the ornament. The characters and scene should be made out of fun 3D materials to form an intricate little sculpture on of the ornament.
OTHERWISE any characters should be drawn as a single blown glass Christmas ornament. Show the final product hanging on a cute Christmas tree branch".

Figure 27. Useful information.

Figure 28. Edited alien.
After "starting from scratch", there was now the useful tip (about clicking on the arrow button) that is shown in Figure 27. After clicking on the "generate" button, there was no image generated and no message provided. In the Discord discussions, I saw mention of there being some limit on how much a user can generate with Whisk in one day. Had I reached the limit? Also, I know that Gemini will refuse to generate images for some text prompts.

The "subject" image that I used (top panel of Figure 26) was the image shown in Figure 18. Here is Gemini's description of that image: "Three female figures are posed in a diorama.

The figure on the left is tall and slender with pale skin, long blonde hair, and pointed ears. She is topless, wearing a black leather jacket and blue jeans. She has a necklace and what appears to be a jeweled bracelet on her left wrist.

The central figure is a bust of a female with pale skin and dark, pointed ears. She wears a wide-brimmed burgundy hat adorned with pink roses, and a dark burgundy jacket. Her makeup is dark and dramatic, with dark lipstick and eye makeup.

The figure on the right is a woman with long, auburn hair and fair skin. She wears a long, brown dress.
The background of the diorama is dark and moody, with teal, branch-like structures. The figures are positioned on a base of dark soil and moss".

Figure 29. From bust to burial.

Whisk's bust.

I removed the nipples (see Figure 28) and made certain that Whisk did not convert the vampire into a bust and I continued with the new "subject" that is shown in Figure 30, below. 

Figure 30. Updated "subject image".
Using that new "subject" image (Figure 30, above), Whisk generated the new "storyboard" shown in Figure 31, below...
Figure 31. More of a science fiction storyboard, but still with an alien vampire.
Figure 32. Is her hand
passing through glass?
The Whisk "storyboard" in Figure 31 has more of a science fiction flavor than what Whisk generated for Figure 16, above. I don't know how Whisk decided to put the vampire inside the glass globe and leave the other two figures outside. Whisk did have some trouble dealing with the bottle, with hands sometimes seemingly passing through the glass (see Figure 32).

alternate alien

It was a lucky turn of events that Whisk interpreted one of my uploaded images (Figure 18) as being a diorama and there was the holiday themed "style" that looked like an ornamental tree bulb. That led to me placing the vampire character inside a glass bottle (Figure 31), a classic pulp Sci Fi topic. In my next blog post, I'll continue my exploration of Whisk as a tool for making story illustrations. Above on this page, I got caught up with the "renaissance vampire" that is present as an example on the introductory page for Whisk. My personal interests are in the domain of science fiction. Next, I'll turn my efforts to using Whisk for creating illustrations for science fiction stories.

Next: Making Sci Fi Wysken.

Visit the Gallery of Movies, Book and Magazine Covers
