Apr 5, 2025

Whisk Animate

Image generated by ImageFX.

 I've begun to experiment with Whisk Animate. See my YouTube playlist. I wanted to have Whisk generate an image of a cat playing piano. In the first video that is shown below, the cat seems to be controlling the woman as she plays the piano. Video text prompt: "Science fiction. Telepathic training of a cat. The white cat stands on hind legs and places its front paws on the keyboard. The cat plays the piano. The cat's paws move over the piano keys. The woman talks to the cat. A flashing red laser beam connects from the woman's hands to the cat's paws, animating the cat as it plays the piano."

As part of a long struggle to visually depict when telepathy is being used, asking for a "red laser beam" connecting the woman to the cat seemed like a simple approach. Maybe this cat is actually Manny the bumpha in disguise (in the video, above). In my imagination, the woman is Tyhry and if Manny objects to Tyhry's playing, telepathic communication can be used to correct the perceived problem in Tyhry's piano playing. This video has a music track that you can turn on.

In the next video, shown below, it is easier to believe that the woman is training the cat. However, Whisk went to the trouble of placing the cat inside what I will call a nanite containment chamber, only to have the cat reach right through the wall of the chamber to reach the piano.

Whisk seems to have a preference for long-haired cats. As part of my endless struggle to make depictions of the invisible, the red light ray was meant to represent a telepathic signal emanating from the woman and controlling the behavior of the cat. I was surprised when Whisk decided to put the cat and the woman on opposite sides of the keyboard (above).

In the next video, below, Whisk placed the woman to the side of the piano and returned to a single beam of light.

In the next video, below, the static red light beams in the periphery might be a visual representation of the telepathic impulses that are being used to train the cat to play the piano. My text prompt specified a "flashing red laser beam", but Whisk found it very easy to ignore the "flashing" part.

In the next video, below, I tried to have Whisk create a video that would be a clear demonstration of telekinesis, but the woman failed to make the floating sphere move to the shore, as requested in my instructions. If anything, the floating sphere seems to drift even further from the shore. I was surprised that Whisk tried to generate what looks like falling snow. For "The Visitor at Dawn", I imagine the setting is a very warm summer on Earth, warm enough that Elyrix the Grendel could be barefoot.

In an earlier blog post, I showed an image depicting a large snake-like alien on a distant exoplanet. The video below shows an animation that was made by Whisk in which the snake creature threatens a spaceship. The pilot runs out of the scene and the automated systems of the spaceship defend against the snake creature.

In the next video, below, the spaceship pilot first runs away from the snake creature, then turns back towards her spaceship.

In the next video, I was able to get Whisk to generate a scene in which the automated spacecraft rises off the ground. I also asked that the spaceship turn to point at the snake creature, but Whisk did not turn the spaceship. I also tried to draw Whisk's attention to the guns near the front of the spaceship, but Whisk seemed content to make the defense systems of the spaceship fire ray beams from just about any structural element of the spaceship, including the engines.

I found the next video (below) to be amusing. It looks like she was as frustrated as I was in trying to get the spaceship to turn and point at the alien creature.

Use the lasers, Luke.

Text prompt: "A chaotic action scene in which a space-woman confronts a giant alien snake. A gigantic snake emerges head-first from the sand and moves towards the spaceship. As the snake moves it creates clouds of dust. The woman turns away from the spaceship and runs away from the snake. The woman is shouting at the spaceship and waving her arm in the direction of the approaching snake. When the snake emerges from underground, the automated spaceship rises a few feet into the air and hovers over the sand. Then the spaceship turns to aim its blasters towards the snake and the spaceship’s guns fire a series of pulsing blue laser blasts at the approaching snake. The blue laser blasts emerge from the laser guns that are at the front end of the spaceship. The snake’s tongue moves in rhythm with the undulations of its lashing body."

Video text prompt: "Scene: A woman composes music for a science fiction film. A tall, athletic woman faces an upright piano while she writes music depicting a spaceship battling a colossal alien snake.  Initial Shot: A Medium Shot dramatically reveals the snake's monstrous head, its forked tongue flicking rhythmically as its massive, scaled body undulates. Simultaneously, the sleek, metallic front half of a futuristic spaceship is visible in the frame. Snake's Action: The immense snake, its scales shimmering with an iridescent, otherworldly green, immediately lunges towards the spaceship with astonishing speed. Its lashing movements send billowing clouds of fine, orange desert dust swirling into the air. Camera Movement: A rapid Dolly/Tracking Shot pulls back, revealing the the woman, who then remains centered in the frame while she used the piano keyboard. Woman's Reaction: The woman, her face thoughtful and filled with determination, turns her head sharply from the keyboard to the photograph of the snake alien battle scene that rests against the upright piano’s raised face. She speaks frantic commands, her voice echoing in the small room as she speaks to herself and sings the song being composed. Then frantically waves an arm towards the the photograph of the snake, her movements conveying desperate urgency. Spaceship's Response: Within mere seconds, the spaceship's advanced automated systems engage. It swiftly rises a few feet out of the photograph, its metallic surface gleaming under the lamp light of the room. Spaceship's Attack: The spaceship unleashes a devastating barrage of intense blue laser blasts from its forward-mounted laser guns. The pulsing energy beams, crackling with visible power, strike the woman’s hands where they rest on the keyboard. Final Shot: The woman, now visibly relieved, stops speaking and grabs hold of the spaceship. The spaceship's laser guns, now silent, faintly glow with residual energy."

I asked Gemini for help in constructing text prompts for videos. 

 Me: Gemini, are you familiar with Whisk Animate and the creation of video using Google's Whisk?

Gemini: Yes, I am familiar with Google's Whisk and Whisk Animate. Here's a breakdown of what they are:
Google Whisk:
     This is an experimental AI tool from Google Labs that focuses on image generation and remixing.
     It allows users to create new images by using existing images as prompts, rather than relying solely on text prompts.
     It leverages Google's Gemini model to understand the content of uploaded images and Imagen 3 to generate new visuals based on those interpretations.
     The goal is to provide a more intuitive and visual way to explore creative ideas.  

Snake battle scene #1. Generated by ImageFX.
 Whisk Animate:

     This is an extension of the Whisk project that adds video generation capabilities.
     It utilizes Google's Veo 2 video generation model to transform static images created in Whisk into short, animated video clips, typically around 8 seconds long.
     This allows for the creation of dynamic content from still images, opening up new possibilities for creative expression.
     It is in a preview phase, and access to it is limited.

In essence, Whisk helps you create images using other images, and Whisk Animate brings those images to life with motion. This reflects Google's ongoing advancements in generative AI, particularly in the areas of image and video creation.

 Me: I have begun experimenting with Whisk Animate. I find that the software can have difficulty interpreting my text instructions for a video that I want to have generated. Do you have general advice for how to verbally instruct Whisk Animate to create a video that I desire? 

Snake battle scene #2. Generated by ImageFX.
Gemini: You're right, even with advanced AI, getting precise video results from text instructions can be tricky. Here's advice tailored to help you guide Whisk Animate more effectively, especially considering its current experimental nature:
 1. Focus on Clear, Concise, and Action-Oriented Language:
 a) Avoid Ambiguity: Whisk Animate needs to understand exactly what you want to happen. Instead of "a scene of nature," try "a dense forest with sunlight filtering through the trees, creating dappled shadows."
 b) Use Action Verbs: Emphasize movement. Instead of "a bird," say "a bird *flies* across the screen."
 c) Specify Direction and Speed: If you want a specific movement, use terms like "slowly," "quickly," "left to right," "upwards," or "spinning clockwise."
 d) Describe the Camera Movement: If you want the "camera" to move, specify "pan left," "zoom in," "tilt up," or "rotate."
 e) Be Specific About Timing: If you want an action to happen at a specific point, try phrases like "then," "after," or "while."

Snake battle scene #3. Generated by ImageFX.
2. Break Down Complex Scenes into Smaller Steps:
 a) Instead of trying to describe an entire complex scene in one go, start with a basic image using Whisk, and then use Whisk Animate to add motion to specific elements.
 b) If you want a character to walk through a forest, first create the forest image, then instruct Whisk Animate to "add a human figure walking from right to left through the forest."

 3. Use Visual References:
 Whisk, and therefore Whisk Animate, is designed to use images as prompts. If you have a specific visual style in mind, try providing an image that exemplifies that style.
  For example, if you want a video that looks like a watercolor painting, upload a watercolor image.
 4. Experiment with Iteration:
 a) Generative AI is inherently iterative. Don't expect to get the perfect result on the first try.
 b) Use the initial output as a starting point and refine your instructions based on what you see.
 c) If the animation is close but not quite right, try rewording your instructions or adding more specific details.

  5. Focus on Key Elements: Instead of trying to animate everything in the scene, focus on animating the key elements that will have the most impact.
  For example, if you have a scene with a still background and a moving character, focus on describing the character's movement.

Snake battle scene #4. Generated by ImageFX.
 6. Use Simple Cinematic Language: Think about basic camera shots. Words like "close up", "wide shot", "establishing shot", and "overhead view" can help. Example:

 Instead of: "A dragon flying."
 Try: "A large, green dragon *flies quickly upwards* from the bottom of the screen to the top, its wings *beating rapidly*."

By focusing on clear, specific, and action-oriented language, and by breaking down complex scenes into smaller steps, you can significantly improve your chances of getting the desired results from Whisk Animate.
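
Gemini's advice about action verbs, speed, direction and camera notes can be treated as a fill-in-the-blanks template. The Python sketch below is only an illustration of that idea: the `MotionPrompt` class and its field names are hypothetical, and nothing here reflects how Whisk Animate actually reads a prompt.

```python
from dataclasses import dataclass


@dataclass
class MotionPrompt:
    """Hypothetical container for the parts of one action-oriented sentence."""
    subject: str          # who or what moves, e.g. "A large, green dragon"
    action: str           # an action verb, e.g. "flies"
    speed: str = ""       # optional adverb, e.g. "quickly"
    direction: str = ""   # optional path, e.g. "upwards from the bottom of the screen"
    camera: str = ""      # optional camera note, e.g. "Wide shot"

    def render(self) -> str:
        # Join the non-empty motion parts into a single sentence.
        parts = [self.subject, self.action, self.speed, self.direction]
        sentence = " ".join(p.strip() for p in parts if p.strip()) + "."
        # Put the camera note first so it frames the whole sentence.
        if self.camera.strip():
            sentence = f"{self.camera.strip()}: {sentence}"
        return sentence


dragon = MotionPrompt(
    subject="A large, green dragon",
    action="flies",
    speed="quickly",
    direction="upwards from the bottom of the screen to the top",
)
print(dragon.render())
```

Leaving a field empty simply drops it from the sentence, so the same template covers both the bare "A bird flies." and the fully specified dragon example.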


Snake battle scene #5. Generated by ImageFX.
 Me: You suggested: "Use Simple Cinematic Language". Create a glossary of key terms that are part of "Cinematic Language".

 Gemini: It's very useful to have a grasp of cinematic language, especially when trying to direct AI tools like Whisk Animate. Here's a glossary of key terms:

 Camera Shots:
Extreme Long Shot (ELS):
       A very wide shot, often used to establish the setting.
      Example: A vast landscape, a city skyline.
 Long Shot (LS):
      Shows the subject from a distance, placing it in context.
      Example: A full view of a person, a building.
 Medium Shot (MS):
      Shows the subject from a moderate distance.
      Example: A person from the waist up.
 Close-Up (CU):
      Shows a detailed view of the subject.
     Example: A person's face, a small object.
 Extreme Close-Up (ECU):
     A very detailed view, focusing on a small part of the subject.
     Example: A person's eyes, a single drop of water.
 Establishing Shot:
     A shot that sets the scene, often an extreme long shot or long shot.
 Point-of-View Shot (POV):
     Shows what a character sees.

 Camera Movements:
 Pan: Horizontal movement of the camera.
 Tilt:     Vertical movement of the camera.
  Dolly/Tracking Shot:  The camera moves physically along a track or on wheels.
 Crane Shot: The camera moves on a crane, allowing for high-angle or sweeping shots.
 Zoom:  The lens changes focal length, making the subject appear closer or farther away.

Snake battle scene #6. Edited by ImageFX.
 Editing:
Cut: An instantaneous transition from one shot to another.
 Dissolve: A gradual transition where one shot fades into another.
 Fade: The image gradually fades to black (fade-out) or from black (fade-in).
  Montage: A sequence of shots edited together to convey a concept or passage of time.
 Cross-Cutting: Alternating shots of two or more lines of action occurring in different places, usually simultaneously.

 Other Important Terms:
 Mise-en-scène: The arrangement of everything within the frame, including actors, props, and lighting.
 Depth of Field: The range of distances within which objects are in focus.
 Framing: The way a subject is positioned within the frame.
 Lighting: The use of light to create mood and emphasize elements.
 Angle:  The position of the camera relative to the subject (e.g., high angle, low angle).

By using these terms in your instructions to Whisk Animate, you can give the AI a much clearer picture of what you want to achieve.
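
When revising a prompt it can help to see which of the glossary's shot and camera terms it already contains. The following Python sketch does a simple whole-word search; the term list is copied from the glossary above, while the function name `find_cinematic_terms` and the matching strategy are assumptions made for illustration, not anything Whisk provides.

```python
import re

# Shot and camera-movement terms from the glossary above (lowercased).
CINEMATIC_TERMS = [
    "extreme long shot", "long shot", "medium shot", "close-up",
    "extreme close-up", "establishing shot", "point-of-view shot",
    "pan", "tilt", "dolly", "tracking shot", "crane shot", "zoom",
]


def find_cinematic_terms(prompt: str) -> list[str]:
    """Return the glossary terms that occur as whole words in the prompt."""
    remaining = prompt.lower()
    found = []
    # Check longer terms first so "extreme long shot" is not also
    # reported as "long shot".
    for term in sorted(CINEMATIC_TERMS, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, remaining):
            found.append(term)
            # Blank out the match so shorter terms cannot reuse it.
            remaining = re.sub(pattern, " ", remaining)
    return found


example = "A Medium Shot shows the snake. Dolly/Tracking Shot: the camera pulls away."
print(sorted(find_cinematic_terms(example)))
```

The whole-word match keeps "pan" from being found inside words such as "span", but it knows nothing about meaning, so it is only a rough check.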

Snake battle scene #7.
 I experimented with some of these terms. My original prompt was: "A chaotic action scene in which a tall woman is threatened by the appearance of a giant alien snake. At the start of the video, the gigantic snake emerges head-first from the sand and moves towards the spaceship. The snake’s tongue moves in rhythm with the undulations of its lashing body. As the snake moves it creates clouds of dust. Taking note of the snake, the woman turns away from the spaceship and runs away from the snake. The woman is shouting at the spaceship and waving her arm in the direction of the approaching snake. When the snake emerges from underground, the automated spaceship responds and quickly rises a few feet into the air. Then the spaceship hovers above the sand. Then the spaceship turns to the right and uses its laser blasters to defend against the approaching snake. Then the spaceship’s guns fire a series of pulsing blue laser blasts at the approaching snake. The blue laser blasts emerge from the laser guns that are at the front end of the spaceship. At the end of the video, the woman stops running and turns back to look at the triumphant spaceship which has vanquished the alien snake creature."

Snake battle scene #8.
Modified prompt: "Mise-en-scène: a chaotic action scene in which a tall woman is threatened by the sudden appearance of a giant alien snake. At the start of the video, a Medium Shot shows a gigantic snake when it emerges head-first from below the the surface of the sand. At the start of the video, only the from half of the spaceship is in frame. Then the snake immediately moves towards the spaceship. The snake’s tongue moves in rhythm with the undulations of its lashing body. As the snake moves aver the surface of the sand, it causes clouds of dust to rise. Dolly/Tracking Shot: the camera pulls away from the snake so that the entire spaceship can be seen in frame. The woman is always in frame. Taking note of the snake, the woman turns away from the spaceship and runs away from the snake and she runs towards the camera. While running, the woman is shouting and giving verbal commands to the spaceship autopilot. As she runs, she is waving one arm in the direction of the approaching snake. As soon as the snake emerges from below the surface of the sand, the automated spaceship responds and quickly lifts itself a few feet into the air. Then the spaceship hovers above the sand and retracts its landing gear. Then the spaceship quickly turns to the right and points its front end directly at the approaching snake creature. The spaceship fires its laser blasters to defend against the approaching snake. Then the spaceship’s guns fire a series of pulsing blue laser blasts that strike the approaching snake. The blue laser blasts emerge from the laser guns that are at the front end of the spaceship. The snake creature is struck by the laser blasts and collapses. Near the end of the video, the woman stops running away from the snake and she turns back to look at the spaceship which has vanquished the alien snake creature."

Whisk can only generate videos in the landscape aspect ratio, not with the square or portrait options.
 I showed my text prompt (above) to Gemini and it suggested the following: "Scene: A high-stakes desert action sequence. A tall, athletic woman faces a sudden, terrifying threat: a colossal alien snake.

  Initial Shot: A Medium Shot dramatically reveals the snake's monstrous head bursting from the sand, its forked tongue flicking rhythmically as its massive, scaled body undulates. Simultaneously, the sleek, metallic front half of a futuristic spaceship is visible in the frame.

Snake battle scene #9.
 Snake's Action: The immense snake, its scales shimmering with an iridescent, otherworldly green, immediately lunges towards the spaceship with astonishing speed. Its lashing movements send billowing clouds of fine, orange desert dust swirling into the air.

 Camera Movement: A rapid Dolly/Tracking Shot pulls back, revealing the entire streamlined spaceship and the woman, who remains centered in the frame.

 Woman's Reaction: The woman, her face contorted with terror and determination, turns sharply from the spaceship and sprints directly towards the camera. She screams frantic commands at the ship's autopilot, her voice echoing across the desolate landscape. She frantically waves an arm towards the rapidly approaching snake, her movements conveying desperate urgency.

 Spaceship's Response: Within mere seconds, the spaceship's advanced automated systems engage. It swiftly rises a few feet, retracts its landing gear with a mechanical whir, and pivots sharply to face the snake, its metallic surface gleaming under the harsh desert sun.

 Spaceship's Attack: The spaceship unleashes a devastating barrage of intense blue laser blasts from its forward-mounted laser guns. The pulsing energy beams, crackling with visible power, strike the snake's thick hide, causing it to convulse violently and collapse into the sand.

 Final Shot: The woman, now visibly relieved, stops running and turns back to witness the spaceship's victorious stance. The spaceship's laser guns, now silent, faintly glow with residual energy, and the dust from the snake's collapse begins to settle."
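
A prompt built from labeled sections like these can be assembled from parts, which makes it easy to try shorter variants and compare their lengths. The Python sketch below is a minimal illustration: the helper names `build_prompt`, `without` and `word_count` are hypothetical, the section text is abbreviated, and no claim is made about any length limit in Whisk.

```python
def build_prompt(sections: dict[str, str]) -> str:
    """Join labeled sections into one prompt string, in insertion order."""
    return " ".join(f"{label}: {text.strip()}" for label, text in sections.items())


def without(sections: dict[str, str], *labels: str) -> dict[str, str]:
    """Return a copy of the sections with the named labels left out."""
    return {k: v for k, v in sections.items() if k not in labels}


def word_count(prompt: str) -> int:
    """Count whitespace-separated words in the prompt."""
    return len(prompt.split())


# Abbreviated text; the full section wording is in the prompt quoted above.
sections = {
    "Scene": "A high-stakes desert action sequence.",
    "Initial Shot": "A Medium Shot reveals the snake's head bursting from the sand.",
    "Camera Movement": "A rapid Dolly/Tracking Shot pulls back.",
    "Final Shot": "The woman stops running and turns back.",
}

full = build_prompt(sections)
short = build_prompt(without(sections, "Camera Movement"))
print(word_count(full), word_count(short))
```

Dropping one section at a time and regenerating the video is one way to find out which instructions are actually being followed.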

By using the text prompt suggested by Gemini (above) I was able to get Whisk to generate videos like the ones shown below, but I had the feeling that these long prompts were just confusing Whisk's video generator.



Kung Fu snake fight. Below is one of the stranger videos generated by Whisk. Watch the human figure in the background, under the spaceship.

Related Reading: AI video and Whisk.
Next: visit my library of Whisk-generated images.

