Feb 3, 2024

Comic Hallucination

Bard-generated image.

 The Bard large language model (LLM) chatbot had one of its periodic updates on February 1st, 2024. Bard can now generate images upon request using Google's Imagen 2 software. 

Bard Humor. For the image that is shown to the right, I asked Bard to generate: "a science fiction-oriented gag cartoon illustrating a humorous solution to the problem of AI hallucinations that is discovered in the year 2035".

Maybe AI hallucinations are dark attractors in the state-space of the network models that hold the trained-up parameters of the models... maybe that is what is on the display screen in the image that Bard generated.

The solution to AI hallucinations.

Hoping that Bard might be able to explain that image (above), I then asked Bard to provide a "humorous caption" for the image. I was imagining that Bard would simply print out such a caption, but instead Bard went ahead and generated an entirely different image that is shown to the left.

I suppose that most of the "gag cartoons" that were included in the Imagen 2 training set were in black and white, not color. In my experience, it is very common for AI-powered image generators to include a robot in their images any time that "science fiction" is included in a user's text prompt.

Flat Earth, by Bard.
In this blog post (below), I'm going to explore two AI-powered tools that might be useful for creating images for science fiction stories: AI Comic Factory and Microsoft's Designer software. I'll also continue to experiment with Bard and Imagen 2.

The first image that Bard generated for me is shown to the right. My text prompt was: "an alternate universe where planets are flat and life-forms are made of neutronium, two life-forms with large eyes are seen in profile view flying above and below a flat Earth, fractal complex star-field above the flat Earth, a small flat Moon is seen above the flat Earth" (see this chat with Bard).

Flat Earth by Mr Wombo.
When I then showed Bard Figure 5 from the bottom of this blog post (which has a "flat Earth" image that was generated by DALL-E), Bard told me: "I cannot see the image you uploaded, and it is against Google's AI Principles for me to generate responses that are related to conspiracy theories or the spread of misinformation".

The image to the left was generated with the help of WOMBO Dream. Mr. Wombo's original "flat Earth" image did not include any neutronium life-forms, so I asked Mr. Wombo to use this text prompt: "an alternate universe where life-forms are made of neutronium, two neutronium life-forms with large eyes are seen in profile view". I then pasted the neutronium life-forms that Mr. Wombo generated into the image with the flat Earth to create the composite image that is shown to the left. As was the case for DALL-E, Mr. Wombo could not resist including spherical planets and moons in its image of a flat Earth.

a space elevator by Bard

 Bard Wants a Spherical Earth. I also asked Bard to generate an image depicting: "a space station in geosynchronous orbit above Earth with a space elevator seen linking from the orbital station to the surface of Earth, below" (see this). One of the two resulting images is shown to the right.

I suppose there were not many images of space elevators in the training set for Imagen 2. I'm guessing that the linear yellow feature is supposed to be the space elevator, but there is also a mysterious object above the space station that almost looks like a metallic roadway rising from the surface of the planet.

see Recruiting Telepaths
 AI Tools for Comic Generation. About a year ago (see Cover Nanites), I began exploring how to use AI software as an aid for illustrating my science fiction stories. 

I'm not a fan of comics, but in the past, I have occasionally dabbled with comic-format presentation of a science fiction story (for example, see Recruiting Telepaths from April 2021). 

Here in January 2024, I finally experimented with AI Comic Factory and asked for "Lato John an exobiologist is studying an alien creature in the ocean of an exoplanet". The raw AI-generated output from the free version of AI Comic Factory is shown below in Figure 1:

Figure 1. This is the raw AI Comic Factory output, using their "flying saucer" style.

 

Fig. 1a. numbered panels.
 I used WOMBO Dream to make some modifications to the raw AI Comic Factory output (the raw output is shown above in Figure 1) and added some words that seemed to go along with the AI-generated images (see Figure 2, below). 

Editorial decision. When making Figure 2 (below), I swapped the original panel #1 from Figure 1 with panel #4. Panel #4 had some text on it that seemed like what should be on the first panel of a comic. Also, I wanted to introduce the submarine at the end.

Figure 2. See this large image on DeviantArt: Lato Johns Exobiologist Comic.
REDRAW button

One of my pet peeves about computer software is that many programmers have seemingly lost interest in providing instructions for their software. Yes, I only used the free version of AI Comic Factory and you could say that I got what I paid for (Figure 1, above). There was a "REDRAW" button on each of the four panels of my comic, but clicking on those buttons did nothing except erase the panel.

To make Figure 2 (above), I used WOMBO Dream to make the character Lato Johns look more like what I imagine her physical appearance to be. Using Mr. Wombo, I was able to correct problems such as the three legs that AI Comic Factory gave to Lato in the image to the right.

Designer logo; image source
 Designer. While on the subject of annoying software, poor documentation and terrible user interfaces, I also tried to use Microsoft's Designer software which is now linked to from Bing's Image Creator (see Figure 3, below). Microsoft's FAQ for Designer says: "Designer makes it easy, fast, and fun to design, no design experience required. From helping you get started with your ideas or content like generating custom layouts or even new images from your ideas (with DALL.E 3 and GPT 3.5 integrated into the design experience from the start), Designer makes it easier than ever to simply type what you want to create and get customized designs with compelling messages included that you can further personalize and refine."

Figure 3. Bing's text-to-image software ("Image Creator") that uses DALL-E and links to Designer.

The user interface for Designer.
 I used Designer's "Remove background" tool to process one of the "red Martian" images shown in Figure 3 (above).

As part of Microsoft's effort to make a horrible user interface, you need to click on your image before you can see the options for altering it. To download an image from Designer, you click on the "Download" button and select which of your Designer pages you want to download. When you do this, the drop-down list of your pages covers up the "Download" button, and you cannot download your image until you discover the magic place on the screen where you must click to make the "Download" button reappear.

Figure 4. Designer image tools.


transparent background
Martians in New York.

The image to the left shows how the Designer software took out the "background" from the image shown above in Figure 4. I then saved that processed image with the option for a transparent background. 

Next, I pasted these red Martians onto another image of New York City (see the image that is shown to the right).
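For anyone who would rather script this kind of paste step than do it by hand, here is a minimal sketch of what "pasting an image with a transparent background onto another image" amounts to. This is not how Designer does it internally (I have no idea how it does); it is just the standard "over" alpha-blending rule written in plain Python. The function names `over` and `paste`, and the representation of an image as a list of rows of RGBA tuples with values from 0 to 1, are my own assumptions for illustration.

```python
def over(fg, bg):
    """Blend one RGBA pixel over another using the standard 'over' rule.

    Each pixel is (red, green, blue, alpha) with values in 0..1 and
    straight (non-premultiplied) alpha. This pixel format is an assumption
    made for this sketch.
    """
    f_r, f_g, f_b, f_a = fg
    b_r, b_g, b_b, b_a = bg
    out_a = f_a + b_a * (1.0 - f_a)
    if out_a == 0.0:
        # Both pixels fully transparent: nothing to show.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(f, b):
        return (f * f_a + b * b_a * (1.0 - f_a)) / out_a

    return (blend(f_r, b_r), blend(f_g, b_g), blend(f_b, b_b), out_a)


def paste(foreground, background, left, top):
    """Return a copy of background with foreground blended in at (left, top).

    Images are lists of rows of RGBA tuples. Foreground pixels that fall
    outside the background are ignored.
    """
    result = [list(row) for row in background]
    for y, row in enumerate(foreground):
        for x, pixel in enumerate(row):
            ty, tx = top + y, left + x
            if 0 <= ty < len(result) and 0 <= tx < len(result[ty]):
                result[ty][tx] = over(pixel, result[ty][tx])
    return result


# Tiny example: a one-row "Martian" (one red pixel, one transparent pixel)
# pasted onto a three-pixel blue "sky".
RED = (1.0, 0.0, 0.0, 1.0)
CLEAR = (0.0, 0.0, 0.0, 0.0)
SKY = (0.0, 0.0, 1.0, 1.0)
composite = paste([[RED, CLEAR]], [[SKY, SKY, SKY]], left=1, top=0)
```

Wherever the foreground is fully transparent, the background shows through unchanged, which is why saving the Martians with a transparent background matters before pasting them over the city.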

neutronium aliens and a flat Earth
 Smart Erase. As a test of the "generative erase" tool of Designer, I started with the image shown to the left which DALL-E created for this text prompt: "an alternate universe where planets are flat and life-forms are made of neutronium, two life-forms with large eyes are seen in profile view flying above and below a flat Earth". This image is a good illustration of the stupidity of these AI image-generating systems. While I can't fault the "flat Earth" that DALL-E created, I do have to ask: why did DALL-E then sprinkle in all those spherical planets? Maybe in DALL-E's imagination, large planets fall flat while smaller moons can retain a spherical shape.

Modified image with the spherical planets removed.
I had to use the "generative erase" tool 16 times to create the modified image that is shown to the right. Unfortunately, the Designer interface is rather slow, so doing that series of 16 edits was a tedious process.

Figure 5. add an alien below
And...

For the image that is shown to the left, I used Designer to add an additional neutronium alien (from an image in this blog post) below the flat Earth.

Next: the Claude chatbot as a story-writing collaborator.

Five AI-generated images for this text prompt: "A full color scientific illustration of the idea that AI hallucinations are dark attractors in the state-space of the network models that hold the trained-up parameters of the models, large language models can get stuck in those dark attractors". The three narrower images were all generated with different "styles" for WOMBO Dream. The second image from the left was made by DALL-E (Designer). The second image from the right was made by Bard (Imagen 2).

Visit the Gallery of Movies, Book and Magazine Covers
