Feb 21, 2023

What are the Rules?

Figure 1. OpenArt community
 For the past month (start here), I've been experimenting with a cloud-based CLIP-guided image-generating system (WOMBO Dream) as a tool for helping me make illustrations for my science fiction stories. In this blog post, I'm going to get serious about trying to discover the rules for making good text prompts!

The image to the right was generated with this text prompt: "This prompt book is brought to you by OpenArt, a platform and community dedicated to AI-native content and written by members from the community". The WOMBO Dream training set of images seems to have been very rich in absurd robot images. I've learned that a request for an image depicting a space alien is likely to result in something like Figure 1. Also, some WOMBO Dream-generated images randomly include text. What if I want to remove that text?

image source
 Inpainting. My first goal was to find the secret of AI-based "inpainting": the removal or correction of specific parts of images. The interface for the OpenArt Stable Diffusion cloud-based system (image to the left) explicitly includes a "Negative Prompt" field.

image processing
I tried to get the OpenArt Stable Diffusion software to selectively remove the words (text) from Figure 1.

Sadly, as seems to be the industry standard, there are no instructions for how to use the "Negative Prompt" feature.

Figure 2. inpainting fail

 Figure 2 shows the image that I got when I tried to alter the image shown above in Figure 1. Clearly, I need help in figuring out how to efficiently modify AI-generated images either by means of a "Negative Prompt" or by "inpainting".
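For what it's worth, open-source Stable Diffusion front ends commonly treat the negative prompt as a list of things to steer away from (for example "text, letters, watermark") rather than as an instruction to follow. The usual mechanism is classifier-free guidance: the model makes one noise prediction for the text prompt and one for the negative prompt, then pushes the result away from the negative one. Whether OpenArt does exactly this is an assumption, and the numbers below are toy values standing in for real model output. A minimal sketch:

```python
def guided_noise(cond, neg, scale):
    """Classifier-free guidance on toy vectors.

    Start at the negative-prompt prediction and step `scale` times
    along the direction that leads toward the text-prompt prediction.
    """
    return [n + scale * (c - n) for c, n in zip(cond, neg)]


# Hypothetical noise predictions (real ones are large image-shaped tensors).
cond = [1.0, 0.5, -0.25]  # prediction for the text prompt
neg = [0.5, 0.5, 0.75]    # prediction for the negative prompt

result = guided_noise(cond, neg, scale=7.5)
print(result)
```

Where the two predictions agree (the middle value), the negative prompt changes nothing. Where they differ, the result is pushed well past the text-prompt prediction and away from the negative one, which is why a short list of unwanted things is the usual form for this field.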

 The Rules. There are content rules for the OpenArt community. All images are supposed to be "rated G".

 Rule #1. As for sensible rules for making good text prompts, I suppose Rule #1 is: earlier items in a text prompt are supposed to be given greater priority than items that come towards the end.

Figure 3.
To test this idea of priority being given to the linear order of items in a text prompt, I used: "science fiction, a red humanoid alien from outer space, a woman in the left side of the scene, they are looking at each other, handshake, exoplanet background" (see Figure 3). I was now using WOMBO Dream as my Stable Diffusion user interface. I'll anthropomorphize this software by calling it "Mr. Wombo".

handshake
Based on Figure 3, you might wonder just how much the Stable Diffusion database was loaded up with sample images of handshakes. The image to the right was generated by: "people shake hands, handshake, a man, a woman, they are looking at each other". 

Figure 4.

 All Aliens Club. I changed the order of the words for my robot handshake scene to: "science fiction, handshake, a red humanoid alien from outer space, a woman in the left side of the scene, they are looking at each other, exoplanet background" and got my alien handshake (Figure 4). The poor woman was left out of it.
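Testing the order rule by hand means retyping the prompt for every variant. Here is a minimal sketch of a helper that keeps the prompt as a list of items and moves one item to a new position. The function name `move_item` is made up for illustration and is not part of WOMBO Dream or OpenArt.

```python
def move_item(items, item, new_index):
    """Return a copy of `items` with `item` moved to position `new_index`."""
    reordered = [x for x in items if x != item]
    reordered.insert(new_index, item)
    return reordered


# The items of the Figure 3 prompt, in their original order.
figure3_items = [
    "science fiction",
    "a red humanoid alien from outer space",
    "a woman in the left side of the scene",
    "they are looking at each other",
    "handshake",
    "exoplanet background",
]

# Move "handshake" up to second place, as was done for Figure 4.
figure4_prompt = ", ".join(move_item(figure3_items, "handshake", 1))
print(figure4_prompt)
```

Keeping the items in a list makes it easy to generate every ordering of interest and run them with identical settings, so that the order is the only thing that changes.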

Figure 5. blond woman

I tried to re-emphasize the woman by adding more information about her: "science fiction, handshake, a red humanoid alien from outer space, a woman in the left side of the scene, they are looking at each other, the woman has long blond hair, exoplanet background". 

I've previously concluded that this software does not know left from right and the "prompt book" says nothing about using the terms "left" and "right" in text prompts.

I was impressed by the proliferation of people in Figure 5. This image was made with the pre-defined WOMBO Dream "Flora v2" style.

not HDR
 Modifiers. There is a large collection of image "modifiers" here.

The "modifiers" HDR, UHD and 64K seemed interesting based on what I read in the OpenArt prompt book. However, I suspect that "modifiers" such as these are already included in WOMBO Dream "styles" such as "Realistic v2".

Working with the "Soft Touch" style of WOMBO Dream, I used the text prompt "Two women, Dana Scully, Monica Reyes, The X-Files, alien biology research laboratory, Monica in a white lab coat, Monica has black hair, Dana wears a blue jumpsuit, alien body parts" (see the image to the left).

with HDR
I then tried, "HDR, UHD, 64K, Two women, Dana Scully, Monica Reyes, The X-Files, alien biology research laboratory, Monica in a white lab coat, Monica has black hair, Dana wears a blue jumpsuit, alien body parts, alien body parts", but I see no difference (image to the right).

Figure 6. OpenArt not HDR
I tried this same comparison at OpenArt Stable Diffusion and got similar results (Figure 6).

setting used for Figure 6
The settings used for Figure 6 are shown to the right.

The image generated with "HDR, UHD, 64K" in the prompt is shown in Figure 7.

Figure 7. OpenArt with HDR.
Again, I don't see a difference between Figure 6 and Figure 7. Since the instructions suck, I'm going to guess that "HDR, UHD, 64K" is an attempt to specify a greater weight on training images that were tagged as high resolution images.

Alternatively, there might only be one trained neural network, and "HDR, UHD, 64K" is an attempt to modify the "diffusion" process and bias it towards generating higher resolution images.

Figure 8. 50 steps
I tried increasing the number of steps for image production and that made no difference either (see Figure 8).

Another possibility is that since the Stable Diffusion algorithm was trained on 512 x 512 pixel images, the "HDR, UHD, 64K" might only affect larger generated images.

Figure 9. full sized image here
By playing around with Mr. Wombo and using Photoshop, I was able to generate the image shown in Figure 9.

Figure 10. HDR, 50 steps
I fed the Figure 9 image into the OpenArt system and got Figure 10. This was using their default of 75% for the "strength" of my reference image.
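The interface does not explain what "strength" means. In the open-source diffusers image-to-image pipeline, a strength between 0 and 1 sets how much noise is added to the reference image, and only about steps times strength of the denoising steps are actually run. Whether OpenArt's 75% works the same way, or is the inverse (how closely to follow the reference), is an assumption here. A small sketch of that step arithmetic:

```python
def effective_steps(num_steps, strength):
    """Denoising steps actually run in a diffusers-style img2img pass.

    `strength` is the fraction of the noise schedule applied to the
    reference image: 0.0 leaves it untouched, 1.0 starts from pure noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_steps * strength), num_steps)


steps_run = effective_steps(50, 0.75)
print(steps_run)
```

Under this reading, asking for 50 steps at 75% strength runs 37 denoising steps, so raising the step count and changing the strength are not independent knobs.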

gang of three
The image to the left shows Mr. Wombo's preference for groups of three, not two. This image was made with the "Buliojourney v2" style, from a text prompt alone and without a reference image.

I'm tempted to invent a story for the image to the left. I don't know who the third woman is, but maybe they are simply waiting for a stall in the hospital's restroom. A more interesting story line would be if they were waiting for the results of an alien DNA analysis.

I then put the Figure 9 image into "Buliojourney v2" style as a reference image.

original text prompt
add "alien heads" to start
With the original text prompt, I got the image to the left. After including an additional "alien heads" at the start of the text prompt, I got the image to the right.

Sometimes I wonder if WOMBO Dream devotes more processing steps to a job if you keep repeating it.

Mr. Wombo has three levels of dependency on reference images: weak, normal and strong. Examples for "weak" and "strong" are shown below.

weak image-to-image
strong image-to-image
The image to the left really emphasized the "alien heads" part of the text prompt. The image to the right fairly accurately reproduced the reference image (Figure 9).

Below, in its entirety, are the "instructions" for "inpainting" that are in the OpenArt prompt book. I put "instructions" in quotes, because they don't really explain how to do inpainting with their software. This is the industry standard for crappy instructions.

"instructions" for "inpainting" that are in the OpenArt prompt book.
Zombie Leyla Harrison?
transparent
One of the images from Mr. Wombo with three people, not two, is shown to the left. I tried this with Mr. Wombo and a text prompt saying: "Dana Scully, Monica Reyes, The X-Files, alien biology research laboratory, Monica in a white lab coat, Monica has black hair, Dana wears a blue jumpsuit".

I could not figure out how to get Mr. Wombo to do this inpainting. However, working with the OpenArt interface and putting "inpaint and do not show a third person" in for the "Negative Prompt" and using the offending part of the image made transparent, I got the result in Figure 11.

Figure 11. gone Leyla
This little bit of inpainting resulted in the longest lab coat in history, but it basically worked.
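How OpenArt handles the transparent area internally is not documented. Open-source inpainting pipelines usually take a separate mask image in which white pixels are repainted and black pixels are kept, so a transparent region would first be turned into such a mask. Here is a minimal sketch of that conversion on a tiny hand-made image. The pixel values and the function name `alpha_to_mask` are made up for illustration.

```python
def alpha_to_mask(rgba_rows, threshold=0):
    """Build an inpainting mask from rows of (r, g, b, a) pixels.

    Pixels with alpha at or below `threshold` become 255 (repaint);
    all other pixels become 0 (keep).
    """
    return [
        [255 if a <= threshold else 0 for (_r, _g, _b, a) in row]
        for row in rgba_rows
    ]


# A hypothetical 2 x 3 image whose right-hand column was erased.
opaque = (200, 180, 160, 255)
erased = (0, 0, 0, 0)
image = [
    [opaque, opaque, erased],
    [opaque, opaque, erased],
]

mask = alpha_to_mask(image)
print(mask)
```

Only the masked area is regenerated, which fits the result above: the rest of the scene stayed put and the software had to fill the gap with something, in this case more lab coat.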

setting for inpaint example
The OpenArt interface settings that I used for this example of inpainting are shown to the right. 

I changed the "seed" from 17 to 23 and got a somewhat different scene (Figure 12, below), now with a return of black hair and with Dana the redhead closer to the camera, blocking our view of the lower parts of Monica.

Figure 12. seed changed to 23

In Figure 12, we don't have to look at an amazingly long lab coat. 
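The reason a different seed gives a different scene is that, in Stable Diffusion generally, the seed fixes the random noise that the image is grown from. The sketch below uses Python's own random number generator as a stand-in for the real noise tensor, which is an assumption for illustration only: the same seed always gives the same starting noise, and a different seed gives different noise.

```python
import random


def starting_noise(seed, size=4):
    """Return `size` Gaussian noise values determined entirely by `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


noise_17 = starting_noise(17)
noise_17_again = starting_noise(17)
noise_23 = starting_noise(23)

print(noise_17 == noise_17_again)  # same seed, same noise
print(noise_17 == noise_23)        # different seed, different noise
```

This is also why a fair comparison of two prompts needs the seed held fixed: otherwise the change in starting noise alone is enough to produce a different picture.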

redhead cloning project
I could not figure out how to get Mr. Wombo to do inpainting, but I did get the image to the right in which the irregular transparent "Leyla area" was replaced by a white rectangle. I edited the AI-generated image to add some text to the white area.

I guess Mr. Wombo really likes redheads, even when I specify "black hair".

Next: changing the ages of characters

visit the Gallery of Movies, Book and Magazine Covers

