AI Art Application and Improvements Handbook

This AI Art Application and Improvements Handbook is intended to help people create free useful media for the public domain using AI art generators in practice with a focus getting things done in practice at all skill-levels. It informs about notable potential and existing applications and equips the reader with information about how to best implement these specific applications.

Launching edit

There are many ways you can use these tools. Main ways include:

You can install Stable Diffusion locally if you have a good graphics card. Whether that is a good idea depends on your hardware and needs. If you do so, AUTOMATIC1111 WebUI is probably the most advanced software to use but alternatives are listed here and also have benefits.
You can use a web platform like playgroundai.com to use it online (on many sites that is possible for free)
You can use an extension for an art software like for Krita or for Photoshop

Prompts edit

Which prompts work best differs by AI generator. The promptomania prompt builder is a great place to get started with prompts and to have a cheatsheet of different art styles one could use. It is missing many styles but may become more complete over time and be good enough for learning purposes. Many sites such as openart.ai and playgroundai.com let you see many other filterable/searchable images along with their prompts which you could build upon and learn from.

Here is a further comprehensive resource and here a list of resources for Stable Diffusion. You can use style studies (selected comprehensive ones: 1 2 3 4) to learn more about which styles you could use and could combine multiple styles. However, which style to use it not the tricky part or necessary to learn, you could just add phrases like "comic style", "3D render", "matte painting" to the prompt. When sites offer pre-made styles they usually just attach several terms to the end of the prompt.

Misgenerations and creating improved versions edit

As you can see below there still are some issues with these images. People who have better AI art skills may be able to generate much better images. Usually one may need to do slight manual editing.

Moreover, over time these images could be improved by their uploaders or other people using for example tools including:

the the Clipdrop cleanup tool
inpainting (requires some skills)
AI art web platforms' [# face restoration]
upscaling features
manually editing the images in image editors like GIMP or Photoshop
recreating the image using the same or similar prompts (example)
AI text removal tools (example)
…

If you can improve an existing image or an image you uploaded earlier on Wikimedia Commons, upload it as a new version, not as a separate new file. If the image has text, it can be removed via the listed ways. However, to prevent text from being anywhere in the image is best to use negative prompts, albeit that can be problematic for example when you'd like to generate a street scene with store texts being visible in the background. This is a good example of a specific skill to learn when generating AI art: creating texts that fit neatly into the image.

You need to continuously adjust the prompt until you get good results, sometimes and at some point it is better to just generate a new image from the same prompt rather than adjust the prompt (make sure the seed is set to random and not always the same except if you want to make the image look like the one just generated).

You can also generate a new image from the image just generated via img2img and then put it underneath the newly generated image as a layer in GIMP. Then cut out the upper layer to have the former visible at the places where you'd like it to (example).

Negative prompt edit

If you see things in your generated image that you don't want there or anticipate that the AI generator may add them or misunderstand your prompt in certain ways add these as negative prompt terms.

Examples of useful negative prompt terms to use when you generate…

humans: extra fingers (TBA)
rooms: picture frame, frames

Add more terms as unwanted things show up when you generate to exclude them from the next images. You can also use a result in img2img and try to remove the unwanted parts e.g. by using the prior prompt but an additional negative prompt term if cleanup tools don't remove such well.

Parameters edit

Some images have their parameters specified. A step count of around 40 often yields best results. Setting the prompt strength too high such as over 10 makes it more difficult to get a good picture.

Differences between generators edit

Stable Diffusion is open source so that one is recommended and focused on here. However, Midjourney may as of 2023 often generate better images in many cases and DALL-E probably as well in some or many cases. A difference between SD and DALL-E for example is that in SD the prompts are phrased like tags separated by commas, not whole sentences or similar. See these pages for a comparison between software results for the same prompts as well as the style studies linked above.

Applications where AI art can be useful edit

Paleoart of the ancient past edit

AI art can be used to create realistic-looking scenes that depict the past either how we it may have looked like to the best of our knowledge, for example including high-resolution depictions of extinct ancient organisms. For accuracy, substantial skills are required. For such images, img2img techniques can be used.

Base image 1 (see WMC cat Paleoart)
Base image 2
Base image 1

In the first example parts of the leg were cut out so it looks like the organism is walking through the fern.

It may also be possible to use tools like DreamBoth to train AIs on a set of images or even 3D models that depict ancient organisms like a species of dinosaurs.

Do not use images in Commons:Category:Inaccurate paleoart as a base and add Commons:Template:Factual accuracy to any image you know is inaccurate. Note that the generated images could also turn out to be inaccurate even if the base image is thought to be inaccurate where they require that template as well. Good paleo knowledge combined with good AI generation skills may be required to be able to generate or transform base images to realistic paleoart.

Most currently available paleoart only depicts the extinct organism (e.g. a dinosaur), but does not place them into an environment of flora and fauna that is theorized to have existed at the same time. Those images that do are usually low resolution. One exception is this image which shows how such scenes could look like.

Ancient and archaic humans lived in caves and/or did not have civilizational lifestyle for hundreds of thousands of years. Despite of that, there was not even one high-resolution image in the public domain that depicts how daily life may or is thought to have looked like or could have looked like for most of human existence, or at least none on WMC. This started to change by the emergence of advanced AI image generators in the 2020s, the two images below are two of three in Commons:Category:Ancient and archaic humans in art where mere facial reconstructions are not included:

Base image 1

Good anthropological knowledge may be required to be able to create an image that is not clearly inaccurate and likely a realistic depiction. For example, a major flaw is that AI art generators are likely to generate hairstyles that were impossible to highly unlikely in the deep past of pre-humans and ancient humans. See also "Inaccurate paleoart" on WMC. Crowd-reviewing systems and practices may evolve that provide feedback so that AI art engineers can modify their images according to best available scientific knowledge. Future developments may enable combination of paleontological data and tools and paleoart techniques with AI art software to enable more accurate and useful images. For now, if you do not have good anthropological knowledge try to collaborate with somebody who has before putting your image out in the public domain for other people to use.

Caricatures and public characters edit

In the 2020s it became more easily possible to create artworks using public characters due to the emergence of AI art generators like Stable Diffusion.

This

it democratized the creation of caricatures and political art
enabled problematic online misinformation
enabled humorous art using known characters, including fictional characters (prime example: 'Harry Spotter')

It works well with some specific public characters without any kind of extra training. Some of these are well-known to be easily generatable in realistic-looking ways such as Vladmir Putin.

One example use-case is to generate a portrait of a person and the background in ways that is linked to that person such as art illustrating scientific theories for scientists or art styles for artists like the Vincent van Gogh image kind of hints at.

At the same time it can be a problem to use specific characters in specific environments, for example the generators then generate that person multiple times rather than only once or also make the person show up in picture frames. This may change with future generators where you e.g. can specify where the person is located or how often. Keep that in mind when creating your prompts; there also are many options to solve such issues beyond negative prompts such as cutting the generated person out and placing the person into an image.

Others require fine-tuning using tools and techniques like DreamBooth, the first image below is made with Stable Diffusion/Imagine without any kind of extra training and the second used DreamBooth, where the second looks much more realistic regarding Jimmy Wales' face:

Humorous societally- and historically- critical art
After DreamBooth training

The reasons for why some famous characters do not look realistic with current models without extra training are unknown and that may change over time.

It also allowed the creation of videos with public characters:

These abilities have scratched the sensitivities of some religious people and worries of political elites regarding democratized political art.

And it also enables democratized artistic depictions of historic public characters which can e.g. be used for humorous images, higher-resolution portraits, innovative/creative combinations, or realistic AI art for historical scenes:

After a prior deletion, ~first digital image showing anachronism in art
Napoleon
Rendlesham Forest UFO incident

It can also be used to create art depicting people not commonly featured in high-quality art such as specific scientists which are usually not the subject of art and fiction with an exception of e.g. the movies 'The Theory Of Everything' and 'Oppenheimer':

Mendeleev
Aristotle

Historical scenes edit

AI art can be used to create realistic-looking scenes that depict the past either how we it may have looked like to the best of our knowledge or how stories depict it. The latter may also include images for imaginary stories of the past, illustrating how imaginaries of past people may have looked like in more visual ways.

Whether or not there are still some minor glitches may not matter very much when you're interested in visualizing for example how ordinary daily life experienced by average people may have looked like in high resolution or when creating the first image of some historical events that are in the public domain rather than locked away.

Using tools like DreamBoth one can train AIs on a set of images based on a historical figure. Below are some examples which may deviate somewhat from how Ferdinand II of Naples (Ferrandino d'Aragona) looked like at an older age according to the artistic drawing that is the first image here and the second image that was drawn a whole hundred years after he died:

AI art generators usually make people look better so as you can see it may often deviate from existing images of a character. However, if you provide more training hidata or the AI generator is well trained on it (which is sometimes the case for some currently famous people), then the characters may look more realistic.

Instead of making the file focus on the character it would be better to focus on the historical event or the historical scene. For example, the image could portray how a village in the Middle Ages may have realistically looked like at high resolution.

It can also be used to create high-resolution realistic images of historical figures in realistic or unrealistic settings.

As just explained, AI generators still have problems with generating faces and other issues. Please keep that in mind since correcting that can require significant skills and may limit the usefulness or realism of the images.

Images can also focus on historical events entirely without any kind of historic character, realistic or not, in the foreground.

Educational games edit

AI art can be used to generate the images for board games, for example for the cards. These can be educational games or otherwise useful. Note that in such cases you should only generate the image, not full cards because the text for example will be gibberish.

How educational children's books could be styled

Objects and topics for which no free media is available edit

For example it can show how pulp science fiction comics looked like or how what a science fiction subgenre is about or what the styles and themes of it are. It can illustrate how a certain style or object looks like and other things but it requires a disclaimer that the image is AI-generated. One way this can be useful is showing people which media is currently missing but would be useful in terms of the concept.

Illustrating contents of books edit

Illustrating the world of the book 'The Windup Girl'
Children's book "13th prophecy"
Ubik
Ubik (similar to some covers and depicting a main subject of the book with the text gibberish fixed)

For the last image, text was removed with a text removal tool as listed above and then added via GIMP.

Styles merged and adopted edit

Art meant to depict a civilization adopting an old art style in a modern scifi context
Beksinski style adopted and merged with other styles to a) illustrate contents of a book and b) depict a surreal broken-reality-type Philip K Dickean dreamworld as a subgenre of surreal art

Illustrating technologies, ideas and concepts edit

Especially useful if no other or only low-quality images are available for the concept

First illustration of artifacts / colonies on comets in the context of technosignatures
Illustration of a technological innovation (could also be used to illustrate prototypes / mock-ups)
Illustration of a mythological being adopted to modern scifi as done earlier in some studies, AI golem waiting for tasks
Illustration of 'science fantasy' and arcology, among my first HD illustrations of the genre
Nearly first illustration of Solarpunk and sustainable city design illustration
High-resolution illustration of contemporary post-apocalyptic art
Concept of robotic aliens
Illustration of the cyberpunk genre, a street scene without e.g. neon lights which are present in most if not all comparable free media depicting the genre
Concept of a collapsed civilization on another planet (the being is either a human visitor, could be removed, or a convergent evolution bipedal alien)
Green city urban planning & solarpunk illustration
First online library / shadow library artistic representation
Cooking robot, and slightly humorous, thereby possibly best image for Cooking robots
Same but less realistic and more gritty; used in many WP articles
First illustration of an embodied or metaphorical AI using a computer
Illustration of embodied or metaphorical AI being airgapped and engaging with humanity & human science+philosophy
First commons illustration of vampire Dracula after an earlier image by the same user was deleted
Almost first commons illustration of the post-apocalyptic art scifi genre
(same)

Entertainment and creative thinking edit

Creative children's games and sketches edit

Pro imagination creative AI art kids game

Children could make drawings, then use these drawings as image input for img2img generation, describing what the image is intended to show. The child's description is then used for the prompt that is added to the sketch input. This may enable children to build up their creativity and imaginative skills.

There could be an app for that where voice input is possible or adults could help kids where the kids first make the sketch and the adult takes a photo and asks what it's supposed to show so that the AI art generates images which the child can refine and use as inspiration for further images, for modifications to the image and feedback and so on. It reduces the level of cognitive and technical minimum skills required for artistic engagement enabling novel ways of imaginative play, especially for children.

As part of games edit

Beyond more accessible card art design and similar applications, the AI art generation itself could be part of games. These are simply entertaining and could also raise skills of AI art generation.

Sketch Wars

Multiple (e.g. two) players take turns at generating an AI art image by altering the prompt or writing a new prompt. The starting player draws a scene, a being, or something similar. The second tries to generate an image where what is depicted is turned on its head or is changed in another specified way such as being destroyed or successfully defeated. One can either take turns and the first image where the specified intention was achieved wins that round or the second player could have multiple tries with the best outcome being a successful image at first try. This works best when the prompt is only changed and not completely replaced so that the object is similar, one may also specify the seed to remain the same.

Concept Guessing

Similar to the word guessing game Taboo, people must create an image that enables others to quickly correctly guess the concept they are trying to depict. Multiple specified terms can't be used in the prompt. Only e.g. three tries are allowed for the image and the concepts aren't as simple as "tree" but relatively difficult to visualize.

Known problems and current state of avoiding them edit

There are many ways known problems could get identified and fixed or mitigated. These include:

Updates to the AI txt2txt image generator software
Models specifically tuned for specific purposes, especially Stable Diffusion ones since that software is open source; see Citivai Models
Manual improvements during prompting or via inpainting, img2img, and image editor software

Often models make basic conceptual misunderstandings such as intermingling fungi with nuclear mushroom clouds when prompting 'nuclear mushroom'
Putting the prompted things into a picture frame and/or interchanging contents of an image
Creating the same person multiple times
Screwed-up hands, duplicate limbs, and unrealistic device screens

Whether or not and which of such problems will persist is unknown and has not yet been thoroughly investigated. At one point it may be possible to for example use Wikidata items instead of words. For example there is work on user-provided concepts (like an object or a style) learned from few images so that these concepts (e.g. objects or styles) can be incorporated via the newly associated word/s.