AI Art Generation Handbook/Limitations of AI Art Generation

Currently, AI art generation models still have limitations, and this includes the latest FLUX 1.0-DEV.

My criterion for a limitation is that the AI art model is unable to generate the requested result at least 75% of the time (3 out of 4 images).
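The criterion above can be written down as a small check. This is only a sketch of one reading of the rule: the function name `is_limitation` and the idea of counting "successful" images by hand are assumptions made for illustration, not part of any tool.

```python
def is_limitation(successes: int, attempts: int, threshold: float = 0.75) -> bool:
    """Return True when the success rate falls below the threshold.

    `successes` is the number of generated images judged correct by a human,
    `attempts` is the total number of images generated for the same prompt.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts < threshold


# 2 correct images out of 4 is below the 3-out-of-4 bar
print(is_limitation(2, 4))  # True
# 3 correct images out of 4 meets the bar, so it is not counted as a limitation
print(is_limitation(3, 4))  # False
```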

1 Human Anatomy

Human anatomy has long been a subject of ridicule in AI art generation, most often the hands and fingers.


As shown below, this AI-generated woman has several flaws:

(i) The woman has 3 hands

(ii) The woman has 2 navels (belly buttons)

(iii) The woman's right hand, which is touching the rock, has 6 fingers

(iv) The heel of the woman's right leg looks deformed


Note: This can potentially be solved by using ControlNet and the latest AI models (FLUX 1.0)

2 Text Rendition Spelling

DALL·E2 - Javan rhinoceros wearing a business suit and safety hard hats , holding a Under Construction signboard with background of construction area

The text part of the prompt for these images is actually "UNDER CONSTRUCTIONS" in DALL-E2 (prompted in Sept 2023), but the rendered text is gibberish most of the time (it does not match any known English words), at least to native English speakers. However, text rendition has slowly improved with models such as IF-Deepfloyd, DALL-E3 (as of March 2024) and FLUX 1.0 (Sept 2024).

3 Relative Positioning

The original prompt for the picture is "yellow sphere on left, purple pyramid on right", but as seen, the relative positioning is completely wrong, with the pyramid on the left and the sphere on the right.


Relative positioning is slowly improving with the release of newer AI models such as FLUX 1.0, which can mostly generate images with the correct relative positioning.

4 Object Counting

The original prompt for this SDXL image is "three rabbits" (Dec 2023). However, possibly because the training dataset did not specify the number of objects appearing in each picture, AI art generation often has trouble producing the correct number of objects.

5 Some of the Design Patterns

AI Models may or may not have sufficient data / meta-data to train on certain types of clothing design patterns.


For example, the prompt asks for a zigzag design on the sports bra, but unfortunately the AI models are unable to generate it in most of the randomly generated pictures.

Other known offenders:

(a) Herringbone

(b) Houndstooth

(c) Ogee

(d) Paisley

See more here: AI Art Generation Handbook/VACT/Fabric Patterns

6 Subject's Interaction with Other Subjects / Objects

AI models are not able to generate many everyday actions such as "aiming with crossbow", "measuring waist sizes" or "cutting fabrics with scissors" (Sept 2024). Currently, the results are still far from perfect.

7 Cultural Lost in Translation

During AI model training, many intangible cultural heritages are presumably overlooked, because the training relies heavily on CLIP for automatic tagging. Unfortunately, CLIP is biased towards Western sub-cultures and ignores many sub-cultures outside Western nations. For example, the picture on the right should show a lady wearing a badlah (a dancing costume from North Africa), but the model generates a loli type of dress instead.


For example, it does not recognize:

(i) badlah costume from the North Africa region

(ii) kebaya costume from South East Asia

8 Unable to generate many mythological creatures

Many AI image models are unable to generate many mythological creatures, such as:
(i) Cyclops (At times, it will generate this type of copyrighted cyclops)
(ii) Centaur (Mostly it will generate a man riding a horse in awkward ways)
(iii) Pegasus (It will generate a white horse without wings)
(iv) Medusa (It will generate a middle-aged Caucasian woman wearing a tiara, without the famous snake hair)
(v) Hydra (It will generate the surroundings of the island town that happens to be named Hydra)
(vi) Cerberus (It will generate an image of a German Shepherd with only one head)
(vii) Kraken (It will generate a Cthulhu-like monster)
(viii) Mummy (It will generate a middle-aged Egyptian woman)
(ix) Phoenix (It will generate an area in Phoenix, Arizona)
(x) Sphinx (It just generates the Sphinx monument in Egypt)

Surprisingly, a few mythological creatures seem to be mostly fixed in SDXL, such as:

Minotaur
Frost Giant
Anubis

9 Bleeding Concepts

There are some concepts that are so strong that they "bleed" into other subjects. For example, the intention of the prompt for these images is an anthropomorphic rhinoceros touching up the painting Girl with a Pearl Earring (with the girl in human form). The prompt is:

Anthropomorphic rhinoceros wearing business suit touching up painting Girl with Pearl Earring with brushes

At times, changing the word order may improve the chances of getting images that match your intention : Refer here for more examples

10 Limited Training Data on Under Represented Subjects

In the context of paintings, we may know the more popular paintings such as the Mona Lisa or The Great Wave off Kanagawa, but we may not know the names of paintings such as "Self-Portrait of a Mocker" (apart from the "Classical Art Men Pointing" meme of the late 2000s Internet).

For example, the prompt of this image is "Oil painting of Self-portrait of a Mocker by painter Joseph Ducreux, the painting's subject talking to a smartphone", but the generated images do not look anything like the original painting at all. Hence, the "data curator" may need to include more of the under-represented subjects.

11 Unable to understand negation

Many AI image models up to this point are unable to understand negation (meaning the absence of something). For example, in this image the prompt is

Female superstar model without a moustache

However, the model is unable to understand the negation and still gives a woman with a moustache.

12 Abstract Combinations

In this example, concepts that are rarely seen together in the real world (like a penguin and bamboo) may not be well represented in the training data, so the model might struggle to combine them accurately.


The prompt in this example is :

Tux (Linux Mascot) is made out bamboo

13 Diversity in Image Training Dataset

The prompt is "Stock photo of Asian male with Caucasian female". Although the AI art model is able to generate very realistic-looking people, it is unable to generate people of diverse races (for example, the pictures fail to show a Caucasian-looking female although one is requested in the prompt). This is perhaps because the training dataset lacks such examples or because the text encoder does not yet work well enough.

See this news link for more detailed insight: https://www.theverge.com/2024/4/3/24120029/instagram-meta-ai-sticker-generator-asian-people-racism

14 Semantic Understanding

At times, AI also has difficulty understanding some of the nuances of the English language, i.e. its semantics.

For example, "spring" in this context refers to a "water spring" rather than a "metal spring", although the latter is still a literally correct reading.

15 Potential Tools for Propaganda

A bad actor may misuse AI art generation technology to generate propaganda images for their own benefit. For example, these images were generated by Bing Image Creator (BIC) around September 2023, before the Great Filter Purge happened. Back then, Bing Image Creator was able to generate images for the following prompt without any blocks:

Two ISIS terrorists are planting down ISIS flag in deserts of Afghanistan

Training Image Dataset Issues

According to the white papers, each AI art generation system uses its own dataset for training.

For example, OpenAI's DALL-E is trained using Image-GPT, and Stable Diffusion using Common Crawl and Laion-5B (although it is believed that it is not trained on all of the 5B images). It is believed that SDXL is trained on Laion-Aesthetic. https://github.com/google-research-datasets/conceptual-12m


As the saying goes, "Garbage In, Garbage Out": if the training images (input) are not properly curated, there is a chance that the output images will be gibberish as well. This is a lesser-known issue, but as time goes on, the AI image models themselves are also fine-tuned so that the generated images get better over time. Generally, many of the limitations arise because the training images suffer from the following issues:

(i) Many low-resolution pictures [less than 512*512px, or out of focus (but not for aesthetic purposes)]

(ii) Wrong / misleading captions related to the images

(iii) Incomplete captioning of the images

(iv) The image databases are heavily biased towards Western contexts

(v) Absence of certain images / subjects


To solve many of these limitations, more (but expensive) curation of the input images is needed, at least to the standard of OpenAI's DALL-E (at least the 2022 versions).
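As a rough illustration of what automated curation could check, the sketch below flags samples affected by issues (i) and (iii) from the list above. Everything in it is an assumption made for illustration: the `Sample` record, the `curation_problems` helper and the three-word caption heuristic are hypothetical and do not describe how any real dataset was filtered. Wrong captions, cultural bias and missing subjects (issues ii, iv and v) cannot be detected by simple rules like these and would still need human review.

```python
from dataclasses import dataclass

MIN_SIDE = 512  # minimum width and height in pixels, taken from issue (i)


@dataclass
class Sample:
    """Hypothetical training sample: image size plus its caption."""
    width: int
    height: int
    caption: str


def curation_problems(sample: Sample, min_caption_words: int = 3) -> list[str]:
    """Return a list of detected problems; an empty list means the sample passes."""
    problems = []
    if sample.width < MIN_SIDE or sample.height < MIN_SIDE:
        problems.append("low resolution")
    words = sample.caption.split()
    if not words:
        problems.append("missing caption")
    elif len(words) < min_caption_words:
        # Crude heuristic (an assumption): very short captions are likely incomplete
        problems.append("possibly incomplete caption")
    return problems


samples = [
    Sample(1024, 768, "three rabbits sitting on grass"),
    Sample(256, 256, "rabbit"),
    Sample(640, 640, ""),
]

# Keep only the samples with no detected problems
kept = [s for s in samples if not curation_problems(s)]
print(len(kept))  # 1
```

Checks like these are cheap to run, which is why the expensive part of curation is the manual review that remains afterwards.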