AI Art Generation Handbook/Limitations of AI Art Generation
Currently, AI art generation has some known limitations, and these still apply to the latest SDXL 1.0.
My criterion for a limitation: the AI fails to generate the subject correctly at least 75% of the time (3 out of 4 images).
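Assuming the criterion means that at least 3 of every 4 generated images fail, it can be written as a small check. This is only a sketch: the `is_limitation` function and the True/False judgement per image are illustrative assumptions, not a tool used in this handbook.

```python
def is_limitation(results: list[bool], threshold: float = 0.75) -> bool:
    """Return True if the failure rate over the sampled images is at least `threshold`.

    `results` holds one boolean per generated image: True if the image is correct.
    The 0.75 default is an assumed reading of "3 out of 4 images" failing.
    """
    if not results:
        raise ValueError("need at least one generated image")
    failures = sum(1 for ok in results if not ok)
    return failures / len(results) >= threshold


print(is_limitation([False, False, False, True]))  # 3 of 4 wrong -> True
print(is_limitation([True, True, False, False]))   # 2 of 4 wrong -> False
```

Judging whether an image is "correct" is still a manual step; the function only encodes the counting rule.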
No | Limitation | Description |
---|---|---|
1 | Human Anatomy | Human anatomy will always be a subject of ridicule in AI art generation. In the example image: (i) the woman has three hands; (ii) the woman has two navels (belly buttons); (iii) the woman's right hand, which is touching the rock, has six fingers; (iv) the heel of the woman's right leg looks deformed. |
2 | Text Spelling | The rendered text is incorrect: it does not form any English words and looks like gibberish, at least to native English speakers. |
3 | Relative Positioning | The example picture was prompted as "yellow sphere on left, purple pyramid on right", but the relative positioning is completely wrong, with the pyramid on the left and the sphere on the right. |
4 | Object Counting | Possibly because the training captions rarely specify how many objects an image contains, AI art generators often struggle to generate the correct number of objects. |
5 | Some of the Design Patterns | Stable Diffusion may not have sufficient data or metadata to learn certain types of clothing design patterns. Other known offenders: (a) Herringbone (b) Houndstooth (c) Ogee (d) Paisley. See more here: AI Art Generation Handbook/VACT/Fabric Patterns |
6 | Culture Lost in Translation | Many intangible cultural heritage items were (presumably) overlooked during Stable Diffusion training, as it relied heavily on CLIP for automatic tagging, which is biased towards Western sub-cultures and ignores many non-Western ones. For example, it does not recognize (i) the badlah costume from the North Africa region or (ii) the kebaya costume from South East Asia. The example picture should show a lady wearing a badlah (a dance costume from North Africa), but a Lolita-style dress is generated instead. |
7 | Unable to Generate Many Mythological Creatures | Stable Diffusion is unable to generate many mythological creatures, such as (i) the Cyclops (at times, it generates the type of cyclops shown in the example image instead). Surprisingly, a few mythological creatures, such as the Minotaur, seem to be mostly fixed in SDXL. |
8 | Potential Tool for Propaganda | A bad actor may misuse AI art generation to create propaganda images for their own benefit. For example, the example images were generated by Bing Image Creator (BIC) around September 2023, before the "Great Filter Purge", when Bing Image Creator could generate images for prompts such as "Two ISIS terrorists are planting down ISIS flag in deserts of Afghanistan" without being blocked. |
Steps to overcome
From what I understand, each AI art generation system is trained on its own dataset.
For example, OpenAI's DALL-E is trained using Image-GPT, while Stable Diffusion is trained on LAION-5B, which is built from Common Crawl data. SDXL is believed to be trained on LAION-Aesthetics.
As the saying goes, "Garbage in, garbage out": more images are generally better, but many of the limitations stem from training images that suffer from the following issues:
(i) Low-resolution pictures (smaller than 512×512 px) or out-of-focus pictures (where the blur is not intentional for aesthetic purposes)
(ii) Wrong or misleading captions for the images
(iii) Incomplete captioning of the images
(iv) An image database skewed towards Western content
To overcome many of these limitations, more curation (which is expensive) is needed to bring the images at least up to the standard of OpenAI's DALL-E (at least its 2022 versions).
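A minimal sketch of the kind of automatic pre-filter that curation could start with, covering issues (i) and (iii) above. Everything here is hypothetical: the `ImageRecord` fields, the precomputed `sharpness` score (from whatever blur detector is used) and all thresholds are assumptions for illustration, not the pipeline of any real dataset.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical metadata for one training image."""
    url: str
    width: int
    height: int
    caption: str
    sharpness: float  # hypothetical focus score from some blur detector; higher = sharper


MIN_SIDE = 512          # drop images smaller than 512x512 px
MIN_SHARPNESS = 100.0   # hypothetical threshold; depends entirely on the blur metric used
MIN_CAPTION_WORDS = 3   # crude proxy for "incomplete" captions


def keep(record: ImageRecord) -> bool:
    """Return True if the record passes the basic quality filters."""
    if record.width < MIN_SIDE or record.height < MIN_SIDE:
        return False
    if record.sharpness < MIN_SHARPNESS:
        return False
    if len(record.caption.split()) < MIN_CAPTION_WORDS:
        return False
    return True


records = [
    ImageRecord("a.jpg", 1024, 1024, "a red fox sitting in snow", 250.0),
    ImageRecord("b.jpg", 256, 256, "a red fox sitting in snow", 250.0),   # too small
    ImageRecord("c.jpg", 768, 512, "fox", 250.0),                         # caption too short
    ImageRecord("d.jpg", 800, 800, "a blurry photo of a street", 20.0),   # out of focus
]

curated = [r.url for r in records if keep(r)]
print(curated)  # ['a.jpg']
```

Filters like these are cheap but blunt: a sharpness threshold would also drop intentional aesthetic blur, and a word count cannot detect a caption that is wrong or misleading (issue (ii)) or correct the Western skew (issue (iv)). Those still need image-text similarity scoring or human review, which is where the expense of curation comes from.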