AI Art Generation Handbook/Prompting in Stable Diffusion Style/GUI Interface
Assuming you successfully installed Stable Diffusion after following the instructions from here, you should see the following screen:
Tabs | Functions / Descriptions |
---|---|
txt2img | This tab is where the text you type in (known as the prompt from here onward) is magically turned into images that more or less fit the description (the results will usually look somewhat different from what you imagined) |
img2img | This tab is where a simple sketch, usually accompanied by a prompt, guides how the generated image will look (a precursor to ControlNet) |
Extras | This tab is where images can be enlarged (upscaled) |
PNG Info | This tab is where you can recover the generation settings (prompt, seed, CFG Scale, etc.) from an image's metadata. Note 1: If the image comes from Reddit/Facebook, the metadata is usually stripped clean and no usable info can be retrieved. Note 2: This only works for images generated by Automatic1111 or SD.Next. See the sketch after this table |
Checkpoint Merger | If you want to merge multiple models without doing any model training, you can use this tab |
Train | This tab is where embeddings are trained using TI (Textual Inversion) methods |
Settings | This tab is where all of the Stable Diffusion settings are located |
Extensions | This tab is where extensions are managed. See here for more details |
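As an aside, the sketch below is a minimal Python illustration (not part of the WebUI itself; the file name is hypothetical and the Pillow library is assumed to be installed) of roughly how the PNG Info tab recovers those settings: Automatic1111 writes them into a "parameters" text chunk inside the generated PNG file.

```python
# Minimal sketch: read the generation settings that Automatic1111 embeds
# in a PNG's "parameters" text chunk. Requires Pillow (pip install pillow).
from PIL import Image

img = Image.open("00001-1234567890.png")   # hypothetical filename of a generated image
parameters = img.info.get("parameters")    # None if the metadata was stripped (e.g. re-uploaded images)

if parameters:
    print(parameters)   # prompt, negative prompt, seed, CFG scale, sampler, etc.
else:
    print("No generation metadata found in this image.")
```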
To start generating AI art in Stable Diffusion, just type anything (yes, anything) you have in mind into the first text field.
Just remember that there is a maximum limit of 75 tokens per prompt in Stable Diffusion.
So, you may wonder: what is a token?
A token is a sequence of characters that represents a single unit of meaning in a text. It is a fundamental concept in NLP, as most NLP models operate on a token level, meaning they process text one token at a time.
To understand tokens, let's consider the following sentence: "The quick brown fox jumped over the lazy dog." In this sentence, each word is a separate token. Each token has its own meaning, and together they convey the meaning of the entire sentence.
In the context of AI language models, tokens are often created by a process called tokenization, which involves breaking down a text into individual tokens or words. This process can involve removing punctuation, lowercasing the text, and dealing with other special cases, such as contractions.
Once a text has been tokenized, the tokens can be further processed and analyzed by an AI language model.
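To make this concrete, here is a rough Python sketch of counting the tokens in a prompt against the 75-token limit. Stable Diffusion 1.x uses the CLIP tokenizer; the tokenizer name ("openai/clip-vit-large-patch14") and the use of the Hugging Face transformers library are assumptions made for illustration.

```python
# Rough sketch: tokenize a prompt and count its tokens against the 75-token limit.
# Assumes the Hugging Face "transformers" library is installed; the tokenizer
# below is the one commonly used with the Stable Diffusion 1.x text encoder.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "The quick brown fox jumped over the lazy dog."
ids = tokenizer(prompt)["input_ids"]           # includes start/end marker tokens
tokens = tokenizer.convert_ids_to_tokens(ids)

print(tokens)                                   # the individual tokens (words or sub-words)
print(len(ids) - 2, "tokens used out of the 75-token limit")
```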
Face Restoration
For image generation of human faces, it is highly recommended to use CodeFormer (instead of GFPGAN).
Template
Here are some of my sample templates for generating the following.
Change the name of parameters enclosed in < > brackets.
Target | Sample prompt | Negative prompt |
---|---|---|
Plausible realistic human face | A realistic photograph of <ethnicity> wearing <type of wardrobe>, <describe activity> in <describe place> | Cartoon, anime, drawing, sketches, CGI |
Product photoshoot as seen on e-commerce websites | High quality professional studio product photoshoot of <product>, ((white background isolated)), (isometric view) | cluttered, off centered, cropped, collage, montage, grid, series, human |
Mermaid with underwater effect shots | realistic photo of beautiful <ethnicity> mermaid tail, partial underwater shot, lower body in water, lower half of frame underwater, upper half of frame sky, blue sky | nude, leg, upper legs, lower legs, split tails, conjoined tails |
Images set in outer space | cinematic, dark lighting, high resolution, sharp focus | |
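Since the templates above are just strings with placeholders, they can also be filled in programmatically. The sketch below is my own illustration (the placeholder values are made up) using the first template.

```python
# Minimal illustration (not a WebUI feature): filling in the <placeholders> of
# the "plausible realistic human face" template with example values.
face_template = (
    "A realistic photograph of {ethnicity} wearing {wardrobe}, "
    "{activity} in {place}"
)
negative_prompt = "Cartoon, anime, drawing, sketches, CGI"

prompt = face_template.format(
    ethnicity="Japanese",          # example value for <ethnicity>
    wardrobe="a business suit",    # example value for <type of wardrobe>
    activity="drinking coffee",    # example value for <describe activity>
    place="a Tokyo cafe",          # example value for <describe place>
)

print("Prompt:", prompt)
print("Negative prompt:", negative_prompt)
```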