Stable Diffusion WebUI

How open source Stable Diffusion WebUI displaces all paid API alternatives.

DRL Team
AI R&D Center
30 Jul 2023
10 min read
Stable Diffusion WebUI

GitHub user AUTOMATIC1111 has created a web interface using the Gradio library for Stable Diffusion, fostering an efficient local testing and validation environment. This tool, compatible with local machines and Google Colab, ensures a smooth installation process, accelerating project initiation.

This web interface has many customizable options that can be tailored to fit unique project requirements and integrate with diverse models, extensions, or scripts. It extends default capabilities by offering features like fine-tuning models, merging checkpoints, and more. This versatility and adaptability enhance the tool's utility, making it a valuable asset for developers engaged in diverse project scenarios.

Core modules and functionality

  • Checkpoints refer to pre-trained Stable Diffusion model weights required for generating images of a general type or specific to a certain genre. Models can be conveniently located on CivitAI, which can be downloaded and utilized in the web user interface. Everyone can create their dataset and train a new checkpoint with their unique style.
  • VAE (variational autoencoder) is a component of the neural network model responsible for encoding and decoding images to and from a smaller latent space. By using a VAE, you can enhance the overall quality of the generated image and reduce the likelihood of unwanted artifacts. The web interface allows for flexibility, enabling you to choose any VAE of your liking instead of being limited to the default option.
  • LoRA models are small, Stable Diffusion models that apply tiny changes to standard checkpoint models. They are usually 10 to 100 times smaller than checkpoint models. That makes them very attractive to people having an extensive collection of models. You can utilize any LoRA model in conjunction with any pre-installed checkpoint.
  • Extensions tab provides an interface to configure and manage extensions for the Stable Diffusion WebUI, which can be sourced from a URL. These extensions provide additional features, enhancing your user experience. Look at the examples in the extensions section for an idea of what's possible.
  • Scripts serve as supplementary algorithms that operate within the Stable Diffusion WebUI. They augment the image generation functionality. Once installed, these custom scripts can be accessed through the dropdown menu at the lower left of the txt2img and img2img tabs.
  • Checkpoint Merging facilitates the creation of multiple mergers using distinct models to perfect your AI images. You can blend up to three models, including models you've trained yourself. After you've combined your chosen checkpoints, the final merger will be created in the checkpoint directory.
  • Training tab enables you to construct your unique Stable Diffusion checkpoint derived from your dataset. This feature lets you preprocess your images, train an embedding or Hypernetwork, and adjust hyperparameters according to your requirements, creating a unique model.
  • Extras tab provides a convenient way to enhance the resolution of a single image or a batch of images. This can be done using one of the pre-installed neural network upscalers, or by installing a different one that best suits your specific needs.
  • PNG info tab allows for the efficient retrieval of generation parameters. Useful even for non-generated images, it swiftly sends the image and its dimensions to a specified page.

Checkpoints

Numerous checkpoints are available for generating various objects, people, and styles. These checkpoints can be found on various platforms, particularly huggingface.co and civitai.com.

Below, we would like to present the checkpoints that our team found appealing:

DreamShaper model is tailored to generate portrait illustrations that balance photorealistic and computer graphics aesthetics. Merging with fantasy models creates an alluring fantasy illustration style.

(🤓) ↦ “(masterpiece), (extremely intricate), fantasy, (((photorealistic photo of an evil hermit, male, villain, anti hero, evil face, masculine face, medium hair, Maroon hair, wicked, cruel, sinister, malicious, ruthless, masculine, athletic))), (((dark bloody clothing, intricate details on clothing))), (perfect composition:1.4), aspect ratio 1:1, beach, deviantart hd, artstation hd, concept art, detailed face and body, award-winning photography, margins, detailed face, professional oil painting by Ed Blinkey, Atey Ghailan, Jeremy Mann, Greg Manchess, Alex Gray, trending on ArtStation, trending on CGSociety, intricate, high detail, sharp focus, dramatic, award winning matte drawing cinematic lighting octane render unreal engine volumetrics dtx”; Negative Prompt: “(worst quality, low quality:1.4), canvas frame, cartoon, 3d, ((disfigured)), ((bad art)), ((deformed)),((extra limbs)),((close up)),((b&w)), weird colors, blurry, (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), video game, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, 3d render”

(🤖) ↦

Fig. 1: DreamShaper model result.

Fig. 1: DreamShaper model result.

Deliberate v2 model empowers you to craft whatever you desire, specializing in rendering lifelike illustrations, mainly photographs. To obtain better results, it is important to possess extensive knowledge across various subjects. In other words, achieving perfection necessitates offering a meticulously detailed prompt rather than relying solely on a brief statement.

(🤓) ↦ "a young woman, street, laughing, ponytails, (hdr:1.3), (muted colors:1.2), dramatic, complex background, cinematic, filmic, (rutkowski, artstation:0.8), soaking wet”; Negative Prompt: “(deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, (mutated hands and fingers:1.4), disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation"

(🤖) ↦

Fig. 2: Deliberate v2 model result.

Fig. 2: Deliberate v2 model result.

Realistic Vision v2 specializes in generating anything with a natural appearance. This model is a merge of various models focusing on photorealism, so with it, you can create beautiful images of nature, people, cars, houses, etc. For the best result, you should add additional tags to the prompt, for example, “8k uhd, dslr, high quality, Fujifilm XT3, full focus background”.

(🤓) ↦ "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot”; Negative Prompt: “(semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.4), text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck”

(🤖) ↦

Fig. 3: Realistic Vision v2 model result.

Fig. 3: Realistic Vision v2 model result.

Protogen v2.2 Anime is adept at producing illustration and anime-style visuals. It's a blend of illustration, fantasy, and anime models. With camera prompt commands, your results can be more customizable for different types of camera shots and types of angles, as examples: (from_above:1.3), (from_below:1.3),(from_side:1.3),(from_behind:1.3) (learn more by the guide).

(🤓) ↦ "chalk and vermillion canvas oil painting of an old west prospector (sitting:1.2) next to a dying campfire late at night, midnight:1.5, (Johnson Moonlight Technique), clear skies, American Old West, by (Frank Tenney Johnson:1.6), by Frederic Remington, by HW Hansen, by Charles Marion Russell, by William Herbert Dunton, dramatic, beautiful and diffuse soft moonlit lighting, shadows, immersive, cool color palette”

(🤖) ↦

Fig. 4: Protogen v2.2 Anime model result.

Fig. 4: Protogen v2.2 Anime model result.

GhostMix has been trained in the style of Ghost in the Shell, a seminal 90s anime. It's particularly adept at generating cyborg and robot images. We recommend using booru tags for better interaction.

(🤓) ↦ "1mechanical girl,((ultra realistic details)), portrait, global illumination, shadows, octane render, 8k, ultra sharp,metal,intricate, ornaments detailed, cold colors, egypician detail, highly intricate details, realistic light, trending on cgsociety, glowing eyes, facing camera, neon details, machanical limbs,blood vessels connected to tubes,mechanical vertebra attaching to back,mechanical cervial attaching to neck,sitting,wires and cables connecting to head”; Negative Prompt: “NSFW,3d, cartoon, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, young, loli, elf, 3d, illustration”

(🤖) ↦

Fig. 5: GhostMix model result.

Fig. 5: GhostMix model result.

Inkpunk Diffusion is a model trained by Dreambooth with a unique illustration style. Vaguely inspired by Gorillaz, FLCL, and Yoji Shinkawa. Use nvinkpunk in your prompts.

(🤓) ↦ "nvinkpunk, masterpiece, best quality, ((sks woman)), ((detailed face)), ((award winning)), (High Detail), Sharp, 8k, trending on artstation, intricate, ((big anime eyes))”; Negative Prompt: “(((blurry))), (((artifacts))), (((duplicate))), ((mole)), ((blemish)), ((morbid)), ((wrinkles)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg ,signature, watermark, username, blurry, artist name”

(🤖) ↦

Fig. 6: Inkpunk Diffusion model result.

Fig. 6: Inkpunk Diffusion model result.

Anything v3 (v5) is one of the most popular, specifically designed models for otaku. This model is intended to produce high-quality, highly detailed anime-style with just a few prompts. It also supports booru tags to generate great images.

(🤓) ↦ "1boy, white tshirt, brown short, sitting in a deckchair under an umbrella on the beach, full body shot, looking at viewer, see in background, coconut tree on the beach, beautiful face, highly detailled eyes, masterpiece, absurdres”; Negative Prompt: “easynegative, sketch by bad-artist, worst quality, bad quality, blurry, bad anatomy”

(🤖) ↦

Fig. 7: Anything model result.

Fig. 7: Anything model result.

Waifu Diffusion is a Japanese anime-style model finetuned on 680k text-image booru samples. To generate an image, you would prompt it out with booru tags. A full list can be found here.

(🤓) ↦ "1girl, black eyes, black hair, black sweater, blue background, bob cut, closed mouth, glasses, medium hair, red-framed eyewear, simple background, solo, sweater, upper body, wide-eyed” #style

(🤖) ↦

Fig. 8: Waifu Diffusion model result.

Fig. 8: Waifu Diffusion model result.

dvArch are the multi-prompt architecture-tuned models. A great solution for visualizing interiors and exteriors. Highly recommended for use with keywords such as: dvArchModern, dvArchGothic, dvArchVictorian.

(🤓) ↦ “dvArchModern, 85mm, f1.8, portrait, photo realistic, hyperrealistic, orante, super detailed, intricate, dramatic, sunset lighting, shadows, high dynamic range”; Negative Prompt: “signature, soft, blurry, drawing, sketch, poor quality, ugly, text, type, word, logo, pixelated, low resolution, saturated, high contrast, oversharpened” #buildings

(🤖) ↦

Fig. 9: dvArch model result.

Fig. 9: dvArch model result.

OpenJourney v4 is a model for transforming basic prompts into impressive digital art. Trained on 124k+ Midjourney v4 images and return very close Midjourney AI results. Good option if you want to try the open-source equivalent of Midjourney.

(🤓) ↦ “pixar portrait 8 k photo, beautiful shiny white rich galactic prima ballerina clowncore russian cyborg college girl, golden ratio details, sci-fi, fantasy, cyberpunk, intricate, decadent, highly detailed, digital painting, ever after high, octane render, artstation, concept art, smooth, sharp focus, illustration, art by artgerm, loish, wlop”; Negative Prompt: “easynegative” #style

(🤖) ↦

Fig. 10: OpenJourney v4 model result.

Fig. 10: OpenJourney v4 model result.

Modelshoot is a Dreambooth model focusing on full to medium-body shots, emphasizing a fashion-shoot aesthetic. Use modelshoot style in your prompt to incite the effect. It can be used to create new, unusual, and interesting images of celebrities.

(🤓) ↦ “modelshoot style Julia Stiles in 10 things i hate about you” #style

(🤖) ↦

Fig. 11: Modelshoot model result.

Fig. 11: Modelshoot model result.

VinteProtogenMix model produces vivid, colorful images. It renders sharply defined colors and creates visually appealing characters. This is a mix of some powerful models like Protogen v2.2 and vintedois-diffusion.

(🤓) ↦ “colorpop style, a photographic mugshot portrait of the sun surrounded by orbiting planets moons, golden spiky hair, natalie shau, mark ryden, alberto seveso, brooke shaden, anna dittmann, flora borsi, 8k resolution, perfect composition, milky way”; Negative Prompt: “canvas frame, cartoon, 3d, ((disfigured)), ((bad art)), ((deformed)),((extra limbs)), ((extra barrel)),((close up)),((b&w)), weird colors, blurry, (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), (((tripod))), (((tube))), Photoshop, video game, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, 3d render, (((umbrella)))” #character

(🤖) ↦

Fig. 12: VinteProtogenMix model result.

Fig. 12: VinteProtogenMix model result.

Dreamlike Diffusion 1.0 is a model fine-tuned on high-quality art by Dreamlike.art. This model works better with non-square aspect ratios to create cool portrait and landscape photos. Add dreamlikeart if the art style is too weak.

(🤓) ↦ “dreamlikearta girl in biopunk style, photo realistic, 8k resolution, masterpiece, octane render, unreal 5, 35mm film lense, 50-megapixels, cinematic lighting, midjourney style”; Negative Prompt: “frame, duplicate, watermark, signature, text, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft, weapons, heterochromia, dots” #style

(🤖) ↦

Fig. 13: Dreamlike Diffusion 1.0 model result.

Fig. 13: Dreamlike Diffusion 1.0 model result.

Mo Di Diffusion is a Disney-style model trained on screenshots from a famous animation studio. Using this model, you can reproduce cult characters in the animation style or transfer new ones to it. Use the tokens modern disney style in your prompts for the effect.

(🤓) ↦ “modern disney, baby lion”; Negative Prompt: “person, human” #style

(🤖) ↦

Fig. 14: Mo Di Diffusion model result.

Fig. 14: Mo Di Diffusion model result.

RPG model card focuses on Role Playing Game portrait similar to Baldur's Gate, Dungeons and Dragons, Icewindale, and a more modern style of RPG characters. The model results from various iterations of the merge pack combined with Dreambooth Training.

(🤓) ↦ “beautiful (swedish:1.3), Full body armor made of demon (demon armor:1.3), 1 man, (insanely detailed:1.5), light particle, (mist on the floor:1.3), ((solo)), (highest quality, Alessandro Casagrande, Greg Rutkowski, Sally Mann, concept art, 4k), (colorful), (high sharpness), ((detailed pupils)), red eyes, ((painting:1.1)), (digital painting:1.1), detailed face and eyes,Masterpiece, best quality, highly detailed photo:1, 8k, detailed face,photorealistic, dark and gloomy, fog, thunder background, By jeremy mann, by sandra chevrier, by maciej kuciara,((male demon)), sharp, ((perfect body)), realistic, real shadow, 3d, ((squatting warrior)), ((dark and gloomy castle background:1.3)), thunder sky, thunder, (by Michelangelo), king pose:1,1, side view:1,1”; Negative Prompt: “(bad art, low detail, pencil drawing:1.6), (plain background, grainy, low quality, mutated hands and fingers:1.5), (watermark, thin lines:1.3), (deformed, signature:1.2), (big nipples, blurry, ugly, bad anatomy, extra limbs, undersaturated, low resolution), disfigured, deformations, out of frame, amputee, bad proportions, extra limb, missing limbs, distortion, floating limbs, out of frame, poorly drawn face, poorly drawn hands, text, malformed, error, missing fingers, cropped, jpeg artifacts, teeth, unsharp” #style

(🤖) ↦

Fig. 15: RPG model result.

Fig. 15: RPG model result.

Fantassified Icons generates icons inspired by fantasy games with mostly plain backgrounds, useful for creating assets. Not about people, for cool items and nice icons.

(🤓) ↦ “a potion containing void magic, containing cosmic nebulae, high quality, 8k” #objects

(🤖) ↦

Fig. 16: Fantassified Icons model result.

Fig. 16: Fantassified Icons model result.

Product Design is trained images with a “minimalistic” style. 75% were products, and 22% were transportation, most of these are speakers, smart robots, water bottles, and others. This tool is very useful for the design process (shapes-ideas generation).

(🤓) ↦ “3D product render, futuristic kettle, finely detailed, purism, ue 5, a computer rendering, minimalism, octane render, 4k”; Negative Prompt: “EasyNegative, (worst quality:2), (low quality:2), (normal quality:2), lowres, ((monochrome)), ((grayscale)), cropped, text, jpeg artifacts, signature, watermark, username, sketch, cartoon, drawing, anime, duplicate, blurry, semi-realistic, out of frame, ugly, deformed, ((multiple objects))” #objects

(🤖) ↦

Fig. 17: Product Design model result.

Fig. 17: Product Design model result.

fking_scifi is trained on multiple sci-fi/fantasy concepts for a general but limited sci-fi model. It has been trained on armor concepts, futuristic backgrounds, androids, robots, and some fantasy stuff, like werecreatures.

(🤓) ↦ “fking_scifi, award-winning photo portrait of a werecreature wereturtle turtle, wearing a black and silver armor, forest swamp mangrove trees background, large head, intricate details”; Negative Prompt: “drawing, painting, ugly, deformed award-winning photo portrait of a werecreature wereturtle turtle, wearing a black and silver armor, forest swamp mangrove trees background, large head, intricate details” #concept

(🤖) ↦

Fig. 18: fking_scifi model result.

Fig. 18: fking_scifi model result.

Pulp is a high-quality model trained in Public Domain, CC0 licensed images from pulp magazine covers of the 30s and 40s with titles like Startling Mystery, Dime Detective, Horror Stories, and Uncanny Tales. It excels at producing images with multiple characters, shocked expressions, and action but can be coaxed into applying its visual styles to various subjects.

You can swap or combine various magazine titles in your prompts to adjust color palettes and tones.

(🤓) ↦ “sd-pulp a giant rock monster attacking a city, throwing rocks, startling mystery, cover art, illustration”; Negative Prompt: “logo, wordmark, signature, title, text, words, magazine cover, cartoon, bad eyes” #action

(🤖) ↦

Fig. 19: Pulp model result.

Fig. 19: Pulp model result.

Conclusions

The Stable Diffusion Web UI is a powerful platform that provides an easy-to-use interface for model training, inference, and integration of various extensions or scripts. In addition to generating images, it can also create videos. The tools we discussed earlier are just a small part of what this platform can do, highlighting its vast potential. With its extensive range of features and flexibility, it offers users limitless opportunities to explore diffusion models.

Have an idea? Let's discuss!

Book a meeting
Yuliya Sychikova
Yuliya Sychikova
COO @ DataRoot Labs
Do you have questions related to your AI-Powered project?

Talk to Yuliya. She will make sure that all is covered. Don't waste time on googling - get all answers from relevant expert in under one hour.
OR
Send us a note
Optional
File requirements pdf, docx, pptx
dataroot labs logo
Copyright © 2016-2024 DataRoot Labs, Inc.