Infopost | 2022.11.19

Stable diffusion elden ring

Last time I got Stable Diffusion's Hello World going (with some video card and Python IDE side quests). This time: prompts and parameters.
Strength

Stable diffusion Valentino Rossi Ducati Laguna Seca

The img2img strength parameter is a value from 0.0 to 1.0 that determines how far the model may depart from the input image: low values stay close to the original, high values give the prompt more room to redraw the scene. Here's the above photo of Vale with a descriptive prompt, run at strength values from 0.1 to 0.9:

Stable diffusion Valentino Rossi Ducati Laguna Seca (nine images, strength 0.1 through 0.9)

As the model is given more freedom to redraw the scene, Vale winds up back on a Yamaha, but on a bike that doesn't look quite right. Of course, this is just one run at each strength; another strength 0.9 run might give you something entirely different (but still derived from the input image and generally following the prompt).
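For a rough sense of what strength does mechanically: in the CompVis reference scripts/img2img.py, the input image is encoded into latent space, noised partway along the diffusion schedule, and then denoised for only a fraction of the sampling steps, computed as t_enc = int(strength * ddim_steps). A minimal sketch of that arithmetic, assuming the script's default of 50 DDIM steps (other front ends may compute this differently):

```python
def denoising_steps(strength: float, ddim_steps: int = 50) -> int:
    """Number of denoising steps img2img runs for a given strength.

    Mirrors the t_enc = int(strength * ddim_steps) computation in the
    CompVis scripts/img2img.py; this is a sketch, not that script.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    return int(strength * ddim_steps)


if __name__ == "__main__":
    # The 0.1-0.9 sweep used for the Vale images above.
    for tenths in range(1, 10):
        strength = tenths / 10
        print(f"strength {strength:.1f}: {denoising_steps(strength)} of 50 steps")
```

At 0.1 only five of fifty steps run, so the output stays close to the photo; at 0.9 nearly the whole schedule runs and the prompt has much more say.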
Prompts

Both txt2img and img2img use text prompts to create or recreate images. Lexica.art has a wealth of examples that provide empirical guidance on prompt writing.

Artist

Stable diffusion reddit meme Will Smith Greg Rutkowski

The Andy's Blog post from last time mentioned that he saw significantly better results when supplying one or more specific artists in the prompt. A large chunk of Lexica prompts include Greg Rutkowski, either because his name yields good results or because it's a meme. I'd never heard of Greg Rutkowski until trying out Stable Diffusion, but sure enough, he's quite prolific on Artstation, which I think was one of Stable Diffusion's sources of tagged imagery.
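Most of the prompts in this post follow a loose pattern: medium, subject, then comma-separated modifiers and optionally an artist credit. A hypothetical helper that assembles prompts in that shape (the function name and structure are my own, not a Stable Diffusion API):

```python
from typing import Sequence


def build_prompt(medium: str, subject: str,
                 modifiers: Sequence[str] = (),
                 artists: Sequence[str] = ()) -> str:
    """Compose '<medium> of <subject>. <modifiers>. art by <artists>.'"""
    parts = [f"{medium} of {subject}."]
    if modifiers:
        parts.append(", ".join(modifiers) + ".")
    if artists:
        # Multiple artists are joined with "and", as in many Lexica prompts.
        parts.append("art by " + " and ".join(artists) + ".")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt("digital painting", "a surfer sitting in the lineup",
                       ["very detailed", "pastel colors"], ["greg rutkowski"]))
```

Keeping the pieces separate makes it easy to swap artists or modifiers while holding the subject fixed.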

Format

Stable diffusion meme reddit prompt

I've read a variety of advice about comma-separated tokens versus natural language, and about ordering terms by importance. Because outputs vary so much from sample to sample (each one starts from its own random noise), it's difficult to draw any firm conclusions on this.
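One way to take some of the randomness out of a format comparison is to hold the seeds fixed and vary only the prompt. The CompVis scripts/txt2img.py accepts --seed (default 42) and --outdir, so a sketch that builds one command per prompt variant and seed might look like this. The helper and the two prompt variants are hypothetical, and matching seeds only make runs comparable under the same settings and hardware; a handful of seeds still isn't proof of a general rule.

```python
import shlex

# Hypothetical prompt variants: the same content in two formats.
VARIANTS = {
    "tokens": "abandoned factory, ink illustration, retrofuturistic, photorealistic",
    "natural": "an ink illustration of an abandoned factory in a retrofuturistic, photorealistic style",
}


def comparison_commands(variants: dict[str, str], seeds: list[int]) -> list[list[str]]:
    """Build one txt2img.py invocation per (seed, variant) pair.

    Every variant runs with the same seeds, so for a given seed the
    starting noise is shared and differences come mostly from the prompt.
    """
    commands = []
    for seed in seeds:
        for name, prompt in variants.items():
            commands.append([
                "python", "scripts/txt2img.py",
                "--prompt", prompt,
                "--seed", str(seed),
                "--outdir", f"outputs/compare/{name}/seed{seed}",
            ])
    return commands


if __name__ == "__main__":
    for argv in comparison_commands(VARIANTS, seeds=[42, 1234, 5678]):
        print(shlex.join(argv))
```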
The joy of AI painting using txt2img

With the basics in mind, I tried a bunch of different prompts, mostly producing 512x512 images that I've downscaled a little for this site. Compared with Style Transfer and Dall-e mini, the takeaway on image dimensions is that the outputs are significantly larger than 256x256 but would still require additional work (upscaling) to reach postcard size.
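The postcard comparison is easy to put numbers on. Assuming a 6x4 inch postcard printed at 300 DPI (a common print target, not something from these runs), a quick sketch of how far square outputs fall short:

```python
def print_size_inches(pixels: int, dpi: int = 300) -> float:
    """Physical length of an edge of `pixels` pixels printed at `dpi`."""
    return pixels / dpi


def upscale_factor(src_px: tuple[int, int], inches: tuple[float, float], dpi: int = 300) -> float:
    """Smallest uniform scale factor so the image covers the print at `dpi`.

    Covering (rather than fitting) means the excess gets cropped.
    """
    needed_w = inches[0] * dpi
    needed_h = inches[1] * dpi
    return max(needed_w / src_px[0], needed_h / src_px[1])


if __name__ == "__main__":
    for side in (256, 512):
        print(f"{side}px prints at {print_size_inches(side):.2f} in at 300 DPI")
    factor = upscale_factor((512, 512), (6, 4))
    print(f"512x512 -> 6x4 in postcard needs {factor:.2f}x upscaling")
```

Under those assumptions a 512x512 image covers only about 1.7 inches, and filling a 6x4 postcard takes roughly 3.5x upscaling plus a crop.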

An abandoned factory

Starting simple...

ink illustration of an abandoned factory.  retrofuturistic, photorealistic.


Stable diffusion abandoned factory Stable diffusion abandoned factory

One looks like a traditional black and white photo, the other like an architectural-ish drawing. Both show signs of abandonment. One looks modern, the other brutalist.

A yellow starfighter

Next I juiced it up a bit, specifying a subject color and a background. I pulled a random artist name from Lexica, plus whatever cgsociety is. And I went rectangular, probably along the wrong axis (starfighters tend to be horizontal, not that it matters in space).

python scripts/txt2img.py \
   --prompt "A highly detailed painting of a futuristic starfighter.  Yellow body, glowing cockpit.  Concept art by ian mcque, cgsociety.  Sci-fi.  Stars and nebulas can be seen in the background." \
   --n_samples 8 \
   --H 640 --W 320

Stable diffusion yellow starfighter specart (five images)

Some of these look like 70s sci-fi novel cover art. Cool.
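About that --H 640 --W 320: those values are in pixel space, but Stable Diffusion v1 actually denoises a smaller latent tensor. The CompVis txt2img.py computes its shape as [C, H // f, W // f], with defaults of C=4 latent channels and a downsampling factor f=8. A small sketch of that arithmetic (the helper name is made up, and rejecting non-multiples of 8 is this sketch's choice, since the script just floors):

```python
def latent_shape(height: int, width: int, channels: int = 4, factor: int = 8) -> tuple[int, int, int]:
    """Latent tensor shape (C, H/f, W/f) for a pixel-space image size.

    Defaults follow the CompVis txt2img.py arguments (--C 4, --f 8).
    """
    if height % factor or width % factor:
        # The script floors with //, which would silently shrink the output.
        raise ValueError(f"height and width should be multiples of {factor}")
    return (channels, height // factor, width // factor)


if __name__ == "__main__":
    print(latent_shape(512, 512))  # (4, 64, 64): default square output
    print(latent_shape(640, 320))  # (4, 80, 40): the starfighter run
```

Each latent position stands in for an 8x8 pixel patch, so the 640x320 run is denoised on an 80x40 grid.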

CGI tools

Some of the Lexica prompts specified a rendering platform (Unreal Engine, Maya, etc.); I think rendering and postprocessing tools are one of the tag categories on Artstation.

octane rendering of a colorful wormhole.  soft lighting, cyberpunk, 
hyper realistic.

Stable diffusion wormhole

It even has a watermark.

Vocabulary

Six months ago the PUBG crew was wowed that Dall-e knew about battle royales. Stable Diffusion does too:

Stable diffusion pubg battlegrounds uaz and dacia (five images)

Skaggy style

Likewise, SD knew about Claptrap and the Borderlands art style. I used another random Artstation creator from Lexica.

unreal engine rendering of claptrap from borderlands in a field.  
grainy, solarpunk, smooth render. in the style of andrei riabovitchev.

Stable Diffusion claptrap borderlands (five images)

Looking at Andrei's work, the Borderlands elements seem to have dominated the style input.

Twist of the wrist

Trying some more subject/scene interplay:

source filmmaker cgi rendering of an aprilia motorcycle with blue accents
in a postapocalyptic waterfall.  cyberpunk, photorealistic.

Stable diffusion motorcycle waterfall cyberpunk

It doesn't look like a Prilly, but SD nailed the postapocalyptic waterfall. Other than some subtle details, the bike looks really good. Another:

cross-processed photograph of a ducati in a town square.  dark, futuristic,
clean composition.

Stable diffusion motorcycle Stable diffusion motorcycle

Go with what it knows

SD (as I understand it) is trained on a boatload of real images with human-applied tags. If you've seen the Silicon Valley episode where Erlich enlists Big Head's college class to tag datasets for him, you know it's unrewarding work. The neat thing about using Artstation (and others?) as a data source is that the imagery there is already tagged and ready for deep learning consumption.

So while Stable Diffusion might have to guess what a banana with a goatee looks like, it can recreate things that look consistent across many examples with considerably more certainty. Likewise, if every instance of something looks very different ("draw me a vehicle"), an SD prompt might yield a mashup of those things. But a more refined command like "draw me a Jaguar XJ220" gives it more specific criteria to generate from.

Further, let's consider a Ferrari. Here's some mild hyperbole about what probably dictates how well a deep learning algorithm can understand one:
So 'car' is bad, 'Bugatti' is good, 'Toyota' is okay. Your mileage may vary.

Monochrome photograph of a Lamborghini parked in the rain.  Contrasty.
Highly detailed but smooth.

Stable diffusion lamborghini rain (four images)
The joy of AI repainting using img2img

Switching gears (lmao) to the other side of Stable Diffusion, prompts are just as important to img2img.
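Compared with txt2img, an img2img call adds the source image and the strength. A sketch of assembling that call for the CompVis scripts/img2img.py, which takes --init-img and --strength (default 0.75) alongside --prompt; the helper, the input path, and the 0.6 strength are placeholders of mine:

```python
import shlex


def img2img_command(prompt: str, init_img: str, strength: float = 0.75,
                    outdir: str = "outputs/img2img-samples") -> list[str]:
    """Build an argv list for scripts/img2img.py (CompVis layout assumed)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    return [
        "python", "scripts/img2img.py",
        "--prompt", prompt,
        "--init-img", init_img,
        "--strength", f"{strength:.2f}",
        "--outdir", outdir,
    ]


if __name__ == "__main__":
    argv = img2img_command(
        "cgi rendering in the gran turismo engine. nissan 350z drifter on a racetrack.",
        init_img="inputs/350z.jpg",  # placeholder path
        strength=0.6,                # placeholder value
    )
    # Print for inspection; subprocess.run(argv, check=True) would execute it.
    print(shlex.join(argv))
```

Building an argv list rather than one shell string sidesteps quoting problems with prompts that contain commas, quotes, or apostrophes.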

Drifter

Stable Diffusion Nissan 350Z drifter

I grabbed a photo from the archive and asked Stable Diffusion to recreate it in something contextually similar: Gran Turismo. Writing a specific prompt (not sure if this is good or bad) was a lot easier since I just had to describe what's in the image.

cgi rendering in the gran turismo engine.  nissan 350z drifter on a
racetrack. tires smoking, front wheels at opposite lock.  stickers 
cover the scuffed bodywork.

Stable Diffusion Nissan 350Z drifter gran turismo style (eight images)

It's definitely Gran Turismo-like. That last car's spoiler seems to be glitching though.

Archangel

Stable Diffusion Garrus Mass Effect

Since Stable Diffusion knows about Borderlands, I thought I'd try a screencap of everyone's favorite Turian police sniper. Just to throw it a curveball, I asked for a plate of nachos.

detailed 35mm photograph of garrus vakarian from mass effect.  standing in
front of colorful artwork, holding a plate of nachos.

Stable Diffusion Garrus Mass Effect (three images, one with a plate of nachos)

Photo -> CGI worked pretty well with the 350Z; CGI -> photo wasn't so good. 33% nacho rate though. Not bad.

AI green screen

Stable diffusion paintball

Back to contextually similar, recognizable things, I asked Stable Diffusion to reimagine a paintball photo as a highly specific war photograph.

world war 1 photograph of troops crossing no man's land wearing gas masks.
smoke billows in the background.  mortars explode nearby.

Stable diffusion paintball ww1 style (five images)

Faces and hands

Stable diffusion meme drawing hands

Things frequently go awry when Stable Diffusion tries to do faces and hands, with the exception of passport-headshot-style photos that are probably well-represented in its dataset. Generated words resemble text but aren't coherent (see above where Valentino Rossi is riding an "OIHNJOD").

Stable diffusion surfer
I wanted to see how a photo would look as a painting.

digital painting of a surfer sitting in the lineup.  wearing a shark tooth
necklace because he's a kook.  very detailed, art by greg rutkowski.  
pastel colors.  green water, blue sky.

Stable diffusion surfer (five images)

Stable Diffusion was pretty shy about redrawing the subject's face. Other things turned out rather incoherent. I said Greg Rutkowski, not GWB.
Iterating img2img

Stable diffusion Nikon SLR (three images)

You guessed it: img2img outputs can be pruned and fed back into the algorithm with an identical or modified prompt. Here are some (not all with the same prompt):

Stable diffusion pangolin (five images), starship specart (three images), and tree (two images)
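The feedback loop itself is simple to express. A sketch, assuming a hypothetical run_img2img(prompt, init_path, strength) callable that returns candidate output paths, and a pick function standing in for the manual pruning step (neither is part of the Stable Diffusion scripts):

```python
from typing import Callable, Sequence

# Hypothetical signatures: run_img2img returns candidate image paths,
# pick chooses the one to carry forward (in practice, a human).
Img2Img = Callable[[str, str, float], Sequence[str]]
Picker = Callable[[Sequence[str]], str]


def iterate(init_path: str, passes: Sequence[tuple[str, float]],
            run_img2img: Img2Img, pick: Picker) -> list[str]:
    """Feed each pass's chosen output back in as the next pass's input.

    `passes` is a list of (prompt, strength) pairs; the prompt can stay
    the same or be refined between passes. Returns the chosen image path
    from every pass, in order.
    """
    chosen = []
    current = init_path
    for prompt, strength in passes:
        candidates = run_img2img(prompt, current, strength)
        if not candidates:
            raise RuntimeError(f"no outputs for prompt {prompt!r}")
        current = pick(candidates)
        chosen.append(current)
    return chosen


if __name__ == "__main__":
    # Stand-in generator that just names fake outputs so the loop can run.
    def fake_img2img(prompt: str, init: str, strength: float) -> list[str]:
        return [f"{init}>{strength}:{i}" for i in range(3)]

    history = iterate(
        "input.png",
        [("bright green sports car in the rain", 0.5),
         ("bright green sports car on a racetrack", 0.6)],
        run_img2img=fake_img2img,
        pick=lambda paths: paths[0],
    )
    print(history)
```

Keeping the whole chain of choices makes it easy to back up a pass when a branch goes nowhere.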
A generative pipeline

Stable diffusion lamborghini

Now for an end-to-end run: txt2img followed by numerous passes of img2img with refined prompts.

cgi rendering of a gray lamborghini in the rain.  contrasty, epic,
hyper realistic.

Draw me some lambos.

Stable diffusion lamborghini (eight images)

Oh yeah, AI can't draw wheels very well either. Let's go with the low-angle one in the rain.

Time to fire up img2img. Monochrome is nice and all, but these cars deserve bright colors so let's change the prompt a little:

Colorful cgi rendering of a bright green Lamborghini in the rain.
Contrasty. Highly detailed but smooth.

Stable diffusion lamborghini (seven images, including one fail)

Those aren't really all that colorful, and one of the images looks like a fireplace. Hmmm. Maybe the one with the neat reflection. Let's keep the bright green and instead put the lambo in its natural environment: the garage. Just kidding, the track.

Colorful cgi rendering of a bright green Lamborghini at Suzuka Raceway
in Japan.  Contrasty.  Highly detailed but smooth.

Stable diffusion lamborghini (five images)

I see some grandstands and some safety barriers. The one with the short depth of field is pretty awesome though the car's front right wheel well may have disappeared. On the plus side, the car is truly bright green with some neat orange accents on the grill. Let's try a different color scheme and maybe see if Stable Diffusion will put a muscle car air cleaner on a Lamborghini frunk (golf clubs need high-psi ventilation too).

Colorful cgi rendering of a Lamborghini in the orange and blue Gulf Oil
racing lervery.  On track at Suzuka Raceway in Japan.  Hood as a large
air cleaner hood scoop for its supercharger.  Contrasty.  Highly detailed
but smooth.

Stable diffusion lamborghini (four images)

We're still pretty green and not at all resembling a Gulf GT40. Either 'Gulf Oil livery' isn't well represented in the dataset or we left the strength value too low in trying to preserve our work so far. Stable Diffusion did get creative with a Bugatti-ish redesign.

But what if the car was not on a racetrack in Japan but actually in the Swiss Alps? Well then it'd probably need to be a Lamborghini trophy truck.

Colorful cgi rendering of a lifted Lamborghini trophy truck racing in the
snow in the Swiss Alps.  Contrasty.  Highly detailed but smooth.

Stable diffusion lamborghini trophy truck (four images)

We got a couple of low-poly trophy trucks, a near-redraw, and a neat version on a snowy city street. And this is how you can meander for hours without touching a single paintbrush or OpenGL API call.
Additional checkpoints

Stable Diffusion Nissan 350Z drifter elden ring postapocalypse

So this nitrosocke guy created a checkpoint (a set of fine-tuned model weights) using Elden Ring images. I envision this as being like transfer learning, wherein the final model starts as a generalized network (like a pretrained VGG-19) and is then adapted to a specific dataset.

The Elden Ring style checkpoint is cool and worth dedicating part of a post to, but it's even more than that: the Elden Ring outputs are really good. That is, the images are more distinct, coherent, and stylistically accurate than what the base model generates. Of course, it makes sense that a specialized network is better at its job.
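With the CompVis scripts, using a fine-tuned checkpoint comes down to pointing --ckpt at the new weights file. Fine-tuned models like this one are typically triggered by a phrase in the prompt; I'm assuming "elden ring style" here, as in the image captions. A sketch with a placeholder checkpoint path and hypothetical helper names:

```python
import shlex

STYLE_TOKEN = "elden ring style"  # assumed trigger phrase for this checkpoint


def styled_prompt(prompt: str, token: str = STYLE_TOKEN) -> str:
    """Append the style token unless the prompt already mentions it."""
    if token.lower() in prompt.lower():
        return prompt
    return f"{prompt.rstrip(' .')}, {token}"


def txt2img_with_checkpoint(prompt: str, ckpt: str) -> list[str]:
    """argv for scripts/txt2img.py using an alternate checkpoint via --ckpt."""
    return ["python", "scripts/txt2img.py",
            "--ckpt", ckpt,
            "--prompt", styled_prompt(prompt)]


if __name__ == "__main__":
    argv = txt2img_with_checkpoint(
        "a knight riding a ducati supersport through a ruined castle.",
        ckpt="models/elden-ring-finetune.ckpt",  # placeholder path
    )
    print(shlex.join(argv))
```

Because the fine-tune starts from the base model, ordinary subjects (motorcycles, turians) still work; the token mostly steers the style.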

There's one caveat that I'll mention after some Elden Ring txt2img:

Stable diffusion elden ring (eight images)

Caveat: Elden Ring characters often wear armor covering their faces and hands, which sidesteps SD's weakness.

Stable diffusion surfer elden ring Stable diffusion surfer

Trading Torrent out for a motorcycle

Stable Diffusion Ducati Supersport

(Torrent is your horse in Elden Ring.) Back to transfer learning: since the Elden Ring checkpoint derives from a more generalized base model, it knows what a motorcycle is.

Stable Diffusion Ducati Supersport elden ring style (ten images)

Skyrim Thomas the Tank Engine mod
Source. Kind of like the Thomas mod for Skyrim.

Elden Ring-Mass Effect crossover

Armor makes the jump pretty easily from sci-fi to fantasy.

Stable Diffusion Garrus Mass Effect (original plus eleven elden ring style images)

Caelid Highway

There were some interesting takes on the Z-car photo, including one that had Gran Turismo in the prompt.

Stable Diffusion Nissan 350Z drifter: original, brush strokes, two stylized, and gran turismo style (five images)

Paintball with scythes

The paintball image was easy for SD to redraw in Elden Ring style.

Stable diffusion paintball (original plus nine elden ring style images)
Code

Stable diffusion dog comic style

The canned examples are pretty good for one-off runs, but consistency isn't great. A successful generative process requires a lot of runs and a lot of variation. The Stable Diffusion sample scripts are small enough to modify easily, but do require some work (e.g. nested with blocks that have you starting lines at column 60). I made the following mods:


