2022.06.23

dall-e mini collage gazelle camera vader mermaid zeppelin tree monster car desert
Craiyon


Cattle


Pickle molly

The latest popular neural application is Dall-e mini. Seeing a (now) niche video game like PUBG in Cattle's message blew me away - I'm used to fixed-label classifiers like VGG16 and hot dog/not hot dog. I checked out the web interface:

craiyon dall-e mini neural network borderlands bloodborne

Sure enough, the browser app could produce a style transfer-like image based on text input. And it seemingly had the labeling breadth to know words associated with video games.

craiyon dall-e mini neural network claptrap borderlands
Someone thought Claptrap was worth labeling.

And it's quick; a couple minutes to generate nine renditions of your input text. Compare that to style transfer having to crunch the numbers and backprop error to modify the source image.

I tried a few more, adding style prefixes like 'steampunk' and 'graffiti' and 'escher' and 'cyber'. Whoa:

[Thumbnails: escher flag, cyber wolf, fear loathing, lamborghini vespa, bugatti tuktuk, escher flowchart, rainbow vader, satanic tree, investment meme, wave blueprint, graffiti claptrap borderlands, steampunk zeppelin]

The results are hit-or-miss and (as with most ML stuff) the output is a 200-something-pixel square. But it's awesome to see what a steampunk smartphone looks like. Or a nuclear martini. Or a flowchart created by MC Escher (or really just any PM, amirite?).

[Images: steampunk iphone (5), evil tree, investment meme (2), wave, escher flowchart (3), turian, anime asari, spongebob squarepants, nuclear martini, rainbow vader, steampunk dolphin, steampunk dolphins, steampunk whale, steampunk zeppelin, motorcycle, flag, gallows]


The model

Dall-e mini diagram bart vqgan clip

I haven't looked into the architecture too much, but the model is a combination of text processing (BART), GAN image generation (VQGAN), and a postprocessor (CLIP).

Dall-e mini language model image model diagram

I originally thought the inputs were like tags (data labels), but the text processing is pretty robust.

Sandbox/offline

dall-e mini neural network ai graphics card
Neural graphics card (per Dall-e).

There's a substantially more capable flavor of Dall-e Mini called Dall-e Mega that requires considerably more GPU memory. So, I needed an offline version. There was a GitHub repo called Dalle Playground that seemed to have a Hello World.

Offline, I mean it

The sandbox code was written to produce a web client, probably so you can make and monetize your own craiyon.com. I extracted the stuff needed to run it from the command line, like: python cmd.py --text "A taco that poops ice cream". Here's cmd.py:

import argparse
import os
import time
from pathlib import Path

# Local modules from the sandbox code
from consts import ModelSize
from dalle_model import DalleModel
from utils import parse_arg_dalle_version

parser = argparse.ArgumentParser(description="A DALL-E app to turn your textual prompts into visionary delights")
parser.add_argument("--text", default="llama", help="Input prompt")
parser.add_argument("--model", type=parse_arg_dalle_version, default=ModelSize.MINI, help="Mini, Mega, or Mega_full")
# type=int so a count passed on the command line isn't left as a string
parser.add_argument("--count", type=int, default=6, help="Number of images to generate")

args = parser.parse_args()

# Loading the model is the slow part, so do it once up front
print(f"{time.strftime('%H:%M:%S')} Creating model")
dalle_model = DalleModel(args.model)
print(f"{time.strftime('%H:%M:%S')} Created model")


def generate_images():
    print(f"{time.strftime('%H:%M:%S')} Generating")
    generated_imgs = dalle_model.generate_images(args.text, args.count)
    print(f"{time.strftime('%H:%M:%S')} Generated")

    # One output directory per run, named by timestamp and prompt
    dir_name = os.path.join("./", f"{time.strftime('%Y%m%d_%H%M')}_{args.text}")
    Path(dir_name).mkdir(parents=True, exist_ok=True)

    # Images are written as 0.png, 1.png, ...
    for idx, img in enumerate(generated_imgs):
        img.save(os.path.join(dir_name, f"{idx}.png"), format="png")


generate_images()

print("Done")
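
One wrinkle in the script above: the prompt goes into the output directory name verbatim, so a prompt containing a slash or other path characters could end up somewhere unexpected. Here is a minimal sketch of a sanitizer. The helper name prompt_to_dirname and the 60-character cap are assumptions for illustration, not part of the sandbox code.

```python
import re
import time


def prompt_to_dirname(prompt, max_len=60):
    """Hypothetical helper: turn a free-text prompt into a safe directory name."""
    # Lowercase, then collapse every run of non-alphanumeric characters to "_"
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    # Keep names short; fall back to a fixed name if nothing usable is left
    slug = slug[:max_len].rstrip("_") or "untitled"
    # Same timestamp prefix as cmd.py uses
    return f"{time.strftime('%Y%m%d_%H%M')}_{slug}"


print(prompt_to_dirname("A taco that poops ice cream"))
```

Swapping this in for the f-string that builds dir_name would keep the timestamp prefix while dropping spaces and punctuation.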

Toolchain

When I ran my cmd.py I got messages about running in CPU mode. This simply will not do for ML applications.

The dependencies were mostly automatic (on Linux), but the library that was skipping my graphics card was jaxlib.
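
A quick way to see which backend JAX picked up, without loading a whole model, is to ask it directly. This is a minimal sketch: jax.default_backend() and jax.devices() are real JAX calls, while the describe_backend wrapper is a made-up name and the import guard is only there so the snippet also runs where JAX isn't installed.

```python
def describe_backend():
    """Hypothetical helper: report which backend JAX will run on."""
    try:
        import jax
    except ImportError:
        return "jax is not installed"
    backend = jax.default_backend()  # e.g. "cpu" or "gpu"
    devices = ", ".join(str(d) for d in jax.devices())
    return f"backend={backend} devices=[{devices}]"


print(describe_backend())
```

If this reports a CPU backend on a machine with an NVIDIA card, the installed jaxlib is most likely a CPU-only build.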

I ran the pip command suggested on the jax site:
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
But the dependency resolver installed an old CUDA version that the repo code couldn't handle.

I located the index of wheels so I could install the specific CUDA/cuDNN flavor.

Nope:
pip install https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.10+cuda11.cudnn82-cp38-none-manylinux2014_x86_64.whl
ERROR: jaxlib-0.3.10+cuda11.cudnn82-cp38-none-manylinux2014_x86_64.whl is not a supported wheel on this platform.
The CUDA and cuDNN versions were right:
% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5
So what was the "not supported"?

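Oh. "cp38" is the Python tag, which I first read as the Cython version: the wheel is built for CPython 3.8, and pip rejects it under any other interpreter version. Cython is a separate package, so what really matters is a wheel whose cpXY tag matches the running Python. Thanks, error message.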
pip install cython
And I was good to go. The model loads in like a minute and each image takes a few seconds to generate.
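
The cp38 segment of a wheel filename is its Python tag (CPython 3.8 here), and pip only accepts the wheel when that tag is compatible with the running interpreter. To check before downloading, the tag can be read straight from the filename. This is a small sketch that assumes the standard wheel filename layout and a CPython interpreter. The function names are made up for illustration.

```python
import sys


def wheel_python_tag(filename):
    """Return the Python tag (e.g. 'cp38') from a wheel filename or URL."""
    name = filename.rsplit("/", 1)[-1]
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # Layout: name-version[-build]-python-abi-platform.whl
    return name[: -len(".whl")].split("-")[-3]


def interpreter_tag():
    """Tag for the running interpreter, assuming CPython (e.g. 'cp310')."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


wheel = "jaxlib-0.3.10+cuda11.cudnn82-cp38-none-manylinux2014_x86_64.whl"
print(wheel_python_tag(wheel), "vs", interpreter_tag())
```

If the two tags differ, pip reports the wheel as "not a supported wheel on this platform", regardless of the CUDA setup.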

Meanderings

Airships and styles

I started off with zeppelins and airships. Adding 'steampunk', 'watercolor', 'in a desert', 'in the mountains'. I like the fantasy and Da Vinci vibes.

[Images: airship, airship watercolor (2), airship steampunk, airship monochrome, airship desert (2), airship fog, airship night mist, airship zeppelin, airship diagram, airship forest (2), airship mountain]

Animals and more styles

I tried some animals, with 'watercolor', 'plastic', and 'lineart'. The weimaraners don't look particularly photorealistic, but the artistic ones are kind of neat.

[Images: deer (2), gazelles, plastic horse (2), lineart horse, seal sand, butterfly sunset, horse cloud, lion lineart, zebra, giraffes, weimaraners]

Topical graphics

Putting on my kilroy hat, I thought this might be a way to generate icons and graphics for posts about beer exploration or photography or a graphics card built to do neural processing (and not just vector math).

[Images: beer (6), camera slr lineart (2), camera slr]

Lamborrari

Photorealism isn't great on the mini model. And wheels are hard. That said, at a glance, the vehicles generated by the model look like they could be cleaned up and turned into a clay model. Dall-e has the lines and the supercar intakes right. I can see some of these in the following dataset:
[Images: sports car (7), hummer desert, sports green hummer desert, motorcycle race track, motorcycle lineart (2)]

I wonder if Fiero+Ferrari looks like the body kit cars.

Claptrap again

Okay but how cool is the result for "claptrap pajamas"? And is that a Bad Robot Claptrap?

[Images: borderlands claptrap pajamas, borderlands claptrap, borderlands claptrap robot]

The neural wheelhouse

The blurring and uncanny valley aspects of GAN images make Dall-e great for generating horrifying pictures. From tentacles to sea creatures to apocalyptic angels, Dall-e knows how to be creepy.

[Images: horror monster (9), horror people (3), horror person hunter thompson, horror person, horror cthulhu monster, horror alien (2), horror artistic, horror monster nuclear, horror doll monster, horror mothman]

Dialing back the scariness

Unlike horror images, standard fictional fare doesn't look quite so good when it's blurry. Adding art style keywords like 'lineart' made things less fuzzy.

[Images: mythical dragon fire (2), mythical angel (3), mythical mermaid (3), mythical centaur, mythical soldier (2), mythical knight, mythical dragon (2), mythical baby yoda, mythical pegasus (2), mythical samurai, mythical alien, mythical (6), monster (3)]

Other stuff

Dall-e does well with landscapes and abstract art.

[Images: abstract (2), lineart pose, anime character, desert (4)]

More Darth Vader

With sharp edges and simple details, the galaxy's finest pod racer looks good in the neural domain.

[Images: star wars darth vader (6)]

Next?

dall-e mini neural network horror angel

I set up the offline code with the goal of running the mega version of the model, but it looks like my 6 GB graphics card is 2 GB short. There are workarounds, but the easiest one is to not have such a crap video card. I'm considering pulling the trigger on a 3080, but might sit tight for the 4000s to drop.

In the meantime, the offline interface is pretty flexible for scripting or whatever.
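
As one example of that kind of scripting, a batch of prompts can be built by crossing subjects with style keywords and handing each one to cmd.py. This is a rough sketch: build_commands is a hypothetical helper, the subject and style lists are just samples from this post, and the commands are only printed here rather than executed.

```python
import itertools
import shlex


def build_commands(subjects, styles, count=6, script="cmd.py"):
    """Hypothetical helper: one cmd.py invocation per style/subject pair."""
    commands = []
    for style, subject in itertools.product(styles, subjects):
        # An empty style string yields the bare subject as the prompt
        prompt = f"{style} {subject}".strip()
        commands.append(["python", script, "--text", prompt, "--count", str(count)])
    return commands


for cmd in build_commands(["airship", "zeppelin"], ["steampunk", "watercolor", ""]):
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run to actually generate images. Since cmd.py loads the model on every invocation, a long batch would be faster with the loop moved inside a single process.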