Craiyon
Cattle |
Pickle molly |
The latest popular neural application is
Dall-e mini. Seeing a (now) niche video game like PUBG in
Cattle's message blew me away - I'm used to models with fixed label sets, like VGG16 and hot dog/not hot dog. I checked out the
web interface:
Sure enough,
the browser app could produce a style transfer-like image based on text input. And it seemingly had the labeling breadth to know words associated with video games.
Someone thought Claptrap was worth labeling. |
And it's quick:
a couple minutes to generate nine renditions of your input text. Compare that to style transfer, which has to crunch the numbers and backprop error into the source image for every output.
I tried a few more, adding style prefixes like 'steampunk' and 'graffiti' and 'escher' and 'cyber'. Whoa:
The results are hit-or-miss and (as with most ML stuff)
the output is a small square, 256 pixels on a side. But it's awesome to see what a steampunk smartphone looks like. Or a nuclear martini. Or a flowchart created by MC Escher (or really just any PM, amirite?).
The model
I haven't looked into
the architecture too much, but the model is roughly
a combination of a text encoder, a GAN-flavored image generator, and a postprocessor (which I gather ranks the candidate images).
I originally thought the inputs were like tags (data labels), but the text processing is pretty robust.
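As a loose mental model of those three stages (these functions are made up for illustration; the real internals surely differ), the flow is something like:

def encode_text(prompt):
    # Stage 1: text processing - turn the prompt into token IDs.
    return [hash(word) % 50000 for word in prompt.lower().split()]

def generate_image_tokens(text_tokens, seed=0):
    # Stage 2: the generator maps text tokens to a 16x16 grid of image tokens.
    import random
    rng = random.Random(seed + sum(text_tokens))
    return [rng.randrange(16384) for _ in range(256)]

def decode_tokens(image_tokens):
    # Stage 3: the decoder/postprocessor turns image tokens into pixels.
    return [[tok % 256 for tok in image_tokens[row * 16:(row + 1) * 16]] for row in range(16)]

pixels = decode_tokens(generate_image_tokens(encode_text("steampunk smartphone")))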
Sandbox/offline
Neural graphics card (per Dall-e). |
There's
a substantially more capable flavor of Dall-e Mini called Dall-e Mega that requires considerably more GPU memory, so to try it I needed an offline version. There was a github repo called
Dalle Playground that seemed to have a Hello World.
Offline, I mean it
The sandbox code was written to produce a web client, probably so you can make and monetize your own craiyon.com. I extracted the stuff needed to run it like: python cmd.py --text "A taco that poops ice cream".
import argparse
import os
import time
from pathlib import Path

from utils import parse_arg_dalle_version
from consts import ModelSize
from dalle_model import DalleModel

parser = argparse.ArgumentParser(description="A DALL-E app to turn your textual prompts into visionary delights")
parser.add_argument("--text", default='llama', help='Input string')
parser.add_argument("--model", type=parse_arg_dalle_version, default=ModelSize.MINI, help="Mini, Mega, or Mega_full")
parser.add_argument("--count", type=int, default=6, help='Number of images to generate')
args = parser.parse_args()

print(f"{time.strftime('%H:%M:%S')} Creating model")
dalle_model = DalleModel(args.model)
print(f"{time.strftime('%H:%M:%S')} Created model")

def generate_images():
    print(f"{time.strftime('%H:%M:%S')} Generating")
    generated_imgs = dalle_model.generate_images(args.text, args.count)
    print(f"{time.strftime('%H:%M:%S')} Generated")
    # Save each image to a timestamped directory named after the prompt.
    dir_name = os.path.join("./", f"{time.strftime('%Y%m%d_%H%M')}_{args.text}")
    Path(dir_name).mkdir(parents=True, exist_ok=True)
    for idx, img in enumerate(generated_imgs):
        img.save(os.path.join(dir_name, f'{idx}.png'), format="png")

generate_images()
print("Done")
Toolchain
When I ran my cmd.py
I got messages about running in CPU mode. This simply will not do for ML applications.
The dependencies were mostly automatic (on Linux), but
the library that was skipping my graphics card was jaxlib.
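You can ask jax directly which backend it picked up; with the CPU-only jaxlib build, the check looks something like:

import jax
print(jax.default_backend())  # 'cpu' with the CPU-only jaxlib, 'gpu' once CUDA works
print(jax.devices())          # e.g. [CpuDevice(id=0)] instead of a GPU device list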
I ran the pip command suggested on the jax site:
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
But the dependency resolver installed a jaxlib built for an old cuda version that the git code couldn't handle.
I located the
index of wheels so I could
install the specific cuda/cudnn flavor.
Nope:
pip install https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.10+cuda11.cudnn82-cp38-none-manylinux2014_x86_64.whl
ERROR: jaxlib-0.3.10+cuda11.cudnn82-cp38-none-manylinux2014_x86_64.whl is not a supported wheel on this platform.
The cuda and cudnn versions were right:
% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5
So what was the "not supported"?
Oh. "cp38" is the CPython version. Thanks, error message.
I installed the wheel matching my Python version instead, and I was good to go.
The model loads in like a minute and each image takes a few seconds to generate.
Meanderings
Airships and styles
I started off with zeppelins and airships. Adding 'steampunk', 'watercolor', 'in a desert', 'in the mountains'. I like the
fantasy and Da Vinci vibes.
Animals and more styles
I tried some animals, with 'watercolor', 'plastic', and 'lineart'.
The weimaraners don't look particularly photorealistic, but the artistic ones are kind of neat.
Topical graphics
Putting on my kilroy hat, I thought this might be
a way to generate icons and graphics for posts about beer exploration or photography or a graphics card built to do neural processing (and not just vector math).
Lamborrari
Photorealism isn't great on the mini model. And wheels are hard. That said,
at a glance, the vehicles generated by the model look like they could be cleaned up and turned into a clay model. Dall-e has the lines and the supercar intakes right. I can spot what look like the source vehicles in some of them:
- 455 mixed with an Aston
- Ferrari California/Dodge Viper
- Cyberpunk car
- RX-7 mixed with a 787B
I wonder if Fiero+Ferrari looks like the body kit cars.
Claptrap again
Okay but how cool is the result for "claptrap pajamas"? And is that a Bad Robot Claptrap?
The neural wheelhouse
The blurring and uncanny valley aspects of GAN images make Dall-e
great for generating horrifying pictures. From tentacles to sea creatures to apocalyptic angels, Dall-e knows how to be creepy.
Dialing back the scariness
Unlike horror images, standard fictional fare doesn't look quite so good blurry.
Adding art style keywords like 'lineart' made things less fuzzy.
Other stuff
Dall-e does well with landscapes and abstract art.
More Darth Vader
With sharp edges and simple details, the galaxy's finest pod racer looks good in the neural domain.
Next?
Having set up the offline code with the goal of running the mega version of the model, it looks like
my 6GB graphics card is 2GB short. There are workarounds, but the easiest one is to not have such a crap video card. I'm considering pulling the trigger on a 3080, but might sit tight for the 4000s to drop.
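The workarounds are mostly jax allocator knobs, set before jax is imported, that stop jaxlib from preallocating most of the card up front. A sketch, with no promise it actually squeezes Mega into 6GB:

import os
# Must be set before importing jax. By default jaxlib preallocates most
# of GPU memory; 'platform' allocates and frees on demand, at a speed cost.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"

import jax  # now jax grabs GPU memory lazily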
In the meantime, the offline interface is pretty flexible for scripting or whatever.