Craiyon
Cattle |
Pickle molly |
The latest popular neural application is
Dall-e mini. Seeing a (now) niche video game like PUBG in
Cattle's message blew me away - I'm used to models with fixed label sets, like VGG16 and hot dog/not hot dog. I checked out the
web interface:
Sure enough,
the browser app could produce a style transfer-like image based on text input. And it seemingly had the labeling breadth to know words associated with video games.
Someone thought Claptrap was worth labeling. |
And it's quick:
a couple minutes to generate nine renditions of your input text. Compare that to style transfer, which has to crunch the numbers and backprop error into the source image for every output.
I tried a few more, adding style prefixes like 'steampunk' and 'graffiti' and 'escher' and 'cyber'. Whoa:
The results are hit-or-miss and (as with most ML stuff)
the output is a small square, 256 pixels on a side. But it's awesome to see what a steampunk smartphone looks like. Or a nuclear martini. Or a flowchart created by MC Escher (or really just any PM, amirite?).
The model
I haven't looked into
the architecture too much, but the model is roughly
a combination of a text encoder, a GAN-flavored image generator, and a postprocessor (which I gather ranks the candidate images).
I originally thought the inputs were like tags (data labels), but the text processing is pretty robust.
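As a loose mental model of those three stages (these functions are made up for illustration; the real internals surely differ), the flow is something like:

def encode_text(prompt):
    # Stage 1: text processing - turn the prompt into token IDs.
    return [hash(word) % 50000 for word in prompt.lower().split()]

def generate_image_tokens(text_tokens, seed=0):
    # Stage 2: the generator maps text tokens to a 16x16 grid of image tokens.
    import random
    rng = random.Random(seed + sum(text_tokens))
    return [rng.randrange(16384) for _ in range(256)]

def decode_tokens(image_tokens):
    # Stage 3: the decoder/postprocessor turns image tokens into pixels.
    return [[tok % 256 for tok in image_tokens[row * 16:(row + 1) * 16]] for row in range(16)]

pixels = decode_tokens(generate_image_tokens(encode_text("steampunk smartphone")))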
Sandbox/offline
Neural graphics card (per Dall-e). |
There's
a substantially more capable flavor of Dall-e Mini called Dall-e Mega that requires considerably more GPU memory, so to try it I needed an offline version. There was a github repo called
Dalle Playground that seemed to have a Hello World.
Offline, I mean it
The sandbox code was written to produce a web client, probably so you can make and monetize your own craiyon.com. I extracted the stuff needed to run it like: python cmd.py --text "A taco that poops ice cream".
import argparse
import os
import time
from pathlib import Path

from utils import parse_arg_dalle_version
from consts import ModelSize
from dalle_model import DalleModel

parser = argparse.ArgumentParser(description="A DALL-E app to turn your textual prompts into visionary delights")
parser.add_argument("--text", default='llama', help='Input string')
parser.add_argument("--model", type=parse_arg_dalle_version, default=ModelSize.MINI, help="Mini, Mega, or Mega_full")
parser.add_argument("--count", type=int, default=6, help='Number of images to generate')
args = parser.parse_args()

print(f"{time.strftime('%H:%M:%S')} Creating model")
dalle_model = DalleModel(args.model)
print(f"{time.strftime('%H:%M:%S')} Created model")

def generate_images():
    print(f"{time.strftime('%H:%M:%S')} Generating")
    generated_imgs = dalle_model.generate_images(args.text, args.count)
    print(f"{time.strftime('%H:%M:%S')} Generated")
    # Save each image to a timestamped directory named after the prompt.
    dir_name = os.path.join("./", f"{time.strftime('%Y%m%d_%H%M')}_{args.text}")
    Path(dir_name).mkdir(parents=True, exist_ok=True)
    for idx, img in enumerate(generated_imgs):
        img.save(os.path.join(dir_name, f'{idx}.png'), format="png")

generate_images()
print("Done")
Toolchain
When I ran my cmd.py
I got messages about running in CPU mode. This simply will not do for ML applications.
The dependencies were mostly automatic (on Linux), but
the library that was skipping my graphics card was jaxlib.
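You can ask jax directly which backend it picked up; with the CPU-only jaxlib build, the check looks something like:

import jax
print(jax.default_backend())  # 'cpu' with the CPU-only jaxlib, 'gpu' once CUDA works
print(jax.devices())          # e.g. [CpuDevice(id=0)] instead of a GPU device list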
I ran the pip command suggested on the jax site:
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
But the dependency resolver installed a jaxlib built for an old cuda version that the git code couldn't handle.
I located the
index of wheels so I could
install the specific cuda/cudnn flavor.
Nope:
pip install https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.10+cuda11.cudnn82-cp38-none-manylinux2014_x86_64.whl
ERROR: jaxlib-0.3.10+cuda11.cudnn82-cp38-none-manylinux2014_x86_64.whl is not a supported wheel on this platform.
The cuda and cudnn versions were right:
% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5
So what was the "not supported"?
Oh. "cp38" is the CPython version. Thanks, error message.
I installed the wheel matching my Python version instead, and I was good to go.
The model loads in like a minute and each image takes a few seconds to generate.
Meanderings
Airships and styles
I started off with zeppelins and airships. Adding 'steampunk', 'watercolor', 'in a desert', 'in the mountains'. I like the
fantasy and Da Vinci vibes.
Animals and more styles
I tried some animals, with 'watercolor', 'plastic', and 'lineart'.
The weimaraners don't look particularly photorealistic, but the artistic ones are kind of neat.
Topical graphics
Putting on my kilroy hat, I thought this might be
a way to generate icons and graphics for posts about beer exploration or photography or a graphics card built to do neural processing (and not just vector math).
Lamborrari
Photorealism isn't great on the mini model. And wheels are hard. That said,
at a glance, the vehicles generated by the model look like they could be cleaned up and turned into a clay model. Dall-e has the lines and the supercar intakes right. I can spot what look like the source vehicles in some of them:
- 455 mixed with an Aston
- Ferrari California/Dodge Viper
- Cyberpunk car
- RX-7 mixed with a 787B
I wonder if Fiero+Ferrari looks like the body kit cars.
Claptrap again
Okay but how cool is the result for "claptrap pajamas"? And is that a Bad Robot Claptrap?
The neural wheelhouse
The blurring and uncanny valley aspects of GAN images make Dall-e
great for generating horrifying pictures. From tentacles to sea creatures to apocalyptic angels, Dall-e knows how to be creepy.
Dialing back the scariness
Unlike horror images, standard fictional fare doesn't look quite so good blurry.
Adding art style keywords like 'lineart' made things less fuzzy.
Other stuff
Dall-e does well with landscapes and abstract art.
More Darth Vader
With sharp edges and simple details, the galaxy's finest pod racer looks good in the neural domain.
Next?
Having set up the offline code with the goal of running the mega version of the model, it looks like
my 6GB graphics card is 2GB short. There are workarounds, but the easiest one is to not have such a crap video card. I'm considering pulling the trigger on a 3080, but might sit tight for the 4000s to drop.
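The workarounds are mostly jax allocator knobs, set before jax is imported, that stop jaxlib from preallocating most of the card up front. A sketch, with no promise it actually squeezes Mega into 6GB:

import os
# Must be set before importing jax. By default jaxlib preallocates most
# of GPU memory; 'platform' allocates and frees on demand, at a speed cost.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"

import jax  # now jax grabs GPU memory lazily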
In the meantime, the offline interface is pretty flexible for scripting or whatever.