Using a PyTorch DCGAN for Generating Unique Character and Fashion Silhouettes

Paul Brzeski
Sep 8, 2022

As a digital artist I tend to prefer working in vector format, and as such I've been a keen user of Adobe Illustrator and now Affinity Designer. Over the course of my personal and professional work I've been accumulating vectors from all across the web, and some even from magazine CDs. This resource library is a launching-off point for many of my ideas, so I was keen to see what I could do with it in a machine learning context. It made sense to try to reproduce my normal design workflow, so I set out to create a dataset of human figures from which a neural network could learn to draw its own silhouettes.

Black and white silhouettes of people
It took two weekends, but I extracted 156 silhouettes of people from different files, saved them all as SVGs and exported the SVGs to PNG. I then used ImageMagick's mogrify to centre each subject and resize every image to the same dimensions. I was really pleased with how simple these open source tools made managing so many files.
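In case it's useful to anyone, the batch processing boiled down to a one-liner along these lines; the 512x512 size and white background here are placeholders rather than the exact settings I used:

```bash
# mogrify edits files in place, so run this on a copy of the exported PNGs.
# -resize fits the silhouette inside the canvas, -extent pads it out to a square.
mogrify -background white -gravity center -resize 512x512 -extent 512x512 *.png
```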

For my first round of experiments I followed a DCGAN tutorial from PyTorch to build a custom little script, which worked well enough as a proof of concept. I gave it a couple of dozen runs but couldn't get past what I think was convergence failure. I went looking for answers online and was able to kick the can down the road a little longer by running it in grayscale mode instead of RGB, but it was clear I'd likely reached the limit of what my dataset could do with that simpler script.
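The grayscale change is mostly a data pipeline tweak. A minimal sketch of how the silhouettes might feed into the tutorial's training loop looks like the following, with the path as a placeholder and nc being the channel count the tutorial's Generator and Discriminator are both built from:

```python
import torch
import torchvision.datasets as dset
import torchvision.transforms as transforms

image_size = 64   # the DCGAN tutorial's default resolution
nc = 1            # 1 channel for grayscale instead of 3 for RGB

# ImageFolder just walks the directory tree; the class labels it infers
# from folder names are ignored by the unconditional DCGAN.
dataset = dset.ImageFolder(
    root="data/silhouettes",                      # placeholder path
    transform=transforms.Compose([
        transforms.Grayscale(num_output_channels=nc),
        transforms.Resize(image_size),
        transforms.CenterCrop(image_size),
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,)),     # single-channel mean/std
    ]),
)
dataloader = torch.utils.data.DataLoader(
    dataset, batch_size=128, shuffle=True, num_workers=2
)
```

Because the tutorial threads nc through the Generator's last layer and the Discriminator's first layer, setting it to 1 propagates through both networks.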

Improvements to the dataset might have helped the experiment. For a start, just 156 images is very small for machine learning, and I only trained for fewer than 10,000 epochs in total, which is not a lot; for comparison's sake, some of today's popular models are trained on millions of images for well over 100,000 epochs. The dataset is also being loaded in the laziest way possible: I'm just pointing PyTorch at the folders the images are stored in, with no labels at all. In future I might integrate something like BetterLoader so that the folder and file name structure I put hard work into is used automatically.
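As a rough sketch of that idea, a plain PyTorch Dataset could derive a label from each file name without any extra tooling; the underscore-based naming rule below is purely hypothetical and would need to match my actual library structure:

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class SilhouetteDataset(Dataset):
    """Derives a label from each file name, e.g. 'walking_woman_012.png' -> 'walking'."""
    def __init__(self, root, transform=None):
        self.paths = sorted(Path(root).rglob("*.png"))
        self.transform = transform
        # Hypothetical rule: the first underscore-separated token is the label.
        self.labels = [p.stem.split("_")[0] for p in self.paths]
        self.classes = sorted(set(self.labels))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("L")  # load as grayscale
        if self.transform:
            img = self.transform(img)
        return img, self.classes.index(self.labels[idx])
```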

Wanting to try some other techniques, I attempted to set up a project called StudioGAN, as it bundles a lot of popular architectures into one system. Unfortunately I found myself fighting battle after battle getting the dependencies working, so I gave up and decided to focus on the best model it contained, StyleGAN3. The main version wouldn't run on my computer, but I was able to get it going with a fork.
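For anyone following along, the usual NVlabs StyleGAN3 workflow is roughly two commands; treat the flags below as indicative rather than exactly what I ran, since the fork I used may differ slightly:

```bash
# Package the PNGs into the zip format StyleGAN3's training script expects.
python dataset_tool.py --source=data/silhouettes --dest=datasets/silhouettes-256x256.zip --resolution=256x256

# Kick off training; gamma and batch size usually need tuning for a tiny dataset.
python train.py --outdir=training-runs --cfg=stylegan3-t --data=datasets/silhouettes-256x256.zip \
    --gpus=1 --batch=16 --gamma=8.2 --snap=10
```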

I ran StyleGAN3 for quite a bit longer (it needed the extra time to get anywhere), and it seemed to get pretty close to exactly what I wanted before eventually succumbing to convergence failure as well. There's a chance I could keep going with StyleGAN2-ADA, which is reportedly better at handling small datasets, but in the course of researching other frameworks I've begun to realise my vision for the silhouettes might be aiming too low for the computational cost of this approach. Given how much time and hardware it takes, it might be better to focus on higher-end concept art that includes colour, scene composition and other elements that will actually speed up the design process.

Just as I was getting ready to wrap up this machine learning mini project and write this blog article, Ars Technica wrote about Stable Diffusion. I gave it a go after reading that, and with it basically leapfrogged everything I'd experienced with GANs over the past few weeks. Using pre-trained weights I had to sign up to download, I was able to use the two included scripts, txt2img and img2img, to produce concept art that's perfect for informing existing work.
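Both scripts run from the command line in the CompVis repository; the invocations below are an approximation of what I used, with prompts taken from the examples further down and paths as placeholders:

```bash
# Text-to-image: generate concept art from a prompt alone.
python scripts/txt2img.py --prompt "Dolce and Gabbana runway" --plms --n_samples 4

# Image-to-image: push an existing drawing towards a prompt.
# --strength controls how far the result is allowed to drift from the initial image.
python scripts/img2img.py --prompt "Studio Ghibli film poster" \
    --init-img inputs/original_drawing.png --strength 0.75
```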

One of the uses I wanted from an image-generating AI was getting fashion design ideas, so I asked it for runways in the style of a few famous designers. While the faces it produced were a bit janky, it seemed to nail the assignment in terms of understanding each designer's aesthetic (to some extent! please don't send me hate mail, fashion people).

“Dolce and Gabbana runway” using Stable Diffusion txt2img

Open Studios has a number of projects on the go which require concept art. Using txt2img I was able to get designs for an advanced Ancient Visayan golden city, which ended up looking rather South American, but that's OK because that's part of the story in Kamigen anyway.

A pretty decent take on the concept, although distinctly lacking in Visayan cultural elements, likely because they weren't a significant part of the model's training data.

Using some hand-drawn art for another project, I was able to generate some anime posters in the style of Studio Ghibli and "a popular anime series".

Original drawing and its translations using Stable Diffusion img2img: "Popular anime series poster" and "Studio Ghibli film poster"

Using PyTorch's TensorBoard integration to automatically generate views of all the test runs was handy; however, it was limited to whatever each framework chose to log, which also meant there was no quick and straightforward way to compare them for benchmarking purposes. I'm keen to build the studio an internal frontend that Garrett and I can use when coming up with new ideas, a UI that would connect to some kind of service running the Stable Diffusion scripts remotely.
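If you're rolling your own script, as I did with the DCGAN, the TensorBoard hook is nothing more exotic than PyTorch's SummaryWriter; a minimal sketch of logging a grid of generator samples each epoch (names here are placeholders) looks like this:

```python
import torch
import torchvision.utils as vutils
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/dcgan-silhouettes")   # placeholder run name

# Reusing the same latent points lets you watch them evolve over training.
fixed_noise = torch.randn(64, 100, 1, 1)           # 100 = latent size in the DCGAN tutorial

def log_samples(netG, epoch):
    with torch.no_grad():
        fake = netG(fixed_noise).detach().cpu()
    grid = vutils.make_grid(fake, padding=2, normalize=True)
    writer.add_image("generated_silhouettes", grid, global_step=epoch)
```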
