Today is my last day of vacation for this summer.
I had more or less important stuff planned, but yesterday I learnt about stable diffusion, "a machine-learning model to generate digital images from natural language descriptions", which happens to be open source, including its trained weights. It's also pretty unfiltered, making it possible to generate so-called "deepfakes", erotica, and/or images of trademarked things, for extra fun. Being the nerd that I am, I couldn't help but try it out today, and after a couple of hours I ended up with this:
What do we see in this image? A screenshot of Dino showing an XMPP group chat, with a bot named t1000, and my dad (robi) prompting the bot to generate an image (well, 2 actually).
Since this type of bot is quite fun to play with, I thought it would be nice to share how I did it. Disclaimer: this is a very quick'n'dirty way to get the bot running; it's (very, very) far from being as advanced as this "similar" (ahem) Discord bot.
Before even thinking about a bot, I needed to generate some images "the hard way". I expected this part to be a lot more painful and time-consuming, but it turned out to be pretty simple.
I followed the instructions from the official GitHub repo, which included downloading the trained weights from Hugging Face.
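From memory, the setup boiled down to something like this (a sketch; the exact file names and paths come from the repo's README at the time, so double-check them before copy-pasting):

```bash
git clone https://github.com/CompVis/stable-diffusion
cd stable-diffusion
conda env create -f environment.yaml  # creates the "ldm" conda env
conda activate ldm
# put the weights downloaded from Hugging Face where the scripts expect them
mkdir -p models/ldm/stable-diffusion-v1
ln -s /path/to/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt
```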
Unfortunately, it turned out that an RTX 3080 Ti with its 12GB of VRAM was not enough for this base version, but a quick look at this GitHub issue led me to this fork, which provides scripts for GPUs with as little as 4GB of VRAM. There are probably plenty of other ways to generate images on limited hardware, but this one worked for me, so I did not look further.
After copying the ./optimizedSD folder into the original stable diffusion source tree, I could run inferences like this:
```bash
python optimizedSD/optimized_txt2img.py \
    --prompt "Luffy with a guitar" \
    --H 512 --W 512 \
    --seed 27 \
    --n_iter 2 \
    --n_samples 10 \
    --ddim_steps 50
```
which filled a folder with these 20 images (n_iter × n_samples) in about 3 minutes:
There are about a million things wrong with this implementation, but hey, it let my non-techie friends play with this crazy shit, so my goal was met. Here is the full code of the bot:
```python
import asyncio
import logging
import random
import re
import tempfile
import time
from argparse import ArgumentParser
from pathlib import Path

import aiohttp
import slixmpp
from slixmpp import JID

# This is where you git cloned stable diffusion
CWD = Path("/home/nicoco/tmp/stable-diffusion/")

# This is the command you used to generate images.
# Here we only generate 4 images per prompt.
CMD_TXT2IMG = [
    "/home/nicoco/.local/miniconda3/envs/ldm/bin/python",
    "/home/nicoco/tmp/stable-diffusion/optimizedSD/optimized_txt2img.py",
    "--H",  # height and width must be multiples of 64
    "512",  # more pixels = more VRAM needed
    "--W",
    "512",
    "--n_iter",  # n_output_img = n_iter x n_samples
    "1",  # n_samples = faster, but more VRAM
    "--n_samples",
    "4",
    "--ddim_steps",
    "50",
    "--turbo",  # remove these
    "--unet_bs",  # 3 last lines
    "4",  # for lower VRAM usage
]

CMD_IMG2IMG = [
    "/home/nicoco/.local/miniconda3/envs/ldm/bin/python",
    "/home/nicoco/tmp/stable-diffusion/optimizedSD/optimized_img2img.py",
    "--H",  # height and width must be multiples of 64
    "512",  # more pixels = more VRAM needed
    "--W",
    "512",
    "--n_iter",  # n_output_img = n_iter x n_samples
    "1",  # n_samples = faster, but more VRAM
    "--n_samples",
    "4",
    "--ddim_steps",
    "50",
    "--turbo",  # remove these
    "--unet_bs",  # 3 last lines
    "4",  # for lower VRAM usage
]


async def worker(bot: "MUCBot"):
    # This will process requests one after the other, because
    # chances are you can only run one at a time with a
    # consumer-grade GPU. Adapted from the python docs:
    # https://docs.python.org/3/library/asyncio-queue.html
    q = bot.queue
    while True:
        msg = await q.get()
        start = time.time()
        # this should be replaced with a proper regex…
        prompt = re.sub(f"^{bot.nick}(.)", "", msg["body"]).strip()
        # let's call an external process that spawns another python
        # interpreter. quick and dirty, remember?
        try:
            url = bot.images_waiting_for_prompts.pop(msg["from"])
        except KeyError:
            proc = await asyncio.create_subprocess_exec(
                *CMD_TXT2IMG,
                "--prompt",
                prompt,
                "--seed",
                str(random.randint(0, 1_000_000_000)),  # random=fun
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
                cwd=CWD,
            )
            stdout, stderr = await proc.communicate()
        else:
            with tempfile.NamedTemporaryFile() as f:
                async with aiohttp.ClientSession() as session:
                    async with session.get(url) as r:
                        f.write(await r.read())
                f.flush()  # make sure the subprocess sees the full file
                proc = await asyncio.create_subprocess_exec(
                    *CMD_IMG2IMG,
                    "--prompt",
                    prompt,
                    "--seed",
                    str(random.randint(0, 1_000_000_000)),  # random=fun
                    "--init-img",
                    f.name,
                    stdout=asyncio.subprocess.PIPE,
                    stderr=asyncio.subprocess.PIPE,
                    cwd=CWD,
                )
                stdout, stderr = await proc.communicate()
        print(stdout.decode())  # print, the best debugger ever™
        print(stderr.decode())
        # This retrieves the directory where the images were written
        # from the process's stdout. Yes, it is very ugly.
        output_dir = CWD / stdout.decode().split("\n")[-3].split()[-1]
        print(output_dir)
        q.task_done()
        bot.send_message(
            mto=msg["from"].bare,
            mbody=f"Result for: '{prompt}' (took {round(time.time() - start)} seconds)",
            mtype="groupchat",
        )
        for f in output_dir.glob("*"):
            if f.stat().st_mtime < start:
                continue  # only upload the images generated by this run
            url = await bot["xep_0363"].upload_file(filename=f)
            reply = bot.make_message(
                mto=msg["from"].bare,
                mtype="groupchat",
            )
            # these lines are required to make the Conversations
            # Android XMPP client actually display the image in
            # the app, and not just a link
            reply["oob"]["url"] = url
            reply["body"] = url
            reply.send()


# This part is basically just the slixmpp example MUC bot with some
# minor changes
class MUCBot(slixmpp.ClientXMPP):
    def __init__(self, jid, password, rooms: list[str], nick):
        slixmpp.ClientXMPP.__init__(self, jid, password)
        self.queue = asyncio.Queue()
        self.rooms = rooms
        self.nick = nick
        self.add_event_handler("session_start", self.start)
        self.add_event_handler("groupchat_message", self.muc_message)
        self.images_waiting_for_prompts = {}

    async def start(self, _event):
        await self.get_roster()
        self.send_presence()
        for r in self.rooms:
            await self.plugin["xep_0045"].join_muc(JID(r), self.nick)
        asyncio.create_task(worker(self))

    async def muc_message(self, msg):
        if msg["mucnick"] != self.nick:
            if msg["body"].lower().startswith(self.nick):
                await self.queue.put(msg)
                self.send_message(
                    mto=msg["from"].bare,
                    mbody=f"Roger that: '{msg['body']}' (queue: {self.queue.qsize()})",
                    mtype="groupchat",
                )
            elif url := msg["oob"]["url"]:
                self.send_message(
                    mto=msg["from"].bare,
                    mbody="OK, what should I do with this image?",
                    mtype="groupchat",
                )
                self.images_waiting_for_prompts[msg["from"]] = url


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "-q",
        "--quiet",
        help="set logging to ERROR",
        action="store_const",
        dest="loglevel",
        const=logging.ERROR,
        default=logging.INFO,
    )
    parser.add_argument(
        "-d",
        "--debug",
        help="set logging to DEBUG",
        action="store_const",
        dest="loglevel",
        const=logging.DEBUG,
        default=logging.INFO,
    )
    parser.add_argument("-j", "--jid", dest="jid", help="JID to use")
    parser.add_argument("-p", "--password", dest="password", help="password to use")
    parser.add_argument(
        "-r", "--rooms", dest="rooms", help="MUC rooms to join", nargs="*"
    )
    parser.add_argument("-n", "--nick", dest="nick", help="MUC nickname")
    args = parser.parse_args()

    logging.basicConfig(level=args.loglevel, format="%(levelname)-8s %(message)s")

    xmpp = MUCBot(args.jid, args.password, args.rooms, args.nick)
    xmpp.register_plugin("xep_0030")  # Service Discovery
    xmpp.register_plugin("xep_0045")  # Multi-User Chat
    xmpp.register_plugin("xep_0199")  # XMPP Ping
    xmpp.register_plugin("xep_0363")  # HTTP upload
    xmpp.register_plugin("xep_0066")  # Out of band data
    xmpp.connect()
    xmpp.process()
```
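Side note: as the comment in worker() says, the prompt extraction is fragile (it blindly chops the nickname plus one character off the message). A slightly more robust version could look like this; a sketch assuming the bot is addressed as "nick: prompt" or "nick, prompt":

```python
import re

def extract_prompt(body: str, nick: str) -> str:
    # strip a leading "nick", optionally followed by ":" or ",",
    # case-insensitively, then trim the surrounding whitespace
    return re.sub(
        rf"^\s*{re.escape(nick)}\s*[:,]?\s*", "", body, flags=re.IGNORECASE
    ).strip()

# extract_prompt("t1000: Luffy with a guitar", "t1000")
# -> "Luffy with a guitar"
```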
After installing slixmpp and aiohttp with your favorite Python environment isolation tool, you can launch the bot with:
```bash
# -j: XMPP account of the bot, -p: password,
# -r: MUC rooms to join, -n: nickname
python bot.py \
    -j bot@example.com \
    -p XXX \
    -r room1@conference.example.com room2@conference.example.com \
    -n t1000
```
The bot can join several rooms, so you can make one that you can show your mother, and another one for your degenerate friends.
None! I don't plan to make this bot any more sophisticated. But maybe I'll make it generate a few more images at once: generating 2 images takes about 1 minute while generating 20 takes about 3 minutes (roughly 30 vs 9 seconds per image), so there is probably a better middle ground, especially since so many of the output images are just crap! It would only take a two-string tweak, sketched below.
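Since the total number of images per prompt is n_iter × n_samples (per the comments in the command lists above), that middle ground is just a matter of editing two strings in CMD_TXT2IMG/CMD_IMG2IMG. A hypothetical helper (set_batch is my name for it, not part of any library):

```python
def set_batch(cmd: list[str], n_iter: int, n_samples: int) -> None:
    """Hypothetical helper: total images per prompt = n_iter x n_samples.

    n_samples batches images on the GPU (faster per image, more VRAM);
    n_iter runs batches sequentially (same VRAM, time grows linearly).
    """
    cmd[cmd.index("--n_iter") + 1] = str(n_iter)
    cmd[cmd.index("--n_samples") + 1] = str(n_samples)

set_batch(CMD_TXT2IMG, n_iter=2, n_samples=4)  # 8 images per prompt
```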
But I should get back to paid work now.
EDIT (2022/09/01): added img2img early in the morning because it is even more fun, changed the number of output images to 4, and tuned a few parameters to take advantage of my beefy GPU. Now, no more!