Once upon an epoch…

Artificial Intelligence has been famously good at certain tasks. For instance, industrial tasks: Does this potato chip coming out of the factory resemble the perfect chip closely enough? No? Discard it immediately. Or tasks involving procedural knowledge: Drive this Tesla from Amsterdam to Paris. Do so optimally, and as safely as possible.

Why is AI so good at these tasks? Simple. They are hyper-focused and sit in very lucrative domains, where solving them saves companies many millions of dollars. Over the last couple of years these tasks have been optimised by those who saw the low-hanging fruit early enough. However, just because one fruit has been plucked does not mean there are none left…

Producing creative content – combining different sources to create something novel – was famously dismissed as something machines cannot do. Coming from an era where machines were confined to niches and niches only, this sentiment can be understood. But the times, they are a-changin’.

Humans can create; they can combine skills and knowledge from different niches. Are you good at drawing and storytelling? Do you have particular knowledge about superheroes and digital art apps? What are you waiting for? Go create a comic book already!

What exactly makes that oh so impossible for an AI? Historically, combining multiple AI systems, even ones that were very good at their individual tasks, was challenging for many technical reasons. At AI Heroes, we believe that the time is ripe to tackle these challenges. Once you break a task down into its niches, you can focus AI methods on each of them and combine them for marvellous outcomes. In fact, our Heroic API – with its densely optimised broad range of functions – is made for exactly this purpose.

We are starting off ambitiously! We want an AI to create a visual story. Something indeed like a comic, but more malleable, more adaptable … more intelligent. A comic generally consists of some text and images that fit with the text. Naturally, we will focus on these components, these niches, first.

First off, we need some text. Text generation is an age-old topic in the field of AI. There have been many attempts over the years to generate realistic and coherent narratives, and interesting and funny examples are easy to find. One of my favourites is this generated chapter of Harry Potter by Botnik. It was created by training an AI on the text of all seven Harry Potter novels. The results are as remarkably coherent as they are hilarious.

We will, however, not start with an entire chapter of text. We will confine ourselves to a few sentences. Each sentence should then get its own visual aspect. For our text, we use our Heroic Text Generator, which is part of our Heroic API. With it, we are able to produce tiny stories such as the following.

A ferret, a rat and a mole walk into a bar.
The bartender looks at them oddly before asking what they want.
“Bar” says the ferret.
“Beer” says the rat.

Our Heroic Story Generator

Now, you might be wondering why the ferret wants a bar or what happened to the mole. But, ask yourself, is it really the fault of the AI if you don’t understand its intricate labyrinth of artistic freedom? We will work on limiting the infinite wisdom of our Heroic Text Generator so that we can get a glimpse of its true meaning, but in the meantime, this will do!

Next up, we need to generate images from the given text. Luckily, that is exactly what our Heroic Image Generator is for! It can take text and convert it into images. Think about handing it the word ‘apple’ and expecting something that looks like an apple in return. So, we went ahead and fed the entire story to the image generator:

An image generated with the whole story

Oof – that’s not particularly good. Admittedly, one can see shapes and forms resembling rodent-like animals, but that is about it. What went wrong here? Using the text directly as input for the Image Generator does not seem to be enough. Still, by venturing out into the unknown you always come back with new knowledge! What is missing? And how can we improve?

Consider the making of a movie. It is not enough to simply take an existing book and use that to direct a movie. No, you need someone who describes not just the story but also its visual aspect, for instance the tone and setting of a scene. You need a screenwriter. Akin to a screenwriter, we need an AI that can produce a visual description from the given text. Think of it as an extra layer of creative interpretation.

We therefore developed an AI solution that takes the story and formulates a screenplay – or, at least, the AI version of one… It filters out the most important phrases of the story to use as input for the Image Generator. It also tries to find the perfect style for the story and applies that to the images as well. Want it to look like a comic? No problem. Should it look more like a children’s storybook illustration? Done. Maybe it should look like it was shot with an early 2000s phone camera? You wouldn’t want that…
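To make the idea a little more tangible, here is a deliberately naive sketch of such a screenplay step. It is not how our AI solution or the Heroic API works: the stop-word list and the names `key_phrases` and `screenplay_prompt` are made up purely for illustration, and a simple word filter stands in for the actual phrase selection.

```python
import re

# Hypothetical illustration only: a tiny stop-word filter standing in for
# the real phrase selection. None of these names belong to the Heroic API.
STOPWORDS = {"a", "an", "and", "the", "at", "into", "them", "they",
             "what", "before", "of", "to"}


def key_phrases(sentence: str) -> list[str]:
    """Keep the content words of a sentence, in order, without duplicates."""
    words = re.findall(r"[a-z']+", sentence.lower())
    kept = []
    for word in words:
        if word not in STOPWORDS and word not in kept:
            kept.append(word)
    return kept


def screenplay_prompt(sentence: str, style: str) -> str:
    """Combine the key phrases with a style description into one prompt."""
    return f"{', '.join(key_phrases(sentence))}, in the style of {style}"


print(screenplay_prompt("A ferret, a rat and a mole walk into a bar.",
                        "a comic book panel"))
# prints: ferret, rat, mole, walk, bar, in the style of a comic book panel
```

Swapping the style string is all it takes to move from a comic to a storybook look, which is the kind of rapid style change described above.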

This quasi-screenplay can then be fed into our Heroic Image Generator, and the results speak for themselves. We simply split the mini story into four parts and generate one image for each. Now, important subjects of the story are highlighted more, and there is a consistent style across the panels. Here are the first two sentences as examples!

Left: image generated for “A ferret, a rat and a mole walk into a bar.”
Right: for “The bartender looks at them oddly before asking what they want.”
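For the curious, splitting the story into panels could look roughly like the sketch below. It assumes one panel per line of the story, and the helper names `split_into_panels` and `panel_prompts` are hypothetical. They are not part of the Heroic API.

```python
# Hypothetical sketch: one panel per non-empty line of the story,
# with the same style description attached to every panel.
STORY = """A ferret, a rat and a mole walk into a bar.
The bartender looks at them oddly before asking what they want.
“Bar” says the ferret.
“Beer” says the rat."""


def split_into_panels(story: str) -> list[str]:
    """Return the non-empty lines of the story, one per panel."""
    return [line.strip() for line in story.splitlines() if line.strip()]


def panel_prompts(story: str, style: str) -> list[str]:
    """Attach one shared style to every panel so the look stays consistent."""
    return [f"{panel} ({style})" for panel in split_into_panels(story)]


for prompt in panel_prompts(STORY, "comic book style"):
    print(prompt)
```

Reusing a single style string for all four prompts is the simplest way to get a consistent look across the panels.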

Great! All that is left is putting it all together neatly. We arrange the images one after the other in a nice adjustable grid and then overlay the text from our initial Text Generator on top.
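As a rough illustration of the layout step, the sketch below computes where each panel and its caption would go in such a grid. Everything here is an assumption made for the example: square panels, a fixed gap, captions anchored at the bottom-left of each panel, and the names `Box`, `grid_layout` and `caption_anchor`. The actual drawing is left out.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Position and size of one panel on the page, in pixels."""
    left: int
    top: int
    width: int
    height: int


def grid_layout(n_panels: int, columns: int, panel_size: int,
                gap: int = 10) -> list[Box]:
    """Place square panels row by row in a grid with `columns` columns."""
    boxes = []
    for index in range(n_panels):
        row, col = divmod(index, columns)
        boxes.append(Box(left=gap + col * (panel_size + gap),
                         top=gap + row * (panel_size + gap),
                         width=panel_size,
                         height=panel_size))
    return boxes


def caption_anchor(box: Box, margin: int = 8) -> tuple[int, int]:
    """Point near the bottom-left corner of a panel where its text starts."""
    return (box.left + margin, box.top + box.height - margin)


for box in grid_layout(n_panels=4, columns=2, panel_size=256):
    print(box, caption_anchor(box))
```

Changing `columns` from 2 to 4 turns the two-by-two page into a single strip, which is what makes the grid adjustable.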

Before revealing the final output, let’s briefly go over what we have learned. Combining niches that are well established in the field of AI can yield remarkable results. In our case, we combined Text Generation with Image Generation. Working on such combinations of niches to create bigger, better systems will become increasingly important for various fields in the industry – and AI Heroes is ready.

This first version of our Heroic Story Generator can find many applications already. Think about creating mood boards, or short visual marketing campaigns. Rapid prototyping of styles enables quick access to inspiration for designers and marketers. The time saved with such tools translates directly into reduced costs for businesses, no matter the size.

The final story generated by our AI!

Maximilian Velich
