ChatGPT’s new AI image capabilities are genuinely amazing, but they’re so frustrating to use that they made me want to throw my laptop in the trash

OpenAI has added image generation and editing capabilities to ChatGPT 4o, and while it can produce seriously good images, and do impressive edits to them, its strange rules about text rendering ended up completely frustrating me.

First, let’s start with the positives.

Previously ChatGPT relied on DALL-E for its image generation, and now it can do the job itself. The images it produces are slow to appear but exceptionally good. Take a look at this robin in winter, for example.

A robin in winter, created by ChatGPT. (Image credit: OpenAI)

It also has no problem creating people. Here’s a happy family playing on the beach:

A happy family on a beach, created with ChatGPT. (Image credit: OpenAI)

(OK, if you look really closely at the mom’s right hand, you can see the fingers are still a bit wonky.)

Impressive edits

ChatGPT can also edit images for you. I was really pleased to see that if you upload a photo with a watermark on it and ask it to remove the text over the image, it politely refuses, which is an improvement on Gemini’s new native image editing skills. This is exactly as it should be and well done to ChatGPT for respecting copyright.

ChatGPT can edit images that aren’t copyrighted, however. I uploaded a few to test it out, and it was genuinely impressive at changing backgrounds, changing people in an image while keeping the rest of the image the same, or changing the mood of an image.

Here’s the family again, with the background changed to an urban park:

ChatGPT, change the background to “urban park”. (Image credit: OpenAI)

OpenAI says that its image generation abilities are good for “Images that are heavily text-forward: e.g., instructions poster, visualizing concepts for learning, wordmarks, business cards.”

Legible text is something AI image generators have long struggled to produce, so if ChatGPT can clear this hurdle, that would be really something.

So, I asked it to render a few lines of poetry on a gravestone in a graveyard in England to test it out. That’s when I ran into a problem.

Want to draw an image of words on a stone tablet in a graveyard? ChatGPT says no, it cannot draw words on a gravestone. OK, so how about a stone tablet in the countryside? ChatGPT says no, not on a stone tablet. OK, can you draw some words on a signpost? ChatGPT says no. All of these things violate the “content guidelines”. OK, how about on a cup of coffee? Can I at least get you to put a couple of lines of poetry on a coffee cup?

ChatGPT says:

“I can’t generate an image of that text on a coffee mug, as it involves realistic, readable text on a physical object, which isn’t allowed under content guidelines.”

I mean, why?

What possible crime is this committing?

Exasperated, I asked ChatGPT to just draw the words of my poem artistically, but not on an object. ChatGPT replied:

“I can’t generate an image with that specific text rendered in a realistic or readable way due to content policy restrictions on generating lengthy, realistic-looking text within images.”

“Lengthy” seems to be the issue here, although a couple of lines of poetry is hardly “lengthy”. It is also ironic, since OpenAI announced ChatGPT’s new image capabilities by posting a picture of a blackboard with words written on it, an image that was presumably created by ChatGPT. So, maybe blackboards would work?

Text on a blackboard created with ChatGPT. (Image credit: OpenAI)

When I asked it to draw a small number of words on a blackboard, like “Stop making sense”, it did so (see above), but when I asked it to draw a couple of lines of poetry on the blackboard, it still refused.

When I asked DALL-E to produce the same lines of poetry written on a gravestone in an English graveyard, it had no problem complying. Unfortunately, the words weren’t really legible.

It just seems unfortunate that, now that we’ve got an AI image generator that can produce legible words, it’s being restricted.

The text it produces is far better than anything DALL-E can manage. It’s just frustrating that you can’t use the feature for anything useful.
