Midjourney 6 rendering of Chewbacca relaxing in a hot tub out in winter

AI Image Generators: Midjourney v6 vs DALL-E v3

Dave Gibson

In the past few months I’ve been testing AI image generators and have begun using them to create visuals for both fun and marketing. I love this new ability to envision a custom image that complements the topic that I’m writing about. And with the recent release of Midjourney’s version 6, AI image generation has taken yet another leap forward. After enjoying a holiday “break” during which I dove headlong into this new update to Midjourney, I thought I’d share my take on where AI image generation now stands, and compare Midjourney to its closest competitor, DALL-E 3.

To set the table, the primary field of players includes Stable Diffusion, Adobe Firefly v2, DALL-E v3, and the new Midjourney v6.

But before I dive into the models, allow me to start with a bit of context. AI image generation blew up this past year, and we’re now getting models that are capable of rivaling custom photography. As a photographer, yes, this is disheartening, but as a marketer who creates content, it is very exciting. This is especially good for those who need images of underrepresented groups, for which stock photography is quite limited, especially images of people with disabilities.

Each model combines machine learning and creative expression to create images from written prompts. Each is far from perfect, and requires patience… a lot of patience, as you fine-tune results through trial and error to get the visual you want. For anyone just starting out, this may be a surprise, but I will say that it builds your skills in envisioning a desired image and converting that vision into words.

Midjourney

Midjourney Inc. is a private company based in San Francisco and founded by David Holz. The first version of the model launched in beta in March of 2022 on Discord, which is a separate platform for communication and communities. The version before v6 was version 5.2.

Midjourney 6 Alpha

Version 6 continues to be accessible via Discord as of this writing, but access through a more standard website is expected soon. And I can’t wait. There’s nothing fun about using Discord and I look forward to using an effective UI. 

Where MJ6 stands out is in rendering amazingly realistic images. It seems to have a much better understanding of composition and of the relationships between objects in an image.

Comparing MJ6 to MJ5.2

Compared with 5.2, Midjourney 6 creates images that are more realistic, crisp, detailed and accurate. The nuance and depth have increased, especially for things like backlighting, reflections, and the natural world overall.

Midjourney v6
Midjourney rendering of Chewbacca in hot tub
Midjourney v5.2
Midjourney v5.2 rendering of Chewbacca in hot tub

While the 5.2 image does have fine detail, in version 6, Chewie looks like he’s really chillin’.

 

Experimenting with Midjourney v6

Over the holiday, I was in a Star Wars mood and had fun hanging with Chewie. Getting these images took a lot of patience and trial and error, often dozens of prompts before I had one that I really liked.

While the detail of each is impressive, what I think is most interesting is the depth, the tone, the style and composition. These are nuanced images which tell stories.

prompt: photo of tiny Pikachu and huge Chewbacca on a couch. Chewbacca trying to take game controller from Pikachu. Dark room
blog_pikachu-chewie-gaming.jpg

prompt: Little Annie character smoking a fat lit cigar from side of mouth
Midjourney rendering of orphan annie smoking a cigar

prompt: a black 2019 Volkwagen Golf GTI with snow groomer cat tracks instead of wheels. Plowing through snow heading up mountain
blog_propdave_a_black_with_red_trim_2019_volkwagen_golf_gti_with_sno_6feb6359-87df-4d72-926b-6a7e690b6aa7.jpg

prompt: a photo of cyberpunk post apocolyptic roadwarrior woman with shaved head ordering a drink at a dimly lit rough bar. she has abundant tatoos and peircings. leather vest, leather boots
Midjourney 6 rendering of a female cyberpunk in a bar

prompt: a photo of Pikachu meeting the Queen of England
Midjourney rendering of Pikachu meeting the Queen

Midjourney 6 Pros

  • For generating photo-real images, it’s the best. Images not only have more detail, but also more natural nuance. It gets nature, and it tends to take its own poetic license.
  • Artistic and nuanced
  • Strong community via Discord

Midjourney 6 Cons

  • Does not follow instructions well. 
  • Not iterative: you can’t respond to it and ask it to make adjustments
  • Not good at graphic design such as logo creation
  • Text generation has improved but still has a journey to take
  • Requires Discord - for now
  • User controls not as advanced as Firefly

ProTip: Use --style raw to get the most realistic images
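If you generate a lot of prompts, a tiny helper can keep the parameters consistent. This is only a sketch of how I'd assemble the text that gets pasted after /imagine in Discord. The function name build_prompt and its arguments are my own invention for illustration, not part of any Midjourney tool; the only Midjourney pieces are the --style raw and --v parameters, which go at the end of the prompt.

```python
def build_prompt(description: str, raw_style: bool = True, version: str = "6") -> str:
    """Assemble a Midjourney prompt string (hypothetical helper, not an official API)."""
    # The description comes first; parameters are appended at the end.
    parts = [description.strip()]
    if raw_style:
        # --style raw asks Midjourney to apply less of its default stylization.
        parts.append("--style raw")
    # --v selects the model version.
    parts.append(f"--v {version}")
    return " ".join(parts)


print(build_prompt("photo of Chewbacca relaxing in a hot tub in winter"))
```

Paste the printed line after /imagine. Passing raw_style=False leaves the style parameter off, which makes it easy to compare the same description with and without it.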

 

 

OpenAI's DALL-E 3


DALL-E, developed by OpenAI, may not deliver the same level of natural realism, but where it shines is in creating graphic design and illustrations. It can be accessed for free using Bing Image Creator, or through ChatGPT with GPT-4. One thing I really like as a paying ChatGPT (GPT-4) user is the convenience of having the two combined in the same space. What I find really interesting is that if you enter a vague prompt, ChatGPT may take the opportunity to juice it up (see the following example), adding details to get a better image back from DALL-E. I also really like the iterative process, which allows you to make adjustments in subsequent prompts: add or remove objects, increase styling, or anything else you dream up. This is hopefully the direction that Midjourney will take with its new website.

DALL-E enhanced prompt with followup
Screenshot of DALL-E prompt thread demonstrating how ChatGPT enhances a prompt

DALL-E’s usage restrictions may be too much for some as well. For instance, I wanted it to create an ironic image of Orphan Annie smoking a cigar. DALL-E said no and suggested swapping the cigar for a lollipop. Chewie in a hot tub was also a no-go.


DALL-E alternative rendering of Orphan Annie smoking a cigar
DALL-E rendering of a little girl with red hair holding a lollipop

DALL-E alternative rendering of Chewbacca in a hot tub
Screenshot of DALL-E UI explaining why it won't render an image of Chewbacca, then showing a pathetic little furry alternative in a tub

DALL-E Pros

  • It follows instructions better
  • Better at graphic design
  • Better for highly saturated eye-catching social media images
  • Integration with GPT allows it to enhance the prompt you write to give DALL-E better instructions
  • Iterative by nature, which allows the user to refine the image through follow-up prompts

DALL-E Cons

  • Simplistic, immature design style
  • Not quite capable of creating photo-realistic images
  • Displays two images per prompt, versus Midjourney’s four
  • Also has issues with text generation
  • Restrictive usage rules

 

Output Comparisons: Midjourney 6 vs DALL-E 3

The proof is in the pudding, so let’s see some comparisons, starting with the user interface. Midjourney’s new website UI is expected soon, but I still want to show the difference between the two as they stand today, and also show Adobe Firefly’s interface, which is the best of the three.

 

User Interface Comparisons for Prompting 

I'm first throwing in Adobe's Firefly UI, because it is just so good and easy to use. I hope to see Midjourney emulate this.

Adobe Firefly UI for prompting

 

DALL-E offers nothing more than an open prompt field, and outputs two versions:

DALL-E UI for prompting

Midjourney via Discord... augh. Soon to be updated:

Midjourney UI for prompting

Comparing Renderings from DALL-E and Midjourney

First though, I want to include just one rendering from Adobe Firefly to start us off. The rest compare DALL-E and Midjourney, and I trust you'll see how much better Midjourney is at creating realistic visual spaces, and how DALL-E shines in graphic design.

Comparing Adobe Firefly vs DALL-E 3 vs Midjourney 6
Prompt: realistic photo of a brook in a lush forest in New England. Morning light streaking through the trees. A deer approaching the brook. Ferns.

< Firefly >
Firefly rendering of a brook through a forest with a deer

< DALL-E 3 >
DALL-E 3 rendering of a brook through a forest with a deer

< Midjourney 6 >
Midjourney 6 rendering of a brook through a forest with a deer

 

Comparing DALL-E 3 vs Midjourney 6
Prompt: realistic closeup photo of an elderly man in an urban setting, leaning against a door opening with his hand to his face smoking a cigar. No emotion. Distant look in eyes. Moody misty.

< DALL-E >
DALL-E rendering of man smoking cigar in doorway

< Midjourney >
blog_mj-man_smoking.jpg

Comparing DALL-E 3 vs Midjourney 6
Prompt: a graphic designed logo for a company called Buzz Coffee that sells very strong coffee

< DALL-E >
DALL-E rendering of a logo for fictitious company Buzz Coffee

< Midjourney >
Midjourney rendering of a logo for fictitious company Buzz Coffee

Comparing DALL-E 3 vs Midjourney 6
Prompt: a creative graphic design square poster for an electronic music band from the late 1990s. Flyer art style.

< DALL-E >
blog_da-flyer_poster.jpg

< Midjourney >
blog_mj-flyer-poster.jpg

 

Final Thoughts

I'm more of a photographer than a graphic designer, so I'm just blown away by the natural realism that Midjourney delivers. It gets mood. It gets color, depth, nuance and composition. It can create visual poetry. 

Considering these capabilities were developed in just a year, imagine what 2024 will bring for AI image creation… and video… and music…

Midjourney 6 rendering of David and Chewie

 Stay tuned...

 

 

and may The Force be with you.

- Dave