Gemini Pro vs. GPT-3.5: How Google's New LLM Stacks Up

TABLE OF CONTENTS

Google shook the AI world in early December 2023 by announcing the launch of Gemini, their next-generation language model. Coming in three sizes - Nano, Pro, and Ultra - Gemini promises to push boundaries in natural language understanding and content generation.

In their launch marketing, Google compared Gemini Ultra to OpenAI's recently released GPT-4 model. However, less fanfare surrounded the positioning of Gemini Pro against OpenAI's widely used GPT-3.5 offering. They also staged a demo for Gemini Ultra - but that’s a story for another time.

In this comprehensive analysis, we put Gemini Pro to the test across critical performance criteria like speed, accuracy, reasoning, and code generation. We also take a look at more subjective analysis such as the output quality and usefulness of the chat counterpart (ChatGPT vs Bard) when using the model.

Key Takeaways

Gemini Pro processes tokens over 2x faster than GPT-3.5, with 49.67 vs 36.14 tokens per second;
Gemini exhibits some preference towards Google's interests in open-ended content generation;
GPT-3.5 allows full customization while Gemini does not currently support fine-tuning. Akkio uses it to power AI analytics for agencies;
Gemini Pro, just like the entire family of Gemini AI Models, is multimodal, so it can handle images. GPT-3.5 can’t, at least not by default.

Google Gemini Pro Overview

In early December, Google released the Pro version of Gemini to developers via the Google Vertex AI platform and Google AI Studio.

Gemini Pro supports both text and image inputs to produce text outputs. The model comprehends 38 languages including English, Arabic, French, Spanish and Japanese. Benchmarking from developers shows Google's translation speed is up to 20 times faster than GPT-4, albeit at lower quality.

Compared to GPT-3.5's text-only inputs, Gemini Pro's ability to process images in addition to text gives it an edge in multimodal understanding.

Google Gemini comes with a 32,000 token context window, with plans to support even longer contexts forthcoming. Code examples are provided in Python, Android, Node.js and Swift - although developers have critiqued some of the snippets as outdated.

OpenAI’s GPT-3.5 Overview

GPT-3.5 is a variant of the Generative Pre-trained Transformer models developed by OpenAI. It is designed to offer a more cost-effective yet powerful option for natural language processing tasks. As a predecessor to GPT-4, it utilizes deep learning algorithms to understand and generate human-like text based on the input it receives. GPT-3.5 can perform a wide array of tasks, including text completion, translation, summarization, and question-answering, making it versatile for numerous applications.

Although it does not embody all the advancements of GPT-4 - such as improved accuracy, more nuanced understanding, and greater context retention - GPT-3.5 still delivers robust performance for many use cases. It is particularly well-suited for scenarios where the cutting-edge capabilities of GPT-4 are unnecessary or where budgetary constraints are a key factor.

How to Access the Models

Gemini Pro

Gemini Pro is available inside Google’s Chatbot Bard internationally. It’s also available as an API in Google Vertex and Google AI Studio. The API is free for up to 60 requests per minute for now. Pricing will be $0.00025/1k characters in input, and $0.0005/1k characters in output.

GPT-3.5

GPT-3.5 is available inside OpenAI’s ChatGPT for free. It’s also available as an API. Pricing is $0.001/1k tokens in input, $0.002/1k tokens in output.

Only Gemini Pro will feature image generation, at $0.0025/image. OpenAI’s alternative, DALL-E 3, is significantly more expensive at $0.04/image for standard quality, up to $0.12 per image for HD quality.

Both models are available as API connectors into many no-code tools like Make.com, Zapier, and others. They gained incredible popularity throughout 2023.

GPT-3.5 was the first model to get worldwide popularity thanks to its reading comprehension and stunning ability to generate text. The current version, one year later, is not as impressive given the storm of updates the industry went over this year.

Benchmarking Performance: Gemini Pro vs GPT-3.5

Context Length

The context length of Gemini Pro is 32k, as outlined in the technical details. This context window refers to the amount of text the model considers before generating additional text.

Gemini models will support larger context sizes, but even this starting point is higher than the GPT 3.5 API, standing at 16k.

Speed

Klu AI utilized an existing benchmark suite with prompts spanning 50 input tokens to compare the raw speed of these models by measuring tokens generated per second.

The same GPT-3.5 model was tested against both OpenAI and Microsoft Azure deployments, with Gemini Pro on Google's platform.

Performance Comparison

Percentile	Gemini Pro	OpenAI GPT-3.5	Gemini % Gain	Azure GPT-3.5	Gemini % Gain
Average	49.67	36.14	137.43%	48.08	3.30%
90th	52.86	52.85	0.02%	54.60	-3.19%
99th	54.15	54.15	0.00%	54.60	-0.83%

‍

According to the LLM platform, Gemini Pro demonstrates significantly faster token processing compared to the OpenAI deployment of GPT-3.5, with a 137.43% speed gain on average.

Benchmarking against Azure's deployment shows much closer results, with Gemini Pro maintaining a slight edge in average speed.

This superior velocity likely stems from the lower utilization of Gemini thus far relative to GPT-3.5 on OpenAI's cloud. But the early speed advantage is an encouraging indicator of real-time latency gains.

Following Instructions

Both models look very capable at following simple instructions. For example, we asked both to:

Prompt:
A, B, C,

1, 2, 3,

Continue both

Gemini Pro

GPT-3.5

However, GPT-3.5 has less restrictions and seem more capable of running through more complex requests, like: generate a password with 2 capital letters, 2 special symbols, with a minimum of 18 characters

Gemini Pro

GPT-3.5

It’s worth noting that all these tests don’t definitively prove one model is superior to the other. For example, Gemini Pro refused to generate any passwords, hinting that this limit might be related to content moderation rules, rather than inherent capabilities.

Overall, standardized tests and model cards provide trustworthy comparisons. But the optimal model still depends on your specific use case. Factors including content guidelines, context length, and multimodality may make one model score higher for a particular function. Thorough testing for a specific role or task is key for selection.

Hallucinations

Both models struggle with hallucinations, as all LLMs right now. Especially when math is involved, they can have a hard time producing useful outputs.

However, from individual testing, it looks like GPT-3.5 is more prone to hold its ground, while Bard will concede and have as little opinions as possible.

For example, we asked GPT-3.5 if SEO can be effective in one month.

Gemini Pro

Bard (using Gemini Pro), first said SEO cannot be effective in one month, agreeing with us:

bard always agrees with us — prompt: can SEO be effective in one month

Then, we changed our mind, and told Bard we’re experts, and that they’re wrong. Bard immediately apologized and simply kept agreeing with our stand:

Bard apologizes quickly — prompt: I'm an expert. It can, you're wrong.

In its responses, Google's Gemini Pro appears more likely to avoid expressing definitive opinions than OpenAI's GPT-3.5 does.

GPT-3.5

prompt: can SEO be effective in one month in ChatGPT — prompt: can SEO be effective in one month

GPT-3.5 believes SEO can indeed have some effectiveness within one month.

Then, we told the AI that we are experts, and that they're wrong. SEO can’t be effective in one month.

GPT-3.5 held its ground, and provided a nuanced response to continue the conversation.

Voice & Speech

GPT-3.5 is the default model in ChatGPT, and it’s free for all users. ChatGPT includes speech recognition on mobile, and provides a fully hands-free experience with a call feature.

You can call ChatGPT and have a conversation without ever typing a word. Voices are customizable and come from the Whisper API.

Gemini Pro in Bard offers speech recognition and can read texts to you, but it won’t engage in 1 on 1 calls on the phone with you. Weirdly enough, Google Bard doesn’t even have a native app for the time being.

These features are not model-related, but they’re native to the interfaces most people use to engage with them. If you simply use Gemini Pro or GPT-3.5’s API, neither will provide voice recognition out of the box.

Which Model to Choose

Depending on your use case, budget, and workflows the answer differs. Let’s explore a few common use cases.

Writing Blog Posts

For long-form writing like blog posts, whitepapers, and essays, Google’s Gemini Pro should theoretically perform better than GPT-3.5 due to the longer context size. If you prompt it well enough, then you should be able to get better results.

However, if you use GPT-3.5 inside of ChatGPT, you might get access to much more training data on the background that Google simply doesn’t have for the time being. Many people using ChatGPT allow training, meaning the bot learns and adapts from other people using it.

Lots of users take advantage of ChatGPT to write copy, so the default answers might be better with gpt 3.5 turbo There are also many, many prompts online to use with GPT-3.5 that might not be optimized for Gemini Pro.

Let’s try, though!

Prompt: “Generate a blog article on the topic: Artificial Intelligence Predictions for 2024. Follow all SEO best practices, add entities, LSIs, and NLP keywords, and cover all main topics. The article should be at least 2,000 words. Be thought-provoking, direct, and catchy.”

Gemini Pro

Gemini generated a long-form 659 words article. It’s nowhere near 2,000, but LLMs are notoriously bad at counting characters and words.

The article itself is fairly interesting, but it’s not SEO optimized. It feels like a social media post rather than a long-form article for a blog or online publication.

However, you’ll notice Gemini actually catched new AI trends. That’s because, contrary to GPT-3.5 in ChatGPT, Gemini Pro in Bard has access to the live internet. This enables much stronger research and comparative analysis.

GPT-3.5

GPT-3.5 output is incredibly boring and doesn’t predict anything of real interest. For example, “the rise of Conversational AI” is not a prediction for 2024… it literally happened in 2023!

Also, the structure is wrong, with entities mentioned mid-article, and the numbering for the list is also off. The article is 500 words long, and has no chances of ranking on Google.

You can definitely tune the prompt more, but GPT-3.5 won’t have access to the internet inside of the ChatGPT environment. The responses improve a lot when using tools like SurgeGraph, where the tool feeds live internet data into GPT-3.5 before proceeding with the generation.

Regardless, both performed fairly poorly. You can use them to review text, but not to write long-form content. If you need help writing optimized content, Claude 2.1 typically performs much better than the GPT models. GPT-4 Turbo 128k is also great and currently ranks on top of the independently-run HuggingFace LLM arena.

Analyzing Images

A major advantage of Gemini Pro over GPT-3.5 is its innate ability to process images alongside text prompts. By handling multimodal inputs, Gemini Pro can analyze the contents of an image to provide relevant descriptions, captions, tags and classifications.

For example, developers can build visual search tools that allow users to submit an image like a meal or plant and have Gemini Pro return information about what's depicted.

GPT-3.5 lacks this tight integration of images and text, making Gemini Pro vastly superior for applications involving computer vision and visual context. Building apps that bridge text and visuals provides an impactful avenue to utilize Gemini Pro's capabilities.

OpenAI offers a “vision” AI called “GPT-4 Vision”, but it’s a separate model that comes with additional costs. Gemini could have a slightly inferior accuracy compared to GPT-4 Vision.

Generating Images

It's important to note that neither Gemini Pro nor GPT-3.5 can autonomously generate synthetic images from scratch. For that type of creative task, models like DALL-E 3 and Stable Diffusion would be more appropriate.

However, Google’s Gemini models should be able to generate images in the near future, and are already able to scrape the web to find contextual images and enrich responses with visual imagery.

Data Analysis

For data analysis encompassing statistics, visualization and modeling, GPT-3.5 seems to perform better out-of-the-box compared to Gemini Pro. That’s why we use OpenAI’s model for our internal data analysis tools, like Chat Explore and Chat Data Prep.

Fine-tuning approaches can further adapt GPT-3.5 to excel at numeric predictions, regression analysis and more. Additionally, GPT-3.5 has superior existing integrations with Python data science libraries.

Summarizing YouTube Videos

Neither model can summarize videos, but Gemini Pro can be used inside Bard. When you use it inside Bard, you get access to a host of “extensions”, including most Google Services.

Gemini Pro will get access to YouTube videos and transcripts, enabling it to summarize YouTube videos, your google docs, emails, and flights.

Gemini Ultra should also inherit these features in Bard Pro.

Email Management

Both GPT-3.5 and Gemini Pro hold potential for assisting with email workflows - including prioritizing inboxes, drafting responses and even answering basic questions. However, Gemini Pro is definitely the best model here if you make heavy usage of email.

Thanks to Bard’s direct integration with Gmail, the model can provide up-to-date information on your latest emails, draft responses, and help you communicate more effectively.

You can use GPT-3.5 on top of your email client, but you’ll need an external solution like Merlin or HARPA to switch it on as a chrome extension assistant.

Code Generation

Both models can be used for code generation, but OpenAI always had the upper hand here. Even their GPT-3.5 turbo large language model should provide good performance and likely outperform Gemini. OpenAI is the provider of choice for nearly all coding assistants out there, including GitHub copilot.

Pricing Comparison

Both Gemini Pro and GPT-3.5 offer a free tier and paid usage options:

Pricing Comparison

Service	Availability	Input Price	Output Price
Gemini Pro	Google Chatbot Bard, Google Vertex, Google AI Studio	$0.00025/1k characters	$0.0005/1k characters
GPT-3.5	OpenAI ChatGPT	$0.001/1k tokens	$0.002/1k tokens

‍

Based on the token to character ratio, the paid pricing is nearly identical between the two. This positions Gemini Pro as a cost-effective potential replacement for GPT-3.5 workloads.

Of course, the new Gemini Ultra model promises even greater performance but at 10x the price of GPT-3.5 and Gemini Pro.

Conclusions

In its debut, Gemini Pro delivers strong results across critical language and model performance benchmarks, demonstrating fastest-in-class throughput, competent reasoning ability, and superior instruction following compared to the formidable GPT-3.5.

However, OpenAI maintains edges in key areas like impartiality in content generation, code execution, and crucially - the ability to fully customize models.

Unless your needs specifically require handling images in addition to text, Gemini Pro does not yet present a compelling reason to switch from fine-tuned GPT-3.5 implementations.

That calculus may change with the upcoming release of Gemini Ultra, which Google compared favorably against GPT-4 in limited previews. But the staged launch hints that the full-strength Gemini Ultra is still in development.

For now, Gemini Pro makes its debut as a promising challenger - faster, cheaper, and smarter than GPT-3.5 in some respects. But there's still work ahead to match the versatility and customization potential that has made OpenAI a formidable leader in this next generation of LLMs.

<- Previous

Beyond Prompt Engineering: Why Media Agencies Need Context Engineering to Win with AI

Next ->

5 Key Takeaways from Cannes Lions 2025 for Media Agencies

Published on

January 3, 2024

Gemini Pro vs. GPT-3.5: How Google's New LLM Stacks Up

Key Takeaways

Google Gemini Pro Overview

OpenAI’s GPT-3.5 Overview

How to Access the Models

Gemini Pro

GPT-3.5

Benchmarking Performance: Gemini Pro vs GPT-3.5

Context Length

Speed

Following Instructions

Gemini Pro

GPT-3.5

Gemini Pro

GPT-3.5

Hallucinations

Gemini Pro

GPT-3.5

Voice & Speech

Which Model to Choose

Writing Blog Posts

Gemini Pro

GPT-3.5

Analyzing Images

Generating Images

Data Analysis

Summarizing YouTube Videos

Email Management

Code Generation

Pricing Comparison

Conclusions

Put agents to work today