Google shook the AI world in early December 2023 by announcing the launch of Gemini, their next-generation language model. Coming in three sizes - Nano, Pro, and Ultra - Gemini promises to push boundaries in natural language understanding and content generation.
In their launch marketing, Google compared Gemini Ultra to OpenAI's flagship GPT-4 model. However, less fanfare surrounded the positioning of Gemini Pro against OpenAI's widely used GPT-3.5 offering. Google also staged a demo for Gemini Ultra - but that's a story for another time.
In this comprehensive analysis, we put Gemini Pro to the test across critical performance criteria like speed, accuracy, reasoning, and code generation. We also look at more subjective factors, such as the output quality and usefulness of each model's chat counterpart (ChatGPT vs. Bard).
Gemini Pro supports both text and image inputs and produces text outputs. The model understands 38 languages, including English, Arabic, French, Spanish, and Japanese. Early developer benchmarks suggest Google's translation speed is up to 20 times faster than GPT-4's, albeit at lower quality.
Compared to GPT-3.5's text-only inputs, Gemini Pro's ability to process images in addition to text gives it an edge in multimodal understanding.
Google Gemini comes with a 32,000-token context window, with support for even longer contexts planned. Code examples are provided in Python, Android, Node.js, and Swift - although developers have critiqued some of the snippets as outdated.
GPT-3.5 is a variant of the Generative Pre-trained Transformer models developed by OpenAI. It is designed to offer a more cost-effective yet powerful option for natural language processing tasks. As a predecessor to GPT-4, it utilizes deep learning algorithms to understand and generate human-like text based on the input it receives. GPT-3.5 can perform a wide array of tasks, including text completion, translation, summarization, and question-answering, making it versatile for numerous applications.
Although it does not embody all the advancements of GPT-4 - such as improved accuracy, more nuanced understanding, and greater context retention - GPT-3.5 still delivers robust performance for many use cases. It is particularly well-suited for scenarios where the cutting-edge capabilities of GPT-4 are unnecessary or where budgetary constraints are a key factor.
Gemini Pro is available inside Google's chatbot, Bard, internationally. It's also available as an API in Google Vertex AI and Google AI Studio. The API is free for up to 60 requests per minute for now. Pricing will be $0.00025/1k characters for input and $0.0005/1k characters for output.
Of the two models, only Gemini Pro will feature image generation, at $0.0025 per image. OpenAI's alternative, DALL-E 3, is significantly more expensive at $0.04 per image for standard quality, rising to $0.12 per image for HD quality.
GPT-3.5 was the first model to achieve worldwide popularity, thanks to its reading comprehension and stunning ability to generate text. One year later, the current version looks less impressive given the storm of updates the industry has gone through this year.
The context length of Gemini Pro is 32k, as outlined in the technical details. This context window refers to the amount of text the model considers before generating additional text.
Gemini models will support larger context sizes, but even this starting point is higher than the GPT-3.5 API's, which stands at 16k.
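As a quick sanity check before sending a long prompt, you can estimate token counts from character counts and compare them against each model's window. The sketch below assumes the common rough ratio of ~4 characters per token for English text; use a real tokenizer for exact counts.

```python
# Rough context-window fit check. The ~4 characters-per-token ratio is an
# approximation for English text, not an exact tokenizer count.
CONTEXT_WINDOWS = {
    "gemini-pro": 32_000,         # 32k tokens
    "gpt-3.5-turbo-16k": 16_000,  # 16k tokens
}

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the token count of a prompt from its length."""
    return max(1, round(len(text) / chars_per_token))

def fits_context(text: str, model: str) -> bool:
    """True if the estimated prompt size fits the model's context window."""
    return estimate_tokens(text) <= CONTEXT_WINDOWS[model]
```

For example, a 100,000-character prompt (~25k estimated tokens) fits Gemini Pro's 32k window but not GPT-3.5's 16k window.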
Klu AI utilized an existing benchmark suite with prompts spanning 50 input tokens to compare the raw speed of these models by measuring tokens generated per second.
The same GPT-3.5 model was tested on both OpenAI and Microsoft Azure deployments, with Gemini Pro running on Google's platform.
According to the LLM platform, Gemini Pro demonstrates significantly faster token processing compared to the OpenAI deployment of GPT-3.5, with a 137.43% speed gain on average.
Benchmarking against Azure's deployment shows much closer results, with Gemini Pro maintaining a slight edge in average speed.
This speed advantage likely stems from Gemini's lower utilization thus far relative to GPT-3.5 on OpenAI's cloud. Still, the early lead is an encouraging indicator of real-time latency gains.
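Klu's exact harness isn't public, but a tokens-per-second measurement like theirs can be sketched as follows. The `generate` callable is a stand-in for whatever SDK call you want to benchmark; the token count is whatever your client reports back.

```python
import time

def tokens_per_second(generate, prompt: str) -> float:
    """Time a single generation call and return output tokens per second.

    `generate` is any callable that takes a prompt and returns
    (text, completion_token_count) - a stand-in for a real SDK call.
    """
    start = time.perf_counter()
    _text, n_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub model for illustration: "generates" 100 tokens in roughly 0.1s.
def stub_model(prompt):
    time.sleep(0.1)
    return "word " * 100, 100
```

In a real comparison you would run many prompts of varying lengths against each deployment and average the results, since single calls are noisy.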
Both models look very capable at following simple instructions. For example, we asked both to continue simple sequences such as:
A, B, C, …
1, 2, 3, …
However, GPT-3.5 has fewer restrictions and seems more capable of handling complex requests, like: "Generate a password with 2 capital letters, 2 special symbols, and a minimum of 18 characters."
It’s worth noting that all these tests don’t definitively prove one model is superior to the other. For example, Gemini Pro refused to generate any passwords, hinting that this limit might be related to content moderation rules, rather than inherent capabilities.
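Whether a model actually satisfied the password prompt is easy to verify programmatically, which is a practical pattern when comparing models on constrained instructions. A minimal checker for the constraints above:

```python
import string

def meets_password_prompt(pw: str) -> bool:
    """Check the constraints from the prompt above: at least 2 capital
    letters, at least 2 special symbols, and a minimum of 18 characters."""
    capitals = sum(c.isupper() for c in pw)
    symbols = sum(c in string.punctuation for c in pw)
    return capitals >= 2 and symbols >= 2 and len(pw) >= 18
```

Running each model's output through a checker like this turns a fuzzy "did it follow instructions?" judgment into a pass/fail score you can tally across many prompts.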
Overall, standardized tests and model cards provide trustworthy comparisons. But the optimal model still depends on your specific use case. Factors including content guidelines, context length, and multimodality may make one model score higher for a particular function. Thorough testing for a specific role or task is key for selection.
Both models struggle with hallucinations, as all LLMs do right now. Especially when math is involved, they can have a hard time producing useful outputs.
However, from individual testing, it looks like GPT-3.5 is more likely to hold its ground, while Bard concedes quickly and offers as few opinions as possible.
For example, we asked GPT-3.5 if SEO can be effective in one month.
Bard (using Gemini Pro), first said SEO cannot be effective in one month, agreeing with us:
Then, we changed our mind and told Bard we're experts and that it's wrong. Bard immediately apologized and simply kept agreeing with our stance:
In its responses, Google's Gemini Pro appears more likely to avoid expressing definitive opinions than OpenAI's GPT-3.5 does.
GPT-3.5 believes SEO can indeed have some effectiveness within one month.
Then, we told the AI that we are experts and that it's wrong: SEO can't be effective in one month.
GPT-3.5 held its ground, and provided a nuanced response to continue the conversation.
GPT-3.5 is the default model in ChatGPT, and it’s free for all users. ChatGPT includes speech recognition on mobile, and provides a fully hands-free experience with a call feature.
You can call ChatGPT and have a conversation without ever typing a word. Speech recognition comes from the Whisper model, while the customizable voices come from OpenAI's text-to-speech API.
Gemini Pro in Bard offers speech recognition and can read text back to you, but it won't hold one-on-one voice calls with you. Weirdly enough, Google Bard doesn't even have a native app for the time being.
These features are not model-related, but they’re native to the interfaces most people use to engage with them. If you simply use Gemini Pro or GPT-3.5’s API, neither will provide voice recognition out of the box.
Depending on your use case, budget, and workflows the answer differs. Let’s explore a few common use cases.
For long-form writing like blog posts, whitepapers, and essays, Google's Gemini Pro should theoretically perform better than GPT-3.5 due to its longer context size. If you prompt it well, you should be able to get better results.
However, if you use GPT-3.5 inside of ChatGPT, you might benefit from far more usage data in the background than Google simply has for the time being. Many people using ChatGPT allow training on their conversations, meaning the bot learns and adapts from how other people use it.
Lots of users rely on ChatGPT to write copy, so the default answers might be better with GPT-3.5 Turbo. There are also many, many prompts online written for GPT-3.5 that might not be optimized for Gemini Pro.
Let’s try, though!
Prompt: “Generate a blog article on the topic: Artificial Intelligence Predictions for 2024. Follow all SEO best practices, add entities, LSIs, and NLP keywords, and cover all main topics. The article should be at least 2,000 words. Be thought-provoking, direct, and catchy.”
Gemini generated a long-form article of 659 words. That's nowhere near 2,000, but LLMs are notoriously bad at counting characters and words.
The article itself is fairly interesting, but it’s not SEO optimized. It feels like a social media post rather than a long-form article for a blog or online publication.
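Since LLMs can't reliably count their own words, checking the draft's length programmatically and re-prompting when it falls short is a simple guard. A minimal sketch:

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words in a draft."""
    return len(text.split())

def needs_expansion(article: str, target_words: int = 2000) -> bool:
    """True if the draft falls short of the requested length and should
    be sent back to the model with an 'expand this' follow-up prompt."""
    return word_count(article) < target_words
```

A 659-word draft like Gemini's would fail this check against the 2,000-word target and trigger a follow-up prompt.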
However, you'll notice Gemini actually caught new AI trends. That's because, unlike GPT-3.5 in ChatGPT, Gemini Pro in Bard has access to the live internet. This enables much stronger research and comparative analysis.
GPT-3.5's output is incredibly boring and doesn't predict anything of real interest. For example, "the rise of Conversational AI" is not a prediction for 2024… it literally happened in 2023!
Also, the structure is off, with entities mentioned mid-article and broken list numbering. The article is only 500 words long and has no chance of ranking on Google.
You can definitely tune the prompt further, but GPT-3.5 won't have access to the internet inside the ChatGPT environment. The responses improve a lot with tools like SurgeGraph, which feed live internet data into GPT-3.5 before generation.
Regardless, both performed fairly poorly. You can use them to review text, but not to write long-form content. If you need help writing optimized content, Claude 2.1 typically performs much better than the GPT models. GPT-4 Turbo 128k is also great and currently sits at the top of the independently run HuggingFace LLM arena.
A major advantage of Gemini Pro over GPT-3.5 is its innate ability to process images alongside text prompts. By handling multimodal inputs, Gemini Pro can analyze the contents of an image to provide relevant descriptions, captions, tags and classifications.
For example, developers can build visual search tools that allow users to submit an image like a meal or plant and have Gemini Pro return information about what's depicted.
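A multimodal request to Gemini Pro mixes text and image parts in one body. The helper below builds a `generateContent`-style request payload; the camelCase field names (`inlineData`, `mimeType`) follow the REST JSON shape documented at launch, but treat the exact structure as an assumption and check Google's current API reference before relying on it.

```python
import base64

def gemini_vision_payload(prompt: str, image_bytes: bytes,
                          mime_type: str = "image/jpeg") -> dict:
    """Build a generateContent-style request body that pairs a text
    prompt with an inline base64-encoded image."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inlineData": {
                    "mimeType": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }
```

A visual search tool would read the user's uploaded photo, build this payload with a question like "What plant is this?", and POST it to the Gemini endpoint.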
GPT-3.5 lacks this tight integration of images and text, making Gemini Pro vastly superior for applications involving computer vision and visual context. Building apps that bridge text and visuals provides an impactful avenue to utilize Gemini Pro's capabilities.
OpenAI offers a "vision" model, GPT-4 Vision, but it's a separate model that comes with additional costs. Gemini Pro's accuracy may be slightly inferior to GPT-4 Vision's.
It's important to note that neither Gemini Pro nor GPT-3.5 can autonomously generate synthetic images from scratch. For that type of creative task, models like DALL-E 3 and Stable Diffusion would be more appropriate.
However, Google's Gemini models should be able to generate images in the near future, and they can already search the web for contextual images to enrich responses with visual imagery.
For data analysis encompassing statistics, visualization and modeling, GPT-3.5 seems to perform better out-of-the-box compared to Gemini Pro. That’s why we use OpenAI’s model for our internal data analysis tools, like Chat Explore and Chat Data Prep.
Fine-tuning approaches can further adapt GPT-3.5 to excel at numeric predictions, regression analysis and more. Additionally, GPT-3.5 has superior existing integrations with Python data science libraries.
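OpenAI's fine-tuning for gpt-3.5-turbo expects training examples as JSONL chat transcripts. A small helper that serializes one example in that format (the field names match OpenAI's documented chat format; the example content and system prompt here are made up for illustration):

```python
import json

def to_finetune_line(question: str, answer: str,
                     system: str = "You are a data-analysis assistant.") -> str:
    """Serialize one training example in the chat-format JSONL that
    OpenAI's gpt-3.5-turbo fine-tuning endpoint expects."""
    example = {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}
    return json.dumps(example)
```

Writing one such line per example to a `.jsonl` file produces a training file you can upload when creating a fine-tuning job.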
Neither model can summarize videos on its own, but Gemini Pro can be used inside Bard, where you get access to a host of "extensions", including most Google services.
Gemini Pro gets access to YouTube videos and transcripts, enabling it to summarize YouTube videos, your Google Docs, emails, and flights.
Gemini Ultra should also inherit these features in Bard Advanced.
Both GPT-3.5 and Gemini Pro hold potential for assisting with email workflows - including prioritizing inboxes, drafting responses, and even answering basic questions. However, Gemini Pro is clearly the better model here if you use email heavily.
Thanks to Bard’s direct integration with Gmail, the model can provide up-to-date information on your latest emails, draft responses, and help you communicate more effectively.
You can use GPT-3.5 on top of your email client, but you'll need an external solution like Merlin or HARPA running as a Chrome extension assistant.
Both models can be used for code generation, but OpenAI has always had the upper hand here. Even its GPT-3.5 Turbo model delivers good performance and likely outperforms Gemini. OpenAI is the provider of choice for nearly all coding assistants out there, including GitHub Copilot.
Both Gemini Pro and GPT-3.5 offer a free tier and paid usage options:
Based on the token-to-character ratio, the paid pricing is nearly identical between the two. This positions Gemini Pro as a cost-effective potential replacement for GPT-3.5 workloads.
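The near-parity claim can be checked with quick arithmetic, assuming the common rough ratio of ~4 characters per token: Gemini Pro's $0.00025 per 1k input characters works out to roughly $0.001 per 1k tokens, in the same range as GPT-3.5 Turbo's published per-token input price at the time.

```python
CHARS_PER_TOKEN = 4  # rough English-text ratio; an approximation

GEMINI_INPUT_PER_1K_CHARS = 0.00025   # USD, from Google's announced pricing
GEMINI_OUTPUT_PER_1K_CHARS = 0.0005

def gemini_cost_per_1k_tokens(price_per_1k_chars: float) -> float:
    """Convert Gemini's per-character pricing into an approximate
    per-token equivalent, for comparison with OpenAI's per-token rates."""
    return price_per_1k_chars * CHARS_PER_TOKEN

# Input:  0.00025 * 4 -> ~$0.001 per 1k tokens
# Output: 0.0005  * 4 -> ~$0.002 per 1k tokens
```

The result hinges entirely on the 4:1 assumption; prompts in other languages or with unusual tokenization can shift the effective cost in either direction.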
Of course, the new Gemini Ultra model promises even greater performance but at 10x the price of GPT-3.5 and Gemini Pro.
In its debut, Gemini Pro delivers strong results across critical language and model performance benchmarks, demonstrating class-leading throughput, competent reasoning ability, and strong instruction following compared to the formidable GPT-3.5.
However, OpenAI maintains edges in key areas like impartiality in content generation, code execution, and crucially - the ability to fully customize models.
Unless your needs specifically require handling images in addition to text, Gemini Pro does not yet present a compelling reason to switch from fine-tuned GPT-3.5 implementations.
That calculus may change with the upcoming release of Gemini Ultra, which Google compared favorably against GPT-4 in limited previews. But the staged launch hints that the full-strength Gemini Ultra is still in development.
For now, Gemini Pro makes its debut as a promising challenger - faster, cheaper, and smarter than GPT-3.5 in some respects. But there's still work ahead to match the versatility and customization potential that has made OpenAI a formidable leader in this next generation of LLMs.