After much anticipation, Google unveiled its next-generation large language model, PaLM 2, at this year’s Google I/O. Despite numerous enhancements and new features, it fell short of dethroning OpenAI’s reigning champion, GPT-4.
PaLM 2 boasts impressive capabilities, including multilingual proficiency, advanced reasoning, and coding and mathematics functionality. It also comes in a range of sizes, from the pocket-sized Gecko to the powerful Unicorn. The new model was built using compute-optimal scaling, making it smaller yet more efficient than its predecessor. Even so, its overall performance has drawn criticism.
Professor Ethan Mollick from the Wharton School, a known expert in innovation, startups, and AI, conducted various informal language tests on PaLM 2. His findings? The Google model trails behind GPT-4 and Bing in several areas.
Google Bard has a new AI model, and the whitepaper Google released focuses mostly on translation and math (where it seems to beat GPT-4) and coding (where it doesn't).
In my various informal language tests I run on every model (write a sestina, etc.) it trails GPT-4 & Bing a lot pic.twitter.com/iEmKovwfeG
— Ethan Mollick (@emollick) May 11, 2023
According to Mollick, the model failed the “Apple Test,” which assesses a model’s ability to generate sentences that end with a specific word (in this case, “apple”). GPT-4 had previously scored over 80% on this test.
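To make the test concrete, here is a minimal sketch of how such a constraint could be scored automatically. This is an illustrative assumption, not Mollick’s actual methodology: the `outputs` list stands in for hypothetical model completions, and the scorer simply checks whether each sentence’s final word matches the target.

```python
import re

def apple_test_score(sentences, target="apple"):
    """Return the fraction of sentences whose final word is `target`,
    case-insensitively and ignoring trailing punctuation."""
    if not sentences:
        return 0.0
    hits = 0
    for s in sentences:
        # Extract words, dropping punctuation such as the final period.
        words = re.findall(r"[A-Za-z']+", s)
        if words and words[-1].lower() == target:
            hits += 1
    return hits / len(sentences)

# Hypothetical model outputs, for illustration only:
outputs = [
    "She reached for the shiny red apple.",
    "Nothing pairs with cheddar like a crisp apple.",
    "An apple a day keeps the doctor away.",  # ends with "away", not "apple"
]
print(apple_test_score(outputs))  # 2 of 3 sentences satisfy the constraint
```

A reported score like GPT-4’s “over 80%” would correspond to a value above 0.8 from a scorer of this kind, averaged over many prompts.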
Moreover, PaLM 2 showed less competence in some unconventional language tasks. For instance, when asked to write a serious corporate memo justifying an absurd premise, such as “the floor is now lava” or “promotions will now be decided based on staring contests,” PaLM 2 underperformed. This suggests that GPT-4 remains superior at understanding and generating creative or unconventional text.
Mollick also noted that PaLM 2 seemed worse at incorporating web searches into its responses, another area where GPT-4 has shown impressive performance.
While the technical report released by Google emphasizes the strengths of PaLM 2, it also acknowledges some weaknesses. The model excels in areas like translation, linguistics, and mathematics, but it is generally considered less “smart” overall than GPT-4. In the coding domain, direct comparisons are difficult, but there is a suggestion that PaLM 2’s specialized coding model may be less efficient than GPT-4’s.
Moreover, while the report extensively discusses issues of bias, toxicity, and gender-related issues, it lacks detail on broader AI impacts, contrasting with OpenAI’s comprehensive approach to these topics.
Looking to the future, Google announced the development of Gemini, a model designed to rival GPT-5. This new model will be multimodal, highly efficient at tool and API integrations, and will enable innovations like memory and planning. However, given the performance of PaLM 2, it remains to be seen if Google can truly outperform OpenAI’s models.
While Google’s PaLM 2 showcases impressive capabilities and potential for diverse applications, it falls short of surpassing OpenAI’s older GPT-4 model. As the world of AI continues to evolve, the race to develop the most advanced large language model is clearly far from over.