New ChatGPT dropped… GPT-4 was released today (March 14, 2023).
How do I know? Because after logging into ChatGPT, I noticed an option to try ChatGPT-4.
I’ve been using ChatGPT Plus (versions 3 & 3.5) for the past couple of months, and my initial impression is that it can be a useful research assistant, writing analyzer, and idea generator – but it’s prone to mistakes.
Examples of mistakes I noticed: incorrectly listing white NFL coaches as black, providing false areas/dimensions/sizes of tourist attractions & countries, and failing basic math related to taxes.
(ChatGPT-3.5 failed to correctly add up monthly earning totals while preparing taxes… it was consistently off by $100 – despite multiple attempts to coax it into getting the correct total.)
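The lesson from that tax mishap: always verify AI arithmetic with a deterministic tool. Here’s a minimal sketch using made-up monthly figures (not the actual tax data, which isn’t in this post):

```python
# Sanity-checking an AI-generated sum with plain Python.
# These monthly figures are hypothetical – for illustration only.
monthly_earnings = [4200.00, 3850.50, 4975.25, 4100.00, 3990.75, 4300.00,
                    4450.50, 4025.00, 4600.25, 4150.00, 4875.50, 4250.25]

total = sum(monthly_earnings)
print(f"Annual total: ${total:,.2f}")
```

If the chatbot’s total differs from this by a suspiciously round amount (like the $100 discrepancy above), trust the calculator, not the chatbot.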
What is ChatGPT-4?
ChatGPT-4 is simply ChatGPT running on GPT-4 – the latest model released by OpenAI on March 14, 2023.
According to OpenAI:
“The latest milestone in OpenAI’s effort in scaling up deep learning.”
“GPT-4 is a large multimodal model (accepting image & text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.”
In a nutshell, ChatGPT-4 is OpenAI’s most advanced system yet, with safer & more useful responses, increased creativity, improved problem-solving ability, and increased accuracy.
To access ChatGPT-4 you’ll need to purchase ChatGPT Plus (paid subscription) – but eventually it will probably become free for everyone.
ChatGPT-4 Basic Overview
- GPT-4 can comprehend up to 25,000 words of text in a prompt (as input)
- GPT-4 can use images as input (e.g. provide an image & ask questions about it)
- GPT-4 is ~82% less likely to respond to requests for disallowed content (vs. GPT-3.5)
- GPT-4 is ~40% more likely to produce factual responses (vs. GPT-3.5)
- GPT-4 performs significantly better than previous versions on major exams
- GPT-4 supports a system message in API that allows developers (soon users) to customize its behavior
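That last point – the system message – is how developers steer GPT-4’s tone and behavior. A minimal sketch, assuming the `openai` Python package’s ChatCompletion interface (circa March 2023) and a configured API key; the actual network call is left commented out since it requires credentials:

```python
# Steering GPT-4 via a system message (API sketch).
# The persona below is an arbitrary example, not an OpenAI recommendation.
messages = [
    {"role": "system",
     "content": "You are a terse tutor. Answer in two sentences or fewer."},
    {"role": "user",
     "content": "What is a knowledge cutoff?"},
]

# With an API key set, the request would look like:
# response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
# print(response["choices"][0]["message"]["content"])
```

The system message is sent once per conversation and constrains every subsequent reply – this is the “steerability” feature described below.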
ChatGPT-4 New Features
Accuracy boost: GPT-4 seems to significantly outperform previous versions in language accuracy – across multiple languages. In English, GPT-4 scored 85.5% on MMLU vs. 70.1% for GPT-3.5.
Conciseness improves: GPT-4 has the ability to provide shorter, more precise responses to user queries – making it easier for users to get the information they need quickly & effectively.
Visual inputs: GPT-4 can accept a prompt of images (along with text) – which lets the user specify vision tasks (along with language). It is capable of analyzing documents with photos & text, diagrams, screenshots, etc. For example, you can submit a picture and ask GPT-4 questions about it.
Expertise levels up: GPT-4 is more of an expert across a variety of knowledge domains than previous versions. This includes: chemistry, physics, calculus, economics, statistics, biology, verbal skills, history, government, and medicine.
Summarization enhanced: The NYT reports that the chatbot gave a precise & accurate summary of a story nearly every time. Some testers even tried to fool the bot by slipping a random sentence into a summary and asking if the summary was accurate – and it flagged the nonsensical added sentence as problematic.
Humor depth increased: GPT-4 has enhanced ability to understand and generate humorous content, providing users with engaging and entertaining responses when appropriate. (There are some restrictions here – as it will not generate “roast” comedy that it thinks could be mean/harmful).
Advanced reasoning: GPT-4 has better reasoning skills, allowing it to draw more accurate conclusions, make better predictions, and provide more coherent responses to complex queries.
Exam scores improve: GPT-4 scored better than GPT-3.5 on most standardized tests – including: SAT EBRW, SAT Math, GRE Verbal, AP (Biology, History, Econ, Stats, Physics, Chem, Calc), LSAT, Uniform Bar Exam, etc. (It nearly aced the GRE Verbal & USA Biology Exams). (R)
Steerability: Rather than having a “fixed” writing style/tone, users can prescribe their own unique AI style/task by describing directions in a system message. This allows for a more customized API user experience (within bounds).
Less bias: David Rozado notes that political biases previously apparent in ChatGPT appear to be significantly reduced in GPT-4 – such that the model acts relatively neutral and strives to provide arguments from myriad perspectives. (R)
Longer texts: GPT-4 has improved capabilities for generating and processing longer texts, maintaining coherence and relevance throughout the content. It is capable of handling over 25,000 words of text – which may help content creation, extended convos, and document search analysis (e.g. Terms of Service).
Fewer hallucinations: GPT-4 reduces hallucinations significantly compared to previous models in domains of: learning, tech, writing, history, math, science, recommendations, code, and business. Hallucinations decrease by ~10-20%+ across most domains.
ChatGPT-4 vs. ChatGPT-3.5 (Default) vs. ChatGPT-3.5 (Legacy)
Included below is a basic comparison of ChatGPT-4 with older iterations of ChatGPT 3.5 Plus Default & ChatGPT 3.5 Legacy.
Keep in mind that this comparison was made on March 14, 2023 – things may change quickly (e.g. GPT-4’s speed may increase to 4/5 or 5/5 in a few months).
Machine learning parameters
- ChatGPT-4: undisclosed (rumored: 100 trillion)
- ChatGPT-3: 175 billion
Some claim that GPT-4 is essentially 500-fold more powerful than GPT-3 because it is rumored to be built with 100 trillion (100,000,000,000,000) machine learning parameters vs. GPT-3’s published 175 billion. Keep in mind that OpenAI has not disclosed GPT-4’s parameter count, so treat the 100 trillion figure as unverified speculation.
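The arithmetic behind that claim is easy to check – with the caveat (again) that the 100-trillion figure is an unconfirmed rumor, so this is strictly a what-if:

```python
# What-if arithmetic for the rumored parameter counts.
# GPT-4's count is NOT confirmed by OpenAI; 100 trillion is an internet rumor.
gpt3_params = 175e9      # 175 billion (published for GPT-3)
gpt4_rumored = 100e12    # 100 trillion (unconfirmed)

ratio = gpt4_rumored / gpt3_params
print(f"Rumored ratio: ~{ratio:.0f}x")  # closer to ~571x than "500-fold"
```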
ChatGPT-4
- Reasoning: 5/5
- Speed: 2/5
- Conciseness: 4/5
This version of ChatGPT is superior to GPT-3.5 in reasoning (maxed out) and conciseness (nearly maxed out) – but is painstakingly slow as of March 2023 (sloth mode).
It excels at tasks involving advanced reasoning, complex instruction understanding, and more creativity – relative to older versions.
It can also interact with images such that you can take a picture, have ChatGPT-4 analyze it, and ask questions about the visuals.
ChatGPT-3.5 (Default or Plus)
- Reasoning: 3/5
- Speed: 5/5
- Conciseness: 2/5
Optimized for speed, available to Plus subscribers.
This version vomits walls of text in rapid-fire fashion and is great to use if speed is the top priority.
Its reasoning and conciseness are inferior to GPT-4, but it’s still excellent for basic idea generation, writing outlines, and non-technical advice/conversation.
ChatGPT-3.5 (Legacy)
- Reasoning: 3/5
- Speed: 2/5
- Conciseness: 1/5
This is the free version of ChatGPT available to everyone.
It is the older version of the Plus model – and has equal reasoning ability to the Plus version, but it is slightly less concise and significantly slower (sloth-mode).
If you paid for ChatGPT Plus, there’s zero reason to use GPT-3.5 Legacy – it’s inferior to the Default Plus edition.
How does GPT-4 perform on tests?
Included below are scores from GPT-4 taking exams with vision.
OpenAI also reported scores from exams taken with “no vision” (some of these are slightly worse – but most are identical).
Good to Elite
- Uniform Bar Exam (MBE + MEE + MPT): 298/400 (90th)
- LSAT: 163 (88th)
- SAT Reading & Writing: 710/800 (93rd)
- SAT Math: 700/800 (89th)
- GRE Quantitative: 163/170 (80th)
- GRE Verbal: 169/170 (99th)
- USABO Semifinal 2020: 87/150 (99th-100th)
- Medical Knowledge Assessment: 75%
- Select AP Tests: 5 (82nd to 100th) (Art History, Biology, Environmental Science, Macroeconomics, Microeconomics, Psychology, Statistics, US Gov, US History)
- Intro Sommelier (theory): 92%
- Certified Sommelier (theory): 86%
- Advanced Sommelier (theory): 77%
- Leetcode (easy): 31/41
Room for improvement
- Codeforces Rating: 392 (below 5th)
- AMC 10: 30/150 (6th to 12th)
- AMC 12: 60/150 (45th to 66th)
- USNCO Local Section 2022: 36/60
- GRE Writing: 4/6 (54th)
- AP English Language & Composition: 2 (14th-44th)
- AP English Literature & Composition: 2 (8th-22nd)
- AP Physics 2: 4 (66th-84th)
- AP World History: 4 (65th-87th)
- Leetcode (medium & hard): 21/80 (medium) & 3/45 (hard)
Crazy things people have already done with GPT-4…
Linus (@LinusEkenstam) created a Twitter thread documenting some crazy things people had done with GPT-4 on the first day (within hours) of its release. (R)
I’m not sure the things people have done with GPT-4 are “easy” for all users – they probably require a bit of background knowledge and certain tools – but it’s still impressive.
- Recreated Pong in 60 seconds: An individual recreated the game of Pong in under 1 minute.
- Improving financial transaction data: A company used GPT-4 to significantly enhance transaction data associated with their company.
- Developing 1-click lawsuits: The company DoNotPay is using GPT-4 to generate “one click lawsuits” that sue robocallers for $1,500.
- Sketch to website: GPT-4 turned a hand-drawn sketch into a fully functional website.
- Drug discovery: Identifying compounds with similar properties to existing drugs, modifying those compounds to ensure they’re not patented, purchasing them from a supplier, etc.
- Analyze Ethereum contracts: GPT-4 is being used to analyze and pinpoint security vulnerabilities in Ethereum smart contracts. This can help prevent contract exploits – or, conversely, help someone use exploits to their advantage.
- Dating app analysis: Someone used GPT-4 to analyze dating profiles & preferences, and determine whether the match is worth pursuing. With computer vision, you can filter for anything you want in an ideal partner.
- Google Chrome extensions: An individual had GPT-4 create a Chrome extension that summarizes highlighted text – despite having zero coding knowledge.
Limitations of ChatGPT-4
The CEO of OpenAI, Sam Altman, noted: “GPT-4 is still flawed, still limited” but that it “still seems more impressive on first use than it does after you spend more time with it.”
ChatGPT-4 was just released, so temper expectations a bit… it’ll continue improving over time – in terms of speed, accuracy, precision, etc. – and most of the limitations below will be addressed in the near future.
- Slow AF (sloth mode): One of the biggest limitations associated with using GPT-4 is its speed. Waiting for GPT-4 to respond to a query can be painful – especially when contrasted with GPT-3.5 (rapid-fire). During hours of heavy usage, errors frequently occur – such that you’ll need to refresh. Patience…
- Usage cap: 100 requests per 4 hours. Currently, you can only make 100 query requests every 4 hours – which is probably sufficient for most users. (OpenAI has stated that they will scale up and optimize over the coming months.)
- Considering the future: GPT-4 is unable to accurately predict future events or developments. While the model may provide insights or make educated guesses based on available data, it should not be considered a reliable AI for future predictions.
- Hallucinations: Like prior versions of GPT, GPT-4 hallucinates or commits factual & reasoning errors. For this reason, it’s recommended to verify any information generated by GPT-4 to ensure that it’s true/accurate before using or acting on it.
- Violent & harmful text: Although GPT-4 improved upon previous iterations in filtering violent and harmful text and/or requests – it may slip up every once in a while (such that users could be exposed to violent or dangerous information). Additionally, some may perceive the lack of violent & harmful content as a limitation.
- Knowledge cutoff: The knowledge cutoff for GPT-4 within ChatGPT is September 2021 (as of present). This means it will not know information from the past year or so – which can limit its ability to give advice, make predictions, or generate ideas based on newer data.
- Remembering previous information: After experimenting with GPT-4 for the past day, feeding it information and telling it to “remember this information for future conversation,” it routinely forgot or had to be reminded to use this info. Even within the same thread, it would make errors and need nudging to use the information provided earlier.
Mainstream reactions to GPT-4
- Washington Post: “GPT-4 will blow ChatGPT out of the water.” (R)
- New York Times (NYT): “GPT-4 is impressive but still flawed.” (R)
- New York Times (NYT): “GPT-4 is exciting & scary.” (R)
My thoughts on ChatGPT-4…
I’ve used ChatGPT-4 for around 1 day and am impressed. The writing is significantly better than ChatGPT-3.5’s, and it’s better at following specific prompt instructions.
I agree with Sam Altman’s point that it seems more impressive on first use than after you spend more time with it… I was more impressed in the first couple of hours than after a day.
However, I still think GPT-4 is a noticeable upgrade over GPT-3.5 in content quality, accuracy, precision, bias, and reasoning.
The only thing I really don’t like is the slothy speed, but this was expected for the initial rollout – and should continue to get faster in forthcoming months.
How have I used GPT-4? I had it analyze data related to the March Madness 2023 NCAA Men’s Basketball Tournament (rankings, brackets, injury reports, etc.) and make predictions.
I also used it to give me some ideas for writing – and had it try to analyze some of my content and write something similar… it performed far better than GPT-3.5 but is still lacking in creativity.
Have you used ChatGPT-4?
In which specific ways did you use GPT-4?
In which ways do you find ChatGPT-4 to be most useful?
What are your initial impressions of GPT-4?
Do you think GPT-4 is a significant improvement over GPT-3.5? (If so, which improvements do you consider most significant?)