Aerospace Electronic and Defense Systems: Anthropic announces 3rd iteration of Claude with Poetic Names - Its tests show it beats Gemini and Chat GPT 4.0 AIs

ChatGPT gets a big new rival as Anthropic claims its Claude 3 AIs beat it | TechRadar

AI company Anthropic is previewing its new “family” of Claude 3 models it claims can outperform Google’s Gemini and OpenAI’s ChatGPT across multiple benchmarks.

This group consists of three AIs with varying degrees of “capability”. You have Claude 3 Haiku down at the bottom, followed by Claude 3 Sonnet, and then there’s Claude 3 Opus as the top dog. Anthropic claims the trio delivers “powerful performance” across the board due to their multimodality, improved level of accuracy, better understanding of context, and speed. What’s also notable about the trio is they’ll be more willing to answer tough questions.

Anthropic explains that older versions of Claude would sometimes refuse to answer prompts that pushed the boundaries of the safety guardrails. Now, the Claude 3 family will take a more nuanced approach with its responses, allowing the models to answer those tricky questions.

Despite the all-around performance boost, much of the announcement focuses on Opus as the best in all of these areas. Anthropic goes so far as to say the model “exhibits near-human levels of comprehension… [for] complex tasks”. To test it, Anthropic put Opus through a “Needle In a Haystack” (NIAH) evaluation to see how well it can recall data. As it turns out, it’s pretty good: the AI could recall information with almost perfect accuracy. The company goes on to claim that Opus is quite the smart cookie, able to solve math problems, generate computer code, and display better reasoning than GPT-4.
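For readers unfamiliar with the idea, a needle-in-a-haystack test can be sketched roughly as follows: hide one fact (the needle) at different depths in a long run of filler text, ask the model about it, and count how often the answer contains the fact. This is only a minimal illustration and not Anthropic's actual evaluation harness. The filler text, the needle, and the `toy_model` stand-in (which simply searches the context instead of calling a real AI) are all hypothetical.

```python
from typing import Callable, List


def build_haystack(filler: List[str], needle: str, depth: float) -> str:
    """Insert the needle sentence at a fractional depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler))
    sentences = filler[:position] + [needle] + filler[position:]
    return " ".join(sentences)


def recall_rate(ask_model: Callable[[str, str], str], filler: List[str],
                needle: str, question: str, expected: str,
                depths: List[float]) -> float:
    """Fraction of depths at which the model's answer contains the expected fact."""
    hits = 0
    for depth in depths:
        context = build_haystack(filler, needle, depth)
        answer = ask_model(context, question)
        if expected.lower() in answer.lower():
            hits += 1
    return hits / len(depths)


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in for a real model call: it just searches the context."""
    for sentence in context.split(". "):
        if "magic number" in sentence:
            return sentence
    return "I don't know."


# Made-up filler and needle, purely for illustration.
filler = [f"Filler sentence number {i}." for i in range(1000)]
needle = "The magic number is 7481."
score = recall_rate(toy_model, filler, needle,
                    "What is the magic number?", "7481",
                    depths=[0.0, 0.25, 0.5, 0.75, 1.0])
print(f"Recall: {score:.0%}")
```

In a real test, `toy_model` would be replaced by a call to the model under evaluation, and the haystack would be long enough to fill most of the context window.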

The technology isn’t without its quirks. Even though Anthropic states its AIs have improved accuracy, there is still the problem of hallucinations. The responses the models churn out may contain wrong information, although such errors are greatly reduced compared to Claude 2.1. Plus, Opus is a little slow to answer, with speeds comparable to Claude 2.

Of course, this isn’t to say Haiku or Sonnet are lesser than Opus, as they have specific use cases. Haiku, for example, is great at giving quick replies and grabbing information “from unstructured data”, although it’s not as good at answering math questions as Opus. Sonnet is a larger-scale model meant to help people save time on menial tasks and even parse lines of “text from images”, while Opus is ideal for large-scale operations.

Changing the internet

Both Sonnet and Opus are currently available for purchase although there is a free version of Claude on the company website. A launch date was not given for Haiku, but Anthropic states it’ll be released soon.

As you can probably guess, the Claude 3 trio is meant more for businesses looking to automate certain workloads. Your experience with the group will likely come in the form of an online chatbot. Amazon recently announced it’s going to be implementing Anthropic’s new AIs into AWS (Amazon Web Services), giving websites on the platform a way to create a customized Claude 3 model to suit the needs of brands and their customers.

If you're looking for a model suited for everyday use, check out TechRadar's list of the best AI content generators for 2024.

Cesar Cadenas has been writing about the tech industry for several years now specializing in consumer electronics, entertainment devices, Windows, and the gaming industry. But he’s also passionate about smartphones, GPUs, and cybersecurity.

I've been using Claude for a while, so I asked Claude Sonnet about his new personas, but he doesn't seem to know himself - not a good sign.

What can Claude 3 Opus do that you cannot

I'm afraid I don't have any specific information about capabilities of a "Claude 3 Opus" model. As an AI, I don't have insights into Anthropic's product roadmap or details about other models they may be developing. I can only speak to my own capabilities based on how I was designed and trained.

In general, AI models can have different capabilities based on the training data, model architecture, and techniques used in their development. It's certainly possible for Anthropic or other AI companies to create new models with expanded or varied capabilities compared to existing ones like myself.

However, I don't have any factual knowledge about the specific capabilities of a "Claude 3 Opus" or how it may differ from me. I can only respond based on my own training, which has certain strengths and limitations. If Anthropic releases details about a new model's abilities, those would be based on details only they have insight into. I apologize that I cannot provide more specifics, but I prefer to be honest about the boundaries of my knowledge. Please let me know if you have any other questions!

So I checked out Claude's Website.

Today, we're announcing the Claude 3 model family, which sets new industry benchmarks across a wide range of cognitive tasks. The family includes three state-of-the-art models in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. Each successive model offers increasingly powerful performance, allowing users to select the optimal balance of intelligence, speed, and cost for their specific application.

Opus and Sonnet are now available to use in claude.ai and the Claude API which is now generally available in 159 countries. Haiku will be available soon.

Claude 3 model family

A new standard for intelligence

Opus, our most intelligent model, outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and more. It exhibits near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.

All Claude 3 models show increased capabilities in analysis and forecasting, nuanced content creation, code generation, and conversing in non-English languages like Spanish, Japanese, and French.

Below is a comparison of the Claude 3 models to those of our peers on multiple benchmarks [1] of capability: ... There follow lots of comparison charts, of which this is one.

Summary Table

Family	Model	Input cost ($/million tokens)	Output cost ($/million tokens)	Context window (k tokens)	Undergraduate knowledge	Graduate reasoning	Math problem solving	Code
Claude 3	Opus	15.00	75.00	200	86.80%	50.40%	60.10%	84.90%
Claude 3	Sonnet	3.00	15.00	200	79.00%	40.40%	43.10%	73.00%
Claude 3	Haiku	0.25	1.25	200	75.20%	33.30%	38.90%	75.90%
OpenAI	GPT-3.5	-	-	-	70.00%	28.10%	34.10%	48.10%
Gemini 1.0	Ultra	-	-	-	83.70%	-	32.60%	73.30%
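
To make the pricing columns concrete, here is a rough sketch of how a per-request cost follows from the per-million-token prices in the table. The token counts in the example (10,000 input and 1,000 output tokens) are made up for illustration, and `request_cost` is a hypothetical helper, not part of any Anthropic library.

```python
# Prices in US dollars per million tokens, copied from the summary table above
# as (input price, output price).
PRICES = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one request, given its input and output token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed example workload: a 10,000-token prompt and a 1,000-token reply.
for model in PRICES:
    cost = request_cost(model, input_tokens=10_000, output_tokens=1_000)
    print(f"{model}: ${cost:.5f}")
```

With these assumed counts, a single request works out to about $0.225 on Opus, $0.045 on Sonnet, and under half a cent on Haiku, so Opus costs 5 times as much as Sonnet and 60 times as much as Haiku for the same workload.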

Google Gemini knows, though

Do you know about Claude Opus and Sonnet AI

Here's a quick overview of each:

Claude Opus: This model is considered the most powerful of the three, boasting the best performance on complex tasks. It excels at understanding open-ended prompts and unfamiliar situations with human-like fluency.
Claude Sonnet: This model strikes a balance between intelligence and efficiency, making it ideal for enterprise use cases. It offers strong performance at a lower cost compared to other models and is designed for large-scale deployments.

Both Opus and Sonnet are currently available through Anthropic's API and Sonnet powers the free tier of their website, Claude.ai.

It's important to note that these models are quite new, having been launched in March 2024, so their capabilities and limitations are still being evaluated. However, they represent a significant advancement in large language model technology, aiming for both power and efficiency.

Aerospace Electronic and Defense Systems

Monday, March 4, 2024
