I had a whole post ready for today about “dark matter developers”, the silent majority that keeps our software applications running. After reading Anthropic’s press release for the new Claude models today, I felt that news was more important, so the dark matter developers post will be up next week ✌️
Claude 3.5 Release
Anthropic has released an updated Claude 3.5 Sonnet model that shows major improvements in software development capabilities. The new model scored 49% on SWE-bench Verified, up from 33.4%. Claude was already the best model for software development, and it has now extended its lead, scoring higher than all other publicly available models at solving real-world GitHub issues in Python repositories.
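To put the jump from 33.4% to 49% in perspective, it helps to separate the gain in percentage points from the relative gain. Here is a minimal Python sketch of that arithmetic; the `gains` helper is my own illustration and not something from Anthropic’s release.

```python
def gains(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    point_gain = new_pct - old_pct
    relative_gain = point_gain / old_pct * 100
    return point_gain, relative_gain


# Scores quoted above: 33.4% before, 49% after.
points, relative = gains(33.4, 49.0)
print(f"{points:.1f} points, {relative:.1f}% relative")
# prints "15.6 points, 46.7% relative"
```

In other words, the updated model resolves nearly half again as many of the benchmark issues as the previous version did.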
Claude Sonnet has also improved on TAU-bench, which measures how well a model uses tools to complete tasks in simulated business scenarios, rising from 62.6% to 69.2% in the retail domain. This suggests Sonnet is more capable of reasoning through multi-step business tasks than previous models and, most importantly, than other public models.
Claude 3.5 Haiku has some updates as well. It is now more efficient and has the same cost as the previous generation, Claude 3 Haiku, with much better results. Haiku can now outperform the previous generation of Opus on many benchmarks and remains the best option when building user-facing AI applications.
Interestingly, Claude 3.5 Opus has been removed from the model list, where it was previously marked as “Coming Soon”. Online, people are speculating that this is because Anthropic is able to get better results with a smaller model. If so, that is very exciting and continues to show the growth in research and the increasing power of AI systems.
The most exciting part of this release is the Computer Use feature, where Claude can take on the role of a user in a real-world application, accessing browser windows and even the computer itself. In their demo video, Anthropic researchers show Claude filling out a form using spreadsheet data, other online information and information from the user’s computer. The potential productivity gains are massive, since many data entry tasks could be fully automated.
One thing I do wish Anthropic would resolve is their versioning. It would be much better if this model was called Claude Sonnet 3.6 instead of reusing the 3.5 version number. Duplicating the version number makes it harder to understand which model people are referring to, particularly in blogs or on X.
If you are curious about how Generative AI and LLMs can help your business, particularly if you are an engineering manager or software engineer, please reach out. I would also love to hear from you if you have interesting use cases or examples from your real-world workflows.
If you are interested in following my journey, you can subscribe to my blog and YouTube channel. My other links are available on my Linktree.