A couple of weeks ago, we bet big on AI agents with the launch of Spellbook Associate. We believe agentic approaches will be the next big thing in legal AI. In short, they enable AI systems to plan, reason, execute, and check their own work. This is not just an incremental improvement: agents can take on full, complex projects, almost like delegating to an AI colleague.
OpenAI’s brand-new o1 model is now bringing agentic approaches to ChatGPT, and it will power new experiences in Spellbook. We’re excited to share some of the key legal use cases for o1 that we’ve tested within Spellbook.
Until now, AI chat assistants have been a bit like putting a microphone in someone’s face and asking them to answer a question on the spot, with no time to think. o1 introduces the ability to reason before answering. This mirrors the psychological distinction between System 1 and System 2 thinking, popularized by Daniel Kahneman: System 1 thinking is fast and instinctive, while System 2 thinking is slow and deliberate.
The proof is in the pudding: System 2 thinking dramatically improves results across the board, and o1 blows away GPT-4o on many benchmarks.
The thing we’re most excited about is o1’s performance on document revision tasks.
A lot of generative AI experiences spit out entirely new documents. But lawyers rarely draft anything from scratch. They typically have a precedent that they want to modify.
Contracts like Share Purchase Agreements can run to 100+ pages. Making significant revisions to them requires a lot of jumping around, consistency checking, and making sure the numbers add up. System 1 thinking does not work well here, and getting models like GPT-4 to perform these tasks well has been a deep challenge.
In the example below, we used o1 with Spellbook Associate to update a commercial lease based on a term sheet:
With o1, we are seeing dramatic improvements on revision tasks across the board. One of our top predictions is that many more workflows built on nuanced document revision will launch over the coming year.
Another consistent weakness of GPT-4 has been its limited understanding of the numerical content in agreements, and of whether it all adds up. Discrepancies between cap table spreadsheets and deal documents have cost shareholders many millions of dollars.
While tools like Spellbook have been great for detecting legal issues in text, they’ve been “blind” to whether things like share prices and ownership percentages really add up.
One of the biggest surprises for us has been o1’s ability to detect and correct mathematical errors, often proactively and without being asked.
In the example below, we used a dummy term sheet to populate some financing documents. o1 proactively picked up the error in the data:
Not only that: when I requested a new ownership percentage (5%), it did all the math to determine the other values in the document:
And here's o1-preview working to discover a similar problem in Spellbook's Chat feature within Microsoft Word:
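To make the arithmetic behind these checks concrete, here is a minimal Python sketch of the kind of consistency check described above. It is not how Spellbook or o1 works internally. It assumes a simplified financing round where the investor’s ownership is measured against fully diluted post-money shares (no option pool top-up, convertible notes, or other adjustments), and the function names and term sheet figures are hypothetical. The hypothetical error is a classic one: a stated 10% stake that was computed against pre-money rather than post-money shares.

```python
from fractions import Fraction  # exact arithmetic, avoids float rounding drift


def post_money_ownership(new_shares: int, pre_money_shares: int) -> Fraction:
    """Investor's stake as a fraction of fully diluted post-money shares."""
    return Fraction(new_shares, pre_money_shares + new_shares)


def shares_for_target(target: Fraction, pre_money_shares: int) -> Fraction:
    """Shares to issue so the investor holds `target` of post-money shares.

    Solves n / (S + n) = p for n, giving n = p * S / (1 - p).
    Real share counts are whole numbers, so a real deal would round this
    according to its own rounding convention.
    """
    return target * pre_money_shares / (1 - target)


def find_discrepancies(terms: dict) -> list[str]:
    """Return human-readable issues where the hypothetical terms don't add up."""
    issues = []
    implied_investment = terms["price_per_share"] * terms["new_shares"]
    if implied_investment != terms["investment"]:
        issues.append(
            f"investment {terms['investment']} != price x shares {implied_investment}"
        )
    actual_pct = post_money_ownership(terms["new_shares"], terms["pre_money_shares"])
    if actual_pct != terms["stated_pct"]:
        issues.append(
            f"stated ownership {float(terms['stated_pct']):.2%} "
            f"!= computed {float(actual_pct):.2%}"
        )
    return issues


# Hypothetical dummy term sheet (all figures invented for illustration).
terms = {
    "pre_money_shares": 9_500_000,  # fully diluted shares before the round
    "new_shares": 1_000_000,        # shares issued to the investor
    "price_per_share": Fraction("1.00"),
    "investment": 1_000_000,
    "stated_pct": Fraction(10, 100),  # wrong: 1,000,000 / 9,500,000 is not post-money
}

print(find_discrepancies(terms))
# ['stated ownership 10.00% != computed 9.52%']

# Re-derive the other values for a requested 5% stake.
n = shares_for_target(Fraction(5, 100), terms["pre_money_shares"])
print(n, n * terms["price_per_share"])  # 500000 500000
```

Using `Fraction` instead of floats keeps every comparison exact, so a mismatch reflects a genuine inconsistency in the terms rather than rounding noise.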
Lastly: what excites us most about agents is their ability to execute sprawling workflows across multiple documents and applications. Not just chatting, but really planning and doing work for you. Our Spellbook Associate launch video shows this in action:
o1 doesn’t support this kind of workflow on its own, but it makes the individual steps in Associate’s workflow much more reliable. A key issue for fully autonomous agents to date has been their tendency to go “off the rails” after a single mistake. o1’s ability to self-correct and reflect on its own work is already enabling many new complex use cases in testing, especially when workflows include complex document revisions.
To keep getting sneak peeks at the future of AI agents in law, consider signing up for Spellbook Associate's early access program.