Law is a literature-dependent profession, and unlike many fields, it’s not limited to current literature. Thousands of case files or legislative data points could potentially matter for a single case. Indexing, searching, and truly comprehending it all has always been a challenge. Enter AI: it seems like a perfect fit, and with success stories like Casetext, investors are flocking in.

But the realities of legal research are fabulously incongruent with that narrative. So, let’s explore the evolution of this space: what’s actually being researched, by whom, and why, and how two players have so successfully dominated this space for so long. The good news? The landscape might finally be ripe for disruption and the emergence of something transformative and massive. 

First, some caveats.  

  1. I don’t have a law degree. I thought about getting one, but the food at the business school was better. If I’m wrong about something, please tell me. 
  2. I don’t have a crystal-clear view of the tech of every company I reference. If I understate or overstate capabilities, feel free to set the record straight. 

WHAT IS LEGAL RESEARCH, EXACTLY?  

At its core, legal research splits into two types of sources: primary and secondary.  

Primary sources include the Constitution, state codes, case law, and judicial opinions; basically, the foundational legal texts. These are, generally speaking, free-ish and accessible. But that does not mean easily accessible. They might not be well-indexed and searchable, or even digitized. While they’re copyright-free, accessing them often involves administrative costs or navigating paywalls through specific portals.

Secondary sources include basically everything else: annotations, reviews, articles, practice guides, etc. These are often copyrighted, meaning you’re looking at publisher paywalls or digital portals with distribution rights. Here, for example, a citator tool, which evaluates whether a statute is still valid based on case history, would fall under secondary sources, while the actual laws and case outcomes are primary. 

THE COMPLEXITY OF RELEVANCE

The relevance of these sources can vary wildly depending on practice area, and the diversity of practice areas is astounding. At a basic level, you could split practices between litigation and transactional work, but research also matters to in-house teams under the Chief Legal Officer (CLO).

Litigation firms rely heavily on case law and statutory history. Civil litigation might focus on damages while criminal litigation involves defending or prosecuting criminal charges. Within civil litigation, practice areas like IP, commercial disputes, employment law or mass torts require specific research.  

Transactional attorneys also leverage research, but their focus is often on ensuring enforceable clauses in agreements, informed by precedent. Much of their research might come from recycling and refining previous merger or financing agreements. They could also be pulled into diligence and risk assessment for clients across industries which might involve deep, highly specific research.  

In-house counsel: GCs draft internal policies, tackle compliance, and manage contracts. What matters to them is access to resources that keep them current on relevant legislation and, to a lesser degree, pertinent case law (around labor, for example).

WHAT BUYERS WANT (AND CAN AFFORD)

What research is relevant will change based on practice areas, but also forum, size and revenue model. Some law firms pursue cases at the state level, focused on state constitutions and local laws, while others focus on federal issues – antitrust, IP, constitutional challenges, etc. Firms can be skilled at trial / lower courts, arguing before a judge and jury, or appellate courts, where research burden climbs to understand and reverse lower court decisions. A solo practitioner specializing in local property law might benefit from certain research subscriptions but be priced out entirely; even within a large firm, specific research might be critical for one arm but irrelevant elsewhere. Of course, contingency-driven plaintiff firms are far more selective with their spending compared to firms billing by the hour, where research costs might pass directly to clients or at least be balanced by retainers and expected revenue.  

The right sales motion, and which research models find traction, is deeply tied to these factors. (More on that later.) 

TRANSACTIONAL PRACTICES AND INTERNAL MATERIALS

It’s also worth considering internal materials, an essential but separate layer of legal research. Transactional practices often lean harder on past contracts, in-house research, and internal case notes. For instance, a law firm focusing on M&A might care less about precedent-setting case law and more about its collection of merger agreements. Here, previously used clauses, data on negotiated terms, and benchmarks of market norms would be more relevant than cases or statutes.

This whole area is more in the realm of Contract Lifecycle Management (CLM), Document Management Systems (DMS), and similar tools, which is an adjacent but separate field and beyond the scope of this post.

SOME HISTORY 

To take it way back, circa the 1800s, access to precedent case data was far from straightforward. Where it existed, it was reported by publishers regionally and not shared by any government entity. In fact, before Wheaton v. Peters in 1834, publishers could even copyright case outcomes. So there was a high cost to access, counsel likely had poor access, and what existed was limited, regional and very delayed.

West Publishing made strides after 1876, publishing case data across multiple jurisdictions. This comprehensive reporting got the ABA stamp of approval in 1898. This evolution in accessibility even helped shape law school teaching, giving rise to the case method that took root around 1914. 

Tracking laws was equally challenging. Federal laws had been published annually since 1789, but it wasn’t until 1926 that they were organized by subject in the United States Code, finally offering a cohesive structure for navigating legislation. 

As access improved through the efforts of commercial publishers and governments, legal literature proliferated aggressively in the second half of the 20th century. That sounds great in theory, but it was very hard to keep up with, in terms of both cost and complexity. By the early 1970s, an estimated 30,000 new judicial decisions were added every year to the existing 2.5 million, alongside 10,000 legislative enactments added annually.

For some modern context, in 2023 alone, there were 40,681 appellate filings and 412,052 district court filings across civil and criminal cases (not even counting state courts). This creates an ever-growing mountain of information to track and sift through, even before factoring in legislative sessions. 

RESPECT THE LAW [LIBRARIAN]

Some very pretty law libraries were built to house it all, and that growth gave rise to the role that matters heavily here: the law librarian. The American Association of Law Libraries emerged in 1906, but it was in the post-war period that law librarians became vital in managing the flood of data.

Law Librarians didn’t just organize information – they routed inquiries, led teams to pull relevant filings from local courts, and eventually became the gatekeepers of emerging technology. As digital solutions emerged, it was the librarians who procured and leveraged those tools. They trained associates to use this technology, oversaw its application, and ensured research was accurately tracked, especially when it was billable to clients.  

DIGITAL ACCESS BIRTHS MASSIVE PLAYERS

Digital access came relatively late to legal research, tracing its roots back to John Horty at the University of Pittsburgh and, ultimately, the Ohio State Bar Association, whose OBAR project evolved into LEXIS, launched commercially by Mead Data Central in 1973. West quickly followed with Westlaw in 1975.

Initially, these platforms focused on non-copyrighted content, like case law and statutes, with West’s own headnotes as an exception. Over time, they added law reviews and secondary sources. The early versions had full text with keyword searchability. Boolean search didn’t arrive until the 1980s, followed by hypertext linking in the 1990s alongside the internet boom. 

Despite new entrants, a solid duopoly quickly formed: Westlaw and LexisNexis (now under Thomson Reuters and RELX, respectively). Even with Google Scholar’s free offering and Bloomberg Law’s entry in 2009/2010, breaking the duopoly has proved remarkably difficult. Some smaller players merged (e.g., Casemaker and Fastcase, now under vLex), while others, like Casetext, gained traction only to be acquired by one of the giants.

Why has the duopoly been so resilient? A few key reasons:  

  1. Comprehensive Data Assets: In legal research, more is better. Over decades, Westlaw and LexisNexis built the most expansive repositories of primary and secondary data.  
  2. Proprietary Features and Exclusive Content: Tools like Westlaw’s Key Number system for categorizing law, along with the KeyCite and Shepard’s citators, offer unique value and have become industry standards. 
  3. Early Entrenchment in Legal Education: Westlaw and LexisNexis secured deep access to law schools, effectively indoctrinating new lawyers from the start of their careers. 

In 1977, there were 23 legal publishers of some size. As of the early 2000s, three remained, collectively controlling 90% of the U.S. legal publishing business and accounting for some $3bn in annual revenue [source]. 

OVERCOME-ABLE FORCES

Most industries have incumbents, and the startup’s challenge is often to displace something on which many people have already become reliant. The key is spotting shifting dynamics that expose weaknesses in incumbents.  

For example, presence in law schools does inure graduating lawyers to existing players, but arguably that effect is weakened if AI makes it much easier to identify relevant cases or search for key statutes. Boolean search, a skill once critical for navigating legal databases, might soon matter far less for new graduates. 

Billing dynamics contribute as well. Historically, a massive hurdle might have been that the Big Two were generally billed back to clients, and had built their software to tether work and billing to specific client matters. These company names were acceptable on client invoices. Over time, firms have increasingly migrated to treating research expense as overhead, passing along costs only at discounted rates or in specific circumstances. Cost pressure from the duopoly, increasing as they acquire competitors and price up those new features, drives appetite for alternatives. This is complemented by greater access to searchable and low-cost sources (read: Google). It helps that there is a champion – the Law Librarian, or Head of Information Services – for whom a core task is to think about cost-effective resources.

It’s reasonable, then, to look at this duopoly as ripe for disruption, but legal research is uniquely challenging terrain. For one thing, the incumbents are famously litigious. Consider ROSS Intelligence, which shut down under the strain of litigation even before a verdict was delivered on the Thomson Reuters case against them. 

People will point to Casetext as the other side of the argument – they, too, emerged as a cheaper alternative to the big research players, but instead of drowning in litigation, they found a strong exit. It’s worth remembering that Casetext’s success didn’t happen overnight. It took years of effort and only took off when they adopted AI, ultimately catching the eye of one of the only meaningful buyers in the space. If anything, their quick acquisition by an incumbent underscores an important reality: the dominant players are not sleeping on innovation. 

STARTUP ANGLES OF ATTACK

So, what’s the big opportunity here? How does a new player emerge to displace this old guard? 

Startups in this space can build moats by: 

  1. Creating a better all-around solution for pulling up relevant data 
  2. Offering proprietary or otherwise hard-to-access sources, whether primary or secondary data, or
  3. Targeting a different cost structure or customer profile (ICP) 

BETTER EXPERIENCE

Improving the experience doesn’t necessarily mean sleek UX or better natural language processing – though those help. The real competitive advantage often lies in indexing, hierarchies, and categorization. These are the backbone of legal research and a big piece of the proprietary advantage for existing players. 

As one observer noted, “Despite the fact that Lexis and Westlaw compile information that is largely publicly available, they distinguish themselves from potential entrants to the broader electronic legal information market by virtue of the search and other capabilities that they have developed within their respective databases.” A new, better approach to categorizing cases, even in a specific practice area, could be meaningful. 

Even small innovations in format can have outsized impacts. Think of LexisNexis getting stung for copying West’s star pagination system, which is really just a solution for easier case citation. Or take Casetext, which disrupted the space by introducing parallel search. Their approach enabled users to upload sections of a brief or drafted sentences, using the context to deliver more relevant results. This was a massive leap forward from keyword and Boolean search, offering a faster path to relevant answers more closely tethered to the drafting workflow. They also leveraged this for case strategy – you could upload motions from opposing counsel and quickly identify the precedent cases their positions hinged on. [YT] 

Emerging players are pushing this even further. For example, Spellbook integrates contextual search within Word, prompting suggestions right within a lawyer’s draft. Prompting in other workflows could be meaningful in-house; think labor law within Workday flows or gap analysis and regulation insight within compliance team solutions.  

Alternatively, data visualization could be an interesting play, or a solution that layers sophisticated courtroom and litigation strategy into case management systems.   

DIFFERENT DATA

The days of sending reporters to courts for decisions are long gone – governments have, to various degrees, caught up to offer digital resources for legislation and court decisions. Startups today have a wealth of primary data to work with. 

The real opportunity lies in making the inaccessible accessible. Even as recently as the 1980s, law librarians relied on runners to fetch court data. Why? While Westlaw might deliver case dispositions (whether there was a criminal conviction, judgment in a civil case, dismissal, etc.) it lacked relevant exhibits, judicial opinions, discovery materials, and actual transcripts – critical data points for many cases.  

Even today, companies are being built that create their data assets from exhibits they physically source or methodically index – Qumis and Trellis, for example.  

Another approach is innovating around secondary sources, which are critical to contextual understanding – such as citators that evaluate the status of a specific law. Existing players already offer proprietary content, like Westlaw’s headnotes. However, small firms often pare back to primary sources because secondary libraries are cost-prohibitive. Today, “a relatively small number of legal scholarly works that are cited account for a lion’s share of citations.” [source] A new player could exploit this concentration and target a specific practice area to build a repository of high-value complementary content.  

A more exciting approach is to ask: can AI develop secondary sources that don’t currently exist? For example, BenchIQ generates judicial opinions where they don’t exist, based on transcripts. Or consider Paxton’s purely AI-driven citator tool, potentially disrupting one of the incumbents’ strongholds. These highly targeted and powerful solutions could also be billed as pass-through expenses to clients, making the value proposition to a law firm much easier to digest. 

DIFFERENT PRICING AND BUYERS 

As mentioned above, legal research is increasingly seen as an overhead cost for law firms, and they will acquire only specific access. For example, if a firm purchases a specific subset of data from Lexis around state-specific labor cases, but needs to go out-of-plan for a matter, Lexis pricing will get transactional and aggressive. In these cases, alternatives become much more attractive. If the law firm cedes some ground to the autonomy of creative associates, that democratized buying behavior creates an avenue for players selling bottom-up. 

Cheaper models also resonate with small firms, and may have distribution strategies that help. While the duopoly remains entrenched in law schools, new entrants have found success with other audiences. Think of Fastcase (now under vLex), often offered alongside bar membership, or emerging legal communities that offer discounts to new solutions like Midpage. Platforms like Filevine and Clio are great examples of how big the opportunity with smaller firms can be. However, most startups remain focused on aggressively hunting enterprise-wide licenses with the Am Law Top 20. 

As mentioned above, in-house buyers are also a good target (something we wrote about here).  

MOONSHOT VS TUCK-IN

As is true in other industries, it’s crucial to distinguish between startups positioning for acquisition and those aiming to compete as standalone platforms. A handful of acquirors in this space are actively seeking solutions and data that complement their existing product catalog or expand their client base.  

While the three strategies outlined above can help startups gain traction, they aren’t a recipe for fully shaking up the status quo. To do that, a player must create a fundamentally sticky, transformative approach to legal research or the broader legal workflow. We are excited to see some of the players mentioned making strides toward just that. 

As always – if you’re building in this space, we want to know about it – drop us a line. 

Special thanks to Mike Devlin, Michael Sander, and Jim Ovbiagele for their critical eye, advice, and contributions to this piece.