For the first time, a court in the UK has approved the use of predictive coding to assist with the e-disclosure process in litigation.
Now that a precedent has been set, the use of predictive coding in litigation looks likely to take off. Is this a step towards the introduction of far more disruptive, artificial intelligence (AI) based technologies in the legal industry?
What is e-disclosure?
A distinctive feature of common law jurisdictions such as England and Wales or the US is that litigants are typically entitled to receive (and required to provide) disclosure of relatively wide categories of documents. As a result, the process of searching for and reviewing potentially disclosable documents can be a highly time-consuming, and expensive, exercise. The vast and increasing volume of electronic data most businesses hold can leave litigants having to sort through millions of documents in order to find the relatively small number of documents that are actually disclosable to their opponent.
In an attempt to combat this, parties in England and Wales are now expected to answer an electronic disclosure (e-disclosure) questionnaire and look to agree on parameters for the e-disclosure exercise. Applying filters such as date range, custodian(s) of documents and keywords can reduce the dataset significantly, but may still leave a large volume of documents that need to be reviewed by lawyers or other legal staff (such as paralegals).
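By way of illustration only, the kind of filtering described above might be sketched as follows. The `Document` fields, the `prefilter` function and the choice to require all three filters at once are assumptions made for this example; real e-disclosure platforms, and the parameters parties actually agree, will differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical fields, chosen only for this illustration.
    doc_id: str
    custodian: str
    sent: date
    text: str


def prefilter(docs, start, end, custodians, keywords):
    """Keep documents that fall in the date range, are held by a listed
    custodian and contain at least one keyword (case-insensitive)."""
    wanted_custodians = {c.lower() for c in custodians}
    wanted_keywords = [k.lower() for k in keywords]
    kept = []
    for doc in docs:
        in_range = start <= doc.sent <= end
        held = doc.custodian.lower() in wanted_custodians
        body = doc.text.lower()
        hit = any(k in body for k in wanted_keywords)
        # Assumption: the three filters are combined with AND.
        if in_range and held and hit:
            kept.append(doc)
    return kept
```

Even with all three filters applied, everything that `prefilter` returns would still need to be reviewed, which is the problem predictive coding is intended to address.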
What is predictive coding?
AI-based technological solutions have been developed to assist with the e-disclosure process. Technology-assisted review, also known as predictive coding, uses AI to ‘learn’ from human reviewers which documents are likely to be relevant. The basic process is as follows:
- A small initial sample of the pre-filtered dataset is reviewed for relevance by a person (usually a qualified, often relatively senior, lawyer).
- The computer system analyses the documents that have been flagged as relevant and extracts key concepts and themes.
- The system then applies those concepts to the full dataset and selects another batch of documents, tagging those it considers to be relevant.
- The human reviewer looks at those documents (without knowing which have been tagged by the system) and decides which are relevant; those decisions are then fed back to the system.
- This iterative process will continue until the proportion of machine-selected documents that are ‘overturned’ by the human review is within a pre-agreed tolerance.
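The steps above can be sketched in code, although only as a toy. In the sketch below, `human_review` is a hypothetical stand-in for the lawyer's decision, the term-counting "model" is a deliberate simplification (it also learns from documents marked not relevant, and commercial systems use far more sophisticated analysis), and the stopping test compares machine and human tags across the whole batch rather than only the machine-selected documents. All names, batch sizes and the tolerance are assumptions.

```python
import random
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labels, docs):
    """Weight each term by how much more often it appears in documents
    the reviewer marked relevant than in those marked not relevant."""
    weights = Counter()
    for doc_id, relevant in labels.items():
        for term in set(tokenize(docs[doc_id])):
            weights[term] += 1 if relevant else -1
    return weights


def predict(weights, text):
    """Tag a document as relevant when its terms lean towards relevance."""
    return sum(weights[t] for t in set(tokenize(text))) > 0


def predictive_coding(docs, human_review, seed_size=4, batch_size=4,
                      tolerance=0.1, rng_seed=0):
    """docs maps a document id to its text. Returns the human decisions
    and the machine tags for documents no human looked at."""
    rng = random.Random(rng_seed)
    unreviewed = sorted(docs)
    rng.shuffle(unreviewed)

    # Step 1: a person reviews a small initial sample.
    labels = {d: human_review(docs[d]) for d in unreviewed[:seed_size]}
    unreviewed = unreviewed[seed_size:]

    while unreviewed:
        # Step 2: the system learns from the decisions made so far.
        weights = train(labels, docs)
        # Step 3: it tags another batch of documents.
        batch, unreviewed = unreviewed[:batch_size], unreviewed[batch_size:]
        machine_tags = {d: predict(weights, docs[d]) for d in batch}
        # Step 4: the reviewer decides without seeing the machine tags,
        # and those decisions are fed back into the labelled set.
        human_tags = {d: human_review(docs[d]) for d in batch}
        labels.update(human_tags)
        # Step 5: stop once the share of overturned tags is within tolerance.
        overturned = sum(machine_tags[d] != human_tags[d] for d in batch)
        if overturned / len(batch) <= tolerance:
            break

    # Documents never seen by a human are tagged by the final model.
    weights = train(labels, docs)
    predicted = {d: predict(weights, docs[d]) for d in unreviewed}
    return labels, predicted
```

The point of the sketch is the shape of the loop: human decisions feed the model, the model's tags are checked blind, and the exercise ends when the agreed tolerance is met or the documents run out.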
The process still requires input from lawyers, so may more accurately be described as “augmented intelligence” than as AI in the full sense. The ‘learning’ imparted to the system will also be specific to the case in hand. Nevertheless, where the volume of data is particularly large, the overall amount of time spent by human reviewers may be significantly reduced.
The growth of predictive coding
The technology behind predictive coding is not particularly new, but it has taken time to gain traction and to win explicit judicial approval.
Over the last few years, predictive coding has come to be used and accepted by courts in the US, and in 2015 the Irish High Court also approved its use. In doing so, the Irish court found that the evidence had established that “in review of large datasets, predictive coding is at least as accurate as, and, probably more accurate than, [manual review] in identifying relevant documents.”
Yet until last week, there had not been any direct English authority on the use of predictive coding. That has now changed.
Pyrrho Investments v MWB Property: approval of predictive coding from the English courts
Pyrrho Investments & others v MWB Property & others [2016] EWHC 256 is an ongoing dispute regarding alleged breaches of fiduciary duty. The parties had exchanged information about the e-disclosure process, and had identified that the number of documents that would need to be reviewed was in excess of three million. It was estimated that the costs of a traditional review of those documents would run to several million pounds.
The parties had therefore agreed to use predictive coding, subject to approval by the court. They estimated that the costs of the exercise carried out in that way would be in the region of £180k to £470k.
Having set out the English law on e-disclosure, considered the nature of e-disclosure and predictive coding, and reviewed the US and Irish authorities, the High Court Master, Master Matthews, gave the parties the approval they were seeking. Master Matthews noted that:
- there was nothing in the CPR or Practice Directions which precluded predictive coding;
- experience from other jurisdictions had shown that predictive coding could be useful in appropriate cases, with no evidence that it leads to less accurate disclosure than manual review (and some evidence to the contrary);
- the use of a computer effectively to apply the approach of a single senior lawyer would give a more consistent result than using dozens or hundreds of less senior reviewers;
- the cost of using predictive coding would be proportionate to the amount in dispute (tens of millions), and significantly less than a traditional review; and
- in any event, there was sufficient time before trial (set for June 2017) for the parties to consider other methods if the results of predictive coding were found not to be satisfactory.
Osborne Clarke comment
With predictive coding having now been expressly approved by the English courts, we expect more parties to make use of this technology. It will not remove entirely the human element of the review process: indeed, the requirement for more senior input at the outset means that it will not be cost-effective in every case. Nevertheless, for cases involving very large volumes of disclosure, predictive coding can offer significant potential advantages over more traditional methods of review.
In Pyrrho v MWB, one of the key factors was that the parties all agreed to the use of predictive coding. The court may have been less keen to impose a novel technology on an unwilling party. However, since the use of predictive coding has now been approved by one judge, parties may find it more difficult to oppose its use.
The adoption of predictive coding raises wider questions about the role of AI in the legal industry. For many in the legal profession, the possible applications of AI are a cause for excitement or concern, or both. AI systems like IBM’s Deep Blue and Watson, for example, are able to defeat humans at activities like chess or TV game shows. IBM’s recently announced collaboration with Thomson Reuters opens the possibility of combining that AI technology with rich data streams in the form of leading legal databases.
The courts are also now beginning to see the potential. A key part of the new Online Court being proposed by Briggs LJ (discussed in our previous article) is an initial self-diagnosis stage for parties. This would be an AI-based system, giving basic, generic advice based on the answers to a questionnaire, and then potentially using the information given to generate outline claim or defence documents. If such technologies are successful, there are clear implications elsewhere in the legal industry.
The experience of predictive coding shows that, in the foreseeable future at least, new technologies are more likely to amount to “augmented intelligence”, working alongside lawyers’ own skills, rather than replacing them altogether. Whether augmented or ‘full-AI’, however, new technologies seem set to play an increasingly important role for parties looking to resolve their disputes in a proportionate, cost-effective manner.