Shepherd Technologists on Predictive Coding: What to Expect
POSTED ON April 14

In our last post, Shepherd technology whizzes Brandon Ward and Ben Legatt talked about firing up a predictive coding project. In this edition, the last in a three-part series on predictive coding, they tell us what to expect as the project proceeds. “With enough on-point and conceptually rich documents, the process for the reviewer is fundamentally simple,” Legatt assures us.

But even in a simple process, complications can arise. Issues may evolve over the course of litigation. “When there’s a new or different issue, it may take additional review rounds to reach a stable standard or yardstick against which all documents are measured,” Ward explains. The term overturn is used to characterize the stability of the results. “A low overturn rate is an indicator that the initial review and resultant seed set or control group of ideal documents was on target,” says Legatt. “Overturns are documents that other team reviewers believe have been misclassified. Five percent or less is a good stable level for overturns,” says Ward.
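
For readers who want to see the arithmetic, here is a minimal sketch in Python of how an overturn rate might be tallied against Ward’s five percent benchmark. The function and the sample numbers are illustrative only, not a Relativity feature or report.

```python
# A minimal sketch of the arithmetic behind an overturn rate; the function
# and the sample numbers are hypothetical, not output from Relativity.

def overturn_rate(reviewed: int, overturned: int) -> float:
    """Fraction of quality-checked documents whose initial call was reversed."""
    if reviewed == 0:
        raise ValueError("no documents were re-reviewed in this round")
    return overturned / reviewed

# Hypothetical QC round: 2,000 documents re-checked, 80 reclassified.
rate = overturn_rate(reviewed=2_000, overturned=80)
print(f"overturn rate: {rate:.1%}")  # overturn rate: 4.0%
print("stable" if rate <= 0.05 else "run another training round")
```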

One of the interesting aspects of predictive coding is how the system views documents. “A document is a giant bag of words to the system, so it can just focus on organizing by conceptual similarity based on the connections among those words, rather than being distracted by the content,” Legatt says.
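
To make the “bag of words” idea concrete, here is a toy Python sketch. It is our own simplification, not the analytics engine Relativity actually runs: each document is reduced to word counts, and two documents are compared by how much those counts overlap.

```python
# A toy illustration of the "bag of words" idea: each document becomes a
# word-count vector, and two documents are compared by the overlap between
# those vectors. A simplified sketch, not Relativity's actual analytics.
from collections import Counter
import math

def bag_of_words(text: str) -> Counter:
    """Reduce a document to word counts, ignoring order and formatting."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Angle-based similarity between two word-count vectors (0 to 1)."""
    dot = sum(a[word] * b[word] for word in set(a) & set(b))
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

contract_1 = bag_of_words("supplier breached the delivery terms of the contract")
contract_2 = bag_of_words("the contract delivery terms were breached")
picnic = bag_of_words("company picnic rescheduled to friday afternoon")

print(cosine_similarity(contract_1, contract_2))  # relatively high: shared concept
print(cosine_similarity(contract_1, picnic))      # 0.0: no words in common
```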

Predictive coding doesn’t work on just any matter, though, Ward continues. “The system needs words to form concepts, so the more documents it has to work with, the better. After culling out unusable files such as Excel spreadsheets, 50,000 conceptual documents is really the minimum for the universe of documents.”

What if documents contain foreign words or jargon? “Predictive coding works on any language,” says Legatt. If foreign words are relevant to a concept, the system will process them. On the other hand, solitary strings that have no connection to any concepts—like “please see attached” in emails—are dropped once flagged.
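
In practice, that kind of filtering can be as simple as stripping flagged strings before the concept index is built. The sketch below is a hypothetical illustration; the phrase list and helper function are our own, not a Relativity setting.

```python
# A simplified sketch of dropping flagged, concept-free strings before the
# documents are indexed; the phrase list and helper are hypothetical.
FLAGGED_STRINGS = {"please see attached", "sent from my iphone"}

def strip_flagged(text: str) -> str:
    """Remove strings that carry no conceptual weight, keeping the rest."""
    cleaned = text.lower()
    for phrase in FLAGGED_STRINGS:
        cleaned = cleaned.replace(phrase, " ")
    return " ".join(cleaned.split())

email_body = "Please see attached. Revised delivery schedule for the Acme contract."
print(strip_flagged(email_body))
# -> ". revised delivery schedule for the acme contract."
```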

Ward provided an example of predictive coding in action. “We had a client who needed a project done in two weeks. By using one reviewer on the sample and applying the results to the universe, the system ran six rounds and reduced a 200,000-document set down to 20,000 for individual review.”

It’s important to remember that predictive coding is part of the larger litigation process and must be acceptable to all parties and the court. What if there is a discovery dispute as the case moves toward trial? “Predictive coding through Relativity offers a very transparent process. If anyone has questions about inclusion or exclusion of a document or group of documents, we can produce reports showing why a document was treated the way it was,” Legatt clarifies.

If e-discovery is in the cards, predictive coding may be the best path forward. As Legatt says, “The goal is to process a large number of documents with the fewest possible people.” And that can save time and money for everyone involved.

About the Author

Christine Chalstrom is the Founder, CEO, and President of Shepherd Data Services; a Trustee of Mitchell Hamline School of Law; and an Adviser to the Center for Law and Business. She has spoken widely on the Amendments to the Federal Rules of Civil Procedure, digital forensics, and eDiscovery best practices. Her credits include presentations to the American Bar Association, the Association of Certified e-Discovery Specialists (ACEDS), the Corporate Counsel Institute, the MN Association of Corporate Counsel, the MN Association of Litigation Support Professionals, MN CLE, Mitchell Hamline School of Law, and the Upper Midwest Employment Law Institute. She is an attorney, programmer, and forensic examiner.