
Technology Assisted Review: Much More Than Predictive Coding

By Greg Buckles
July 30, 2012

Last June, Recommind stole a march in the e-discovery market with a patent for its predictive coding (PC) offering. The patent covers Recommind's systems and methods for iterative computer-assisted document analysis and review, and came just as a wave of different technology assisted review (TAR) offerings hit the market.

The result was a tumultuous year where confusion reigned: What is PC? What does the Recommind patent cover, and can other vendors offer PC? What about all the other predictive-type solutions flooding the market?

With some case law beginning to emerge now, almost a year later, the market has recognized that Recommind's PC methodology and use case are only a small part of the bigger TAR picture, and that it is time for legal teams to embrace new, advanced review methodologies.

The bottom line is that, in the context of today's advanced technological world, TAR is about using a combination of technology and people to speed, improve and sometimes automate elements of the legal review process in a way that reduces costs and improves quality.

The eDJ Group has been conducting surveys and interviews to get a clearer picture of market adoption and attitudes. Interestingly, a quick graph of average Google hits per month for the search term “predictive coding” reveals a rapidly increasing use of the term that peaked at LTNY 2012 and has begun to decline despite recent related cases. See Figure 1 below.

[IMGCAP(1)]

The broader search term “technology assisted review” first appeared on the Internet in the middle of last year and has gained traction, most likely because it is a more suitable term to describe a market in which PC is but one advanced method. A recent eDiscovery Journal poll showed almost 60% of respondents preferred the broader term TAR to the narrower PC. See Figure 2 below.

[IMGCAP(2)]

TAR is not simply about determining which documents are relevant and/or privileged and marking them as such; rather, TAR is more broadly applicable in other scenarios:

  • Pre-collection. Early case assessment (ECA): identification of custodians, sources and collection criteria.
  • Processing. Culling and collection organization.
  • Review. Clustering, relationship extraction and more.
  • Post-review. Quality assurance and iterative collection refinement.

The majority of TAR users eDJ interviewed were more comfortable using TAR pre- and post-review than actually allowing the system to make relevance decisions on the final “collection.” Recent high-profile cases highlight some of the issues around TAR and how it is applied in practice. The issues in one of these cases, Kleen Products v. Packaging Corporation of America, can be seen as a TAR-generational conflict: search optimization vs. concept training. Both methods utilize iterative sampling processes based on human decisions to include or exclude ESI. The dominant TAR methods appear to fall into three primary groupings:

  1. Rules-driven. “I know what I am looking for and how to profile it.” Human experts extract the common criteria for searches or rules. Examples: search optimization, linguistic analysis, filtering.
  2. Facet-driven. “I let the system show me the profile groups first.” The collection is analyzed and profiled to identify groups. Examples: clustering, concepts, social network analysis.
  3. Propagation-driven. “I start making decisions and the system looks for similar or related items.” Sample and known seed sets are reviewed, and the system learns commonalities from the decisions. Examples: near-duplicate expansion, predictive coding.

These TAR mechanisms are not mutually exclusive. In fact, combining them can help overcome the limitations of individual approaches. For example, if a document corpus is not rich (e.g., does not have a high enough percentage of relevant documents), it can be hard to create a seed set that will be a good training set for the propagation-based system. It is, however, possible to use facet-based TAR methods like concept searching to more quickly find the relevant documents needed to build a model of relevance that the propagation-based system can leverage.
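As a toy sketch of the propagation-driven idea (the documents and the deliberately simple Jaccard similarity are illustrative assumptions, not any vendor's algorithm), unreviewed documents can be scored against a human-reviewed seed set:

```python
# Illustrative sketch with hypothetical data: score unreviewed documents
# by their similarity to a human-reviewed "relevant" seed set, roughly as
# a propagation-driven system would before a reviewer confirms the results.

def tokens(text):
    """Split a document into a set of lowercase word tokens."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Documents a reviewer has already marked relevant (the seed set).
seed_relevant = [
    "pricing agreement between competitors",
    "meeting notes on coordinated pricing",
]

# Documents not yet reviewed.
unreviewed = {
    "doc1": "draft pricing agreement for review",
    "doc2": "holiday party planning committee",
}

seed_tokens = [tokens(d) for d in seed_relevant]
scores = {
    doc_id: max(jaccard(tokens(text), s) for s in seed_tokens)
    for doc_id, text in unreviewed.items()
}
# Higher score means more similar to the seed set, so doc1 outranks doc2
# and would be prioritized for human review.
```

Real propagation-based tools use far richer models than token overlap, but the workflow is the same: human decisions seed the system, and the system ranks the remainder for review.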

The Da Silva Moore v. Publicis Groupe case raised customer interest in trying TAR solutions because of claims that certain products or methods were approved or endorsed. One should note, however, that no tool or specific process has been generally approved or endorsed; rather, the use of TAR has been allowed in cases where the parties have agreed on TAR or allowed pending objections based on the results.

It is important to understand TAR in the context of priorities: people, process and then technology. Most e-discovery teams have adapted traditional linear review workflows from paper documents to ESI collections. TAR solutions step out of the linear review box and introduce concepts such as confidence levels, distribution factors, precision, recall, F1 (a summary measure combining precision and recall) and stability. Someone on the team must understand your chosen TAR solution and be able to explain and defend it in the context of your unique discovery. TAR solutions promise to increase relevance quality while decreasing the time and cost of review. Hold them to that promise by measuring the results. Most courts seem more interested in the quantified output than in the technology underpinning the process; measurement ultimately trumps method.
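As a concrete illustration (the counts below are made up, not figures from any case), precision, recall and F1 can be computed directly from a validation sample of TAR decisions:

```python
# Hypothetical validation sample: how the system's relevance calls
# compare against a human reviewer's "ground truth" decisions.
tp = 80   # true positives: marked relevant, actually relevant
fp = 20   # false positives: marked relevant, actually not relevant
fn = 40   # false negatives: marked not relevant, actually relevant

precision = tp / (tp + fp)   # 0.80: share of retrieved docs that are relevant
recall = tp / (tp + fn)      # ~0.67: share of relevant docs that were retrieved
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean, ~0.73
```

Note the trade-off the numbers expose: this hypothetical system is fairly precise but misses a third of the relevant documents, which is exactly the kind of quantified output a court or opposing party would want to see.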

Getting the right expertise in place is critical to practicing TAR in a way that will not only reduce review costs, but stand up in court. Organizations looking to successfully exploit the mechanisms of TAR will need:

  • Experts in the right tools and information retrieval. Software is an important part of TAR. The team executing TAR will need someone who can program the toolset with the rules necessary for the system to intelligently mark documents. Furthermore, information retrieval is a science unto itself, blending linguistics, statistics and computer science. Anyone practicing TAR will need the right team of experts to ensure a defensible and measurable process;
  • A legal review team. While much of the chatter around TAR centers on its ability to cut lawyers out of the review process, the reality is that the legal review team will become more important than ever. The quality and consistency of the decisions this team makes will determine the effectiveness that any tool can have in applying those decisions to a document set; and
  • An auditor. Much of the defensibility and acceptability of TAR mechanisms will rely on statistics that demonstrate how certain the organization can be that the output of the TAR system matches the input specification. Accurate measures of performance are important not only at the end of the TAR process, but also throughout the process in order to understand where efforts need to be focused in the next cycle or iteration. Anyone involved in setting or performing measurements should be trained in statistics.
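To illustrate the auditor's statistical work, the classic sample-size formula for estimating a proportion shows the kind of calculation involved (a simplified sketch; real sampling protocols vary and may be negotiated between the parties):

```python
import math

# Simplified sketch: how many randomly sampled documents are needed to
# estimate a proportion (e.g., the responsiveness rate) within a given
# margin of error at a given confidence level.
def sample_size(z=1.96, margin=0.05, p=0.5):
    """n = z^2 * p * (1 - p) / margin^2, rounded up.

    z=1.96 corresponds to 95% confidence; p=0.5 is the worst case,
    which maximizes the required sample size.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)

n = sample_size()               # 385 documents for 95% confidence, +/-5%
n_tight = sample_size(margin=0.02)  # 2,401 for a tighter +/-2% margin
```

The point of the sketch is that tightening the margin of error is expensive: halving it roughly quadruples the sample, which is why measurement strategy belongs in the hands of someone trained in statistics.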

That brings us back to the crux of the Da Silva Moore arguments: “How do you know when your TAR process is good enough?” How do you assure yourself that your manual review satisfies the standards of reasonable effort?

The answer? Strict quality control during the process, followed by quality assurance with predefined acceptance criteria, and thorough documentation at every step.

The Da Silva Moore transcripts and expert affidavits contain some interesting arguments on sample sizing and acceptable rates of false-negative results. No sufficiently large relevance review is perfect, but few counsel are ready to hear that truth. We have no firm rules or case law that define discovery quality standards. Therefore, anyone practicing TAR should document TAR decisions and QA/QC efforts with the knowledge that the other side may challenge them.
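One common QC measurement of this kind is an elusion sample: reviewing a random sample of the documents the process marked non-relevant to estimate how many relevant documents slipped through. A hedged sketch (all numbers hypothetical, and the normal approximation is a simplification of what a statistician might actually use):

```python
import math

# Hypothetical elusion test: a random sample is drawn from the discard
# pile (documents the TAR process marked non-relevant) and reviewed.
sampled = 400          # documents sampled from the discard pile
found_relevant = 6     # of those, how many a human judged relevant

elusion = found_relevant / sampled   # point estimate: 1.5% slipped through
# One-sided 95% upper confidence bound via the normal approximation.
se = math.sqrt(elusion * (1 - elusion) / sampled)
upper_95 = elusion + 1.645 * se      # roughly 2.5% in this example
```

Documenting both the point estimate and the confidence bound, along with how the sample was drawn, is exactly the kind of record that lets a TAR process withstand a challenge from the other side.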


Greg Buckles is a co-founder and principal analyst of the consultancy eDJ Group. Previously, Buckles served as the senior product manager of e-discovery for Symantec Corporation's Information Foundation group. Buckles is also a member of the Sedona Conference and the EDRM Committees.
