
Technology-Assisted Review: One Size Doesn't Fit All

By Hope Swancy-Haslam
October 31, 2012

As data volumes increase year after year, counsel are focused on managing two key issues inherent in litigation: the cost and the time it takes to complete a large-volume document review. This article describes how leveraging technology to accelerate review, known as Technology-Assisted Review (TAR), can help manage both. It then outlines the two key approaches to substantially accelerating review, the artificial intelligence-based and the language-based methodologies, and discusses their relative benefits. Finally, it recommends best practices for implementing each approach in light of recent case law, and explains how to decide when to use each one.

Technology-Assisted Review Overview

Accelerating document review in litigation is a hot topic for a number of reasons, most notably because corporate-generated electronically stored information (ESI) is growing at a rapid pace while budgets remain unchanged. To make matters worse, the time constraints on litigation, discovery and document review are tightening as dockets grow more crowded and inflexible. Together, these factors make the cost and duration of review a very real concern in litigation.

Historically, organizations have reviewed every document in a collection in order to minimize the perceived risk of producing privileged information or missing relevant data. But volumes are increasing such that this “linear review” is no longer a financially feasible approach.

Technology-assisted review has surfaced as a solution to these problems because of its ability to significantly decrease the time and expense of determining the potential relevance of possibly millions of documents in a collection, a process that can consume up to 75% of the e-discovery budget. It typically involves the interplay of humans and computers, and may use one or more technological approaches, including keyword search, Boolean querying, artificial intelligence, clustering, relevance ranking and sampling.

Despite the clear benefits of TAR, apprehension persisted until recently. Important questions, such as exactly how much can be saved, how the technology works and, importantly, how competing offerings differ, remained perplexing for counsel trying to determine the best approach to take. That apprehension is now waning in the wake of recent opinions encouraging the use of TAR, making it all the more important for today's counsel to understand the technology and how to leverage the right alternative for the right problem.

Two Key Approaches

Today, two general TAR approaches are emerging: one that leverages artificial intelligence to identify potentially relevant data in a document collection, and another that relies on a human's understanding of language to do the same. Both deliver significant savings in time and cost, but each has specific situations in which its use is ideal.

For artificial intelligence-based TAR, two elements are most often in play: the need to arrive at quick decisions early in the litigation (assessing the case early to decide whether to settle or litigate), and having enough time available to read an average of 10,000 documents in order to train the system with a “seed set.” A language-based TAR approach is most appealing when transparency and insight into coding decisions are of paramount concern, when the ability to audit reviewers in real time is important, and when an organization wants to incorporate the approach as a regular business practice.

The important point to note is that support for both approaches has been provided by recent court opinions, most notably Judge Andrew Peck's order in Da Silva Moore v. Publicis Groupe & MSL Group, No. 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Feb. 24, 2012) and Kleen Products v. Packaging Corporation of America, Case No. 10 C 5711 (N.D. Ill. April 8, 2011).

In Da Silva, Judge Peck specifically holds that:

(Technology)-assisted review is an acceptable way to search for relevant ESI in appropriate cases.

This statement, given that an artificial intelligence-based approach was at issue, clearly gives us comfort in considering such an approach for expediting document review and minimizing its cost.

In Kleen, a case in which the use of a language-based analytics workflow for document review was litigated, Judge Nan Nolan held for the producing party for a number of reasons, but specifically because its approach had been embraced by the court system for years. In particular, she relies on Principle 6 of the Sedona Best Practices, Recommendations and Principles for Addressing Electronic Document Production in justifying her decision. Principle 6 directs that:

Responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information. [Emphasis added.]

With this set of decisions in play, the runway for TAR is clear. Now, we must determine which approach is most appropriate for each case.

Choosing the Right Method

Generally, the makeup of your case and your data set will influence which TAR approach to take. Specific factors to consider include, but may not be limited to:

  • The estimated budget for the case;
  • The total amount in controversy;
  • The time allowed for producing responsive documents;
  • The volume of potentially relevant data identified for document review; and
  • The need for transparency in support of your selection.

Regardless of the approach selected, particular attention must be given to The Sedona Conference Cooperation Proclamation before the approach is implemented. To emphasize this point, both Da Silva and Kleen reference the Proclamation as a key basis for their decisions. The Da Silva opinion provides:

Of course, the best approach to the use of computer-assisted coding (Technology-Assisted Review) is to follow the [Sedona Cooperation Proclamation] model. Advise opposing counsel that you plan to use computer-assisted coding and seek agreement; if you cannot, consider whether to abandon predictive coding for that case or go to the court for advance approval.

Da Silva Moore, No. 11 Civ. 1279, Slip Op., Feb. 24, 2012, at 5.

Without a showing that an agreement is in place, refuting a challenge to your TAR protocol will likely be much more difficult.

Best Practices for Both Alternatives

Taking a look at an artificial intelligence-based approach first, it is important to document the following at the planning stage:

  • The parties' agreement;
  • The relative amount of ESI to be reviewed;
  • The superiority of an artificial intelligence-based review to the available alternatives;
  • The need for cost effectiveness and proportionality under Rule 26(b)(2)(C); and
  • The transparency of the process.

Once an agreement has been reached between the parties on this approach, the producing party should be able to address the following questions to support the results:

  • What was done to implement the agreed-upon process;
  • Why has that process produced a defensible result;
  • Were the documents used to train the system shared with opposing counsel in advance; and
  • Can a showing be made that sufficient quality control testing was done to validate the results.

Da Silva Moore, Slip Op. at 22.

Keep in mind that there can be a “blind spot” in this process: the crafting of the seed set. A few years ago, the common practice was to review as few as 500 documents in order to train the system on what to look for within the collection. In short, the system would say, “show me some documents (or parts of documents) you are looking for and I'll find others like them.” With the volume of electronically stored information in litigation today comprising terabytes of data, a best practice now is to build a seed set of approximately 10,000 documents. This better ensures that all semantic patterns, which fuel the artificial intelligence by establishing associations between terms that occur in similar contexts, are captured, thus helping protect you from under-inclusive results (missing critical data) and over-inclusive results (carrying too much irrelevant data into later rounds of review, which drives up downstream time and cost). This task is important enough to the entire process that a senior-level attorney should select the seed documents, so plan accordingly at the outset.
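The seed-set idea can be illustrated with a deliberately simplified sketch. The code below is not any vendor's actual engine; it is a hypothetical, minimal bag-of-words model that ranks unreviewed documents by cosine similarity to the seed-set centroid, simply to show why a richer seed set gives the system more term patterns to match.

```python
# Illustrative sketch only (assumed, simplified model -- real TAR engines
# use far richer semantic representations than raw word counts).
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_seed(seed_docs, corpus):
    """Order corpus documents by similarity to the seed-set centroid."""
    centroid = Counter()
    for doc in seed_docs:
        centroid.update(vectorize(doc))
    return sorted(corpus, key=lambda d: cosine(vectorize(d), centroid),
                  reverse=True)

seeds = ["breach of contract damages", "contract termination notice"]
corpus = ["lunch menu for friday",
          "notice of contract breach and damages claim"]
ranked = rank_by_seed(seeds, corpus)
print(ranked[0])  # the contract/breach document ranks first
```

A larger, attorney-selected seed set simply gives the centroid more terms, which is the intuition behind the 10,000-document recommendation above.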

Contrast this with the language-based approach, which relies on human intelligence rather than artificial intelligence to drive the decision-making process. Again, this approach is often selected for cases where there are greater transparency requirements, often driven by the significance of the case to the producing party.

Overall, the focus is to ensure that relevant and potentially relevant documents are considered first. Meet with your consultant to discuss and memorialize the key issues in the case, and let the consultant craft sample queries for document prioritization prior to review. Assign a senior attorney to apply these queries against the language in the data set, extending them through synonym associations with other words. Based on those associations, this workflow assigns the documents to the following categories and advances (or suppresses) them accordingly:

  • “Highly relevant” documents:
    • 10% of the data set.
    • Immediately advanced to second-pass review.
  • “Might be relevant” documents:
    • 50% of the data set.
    • Reviewed in first-pass review.
  • “Not relevant” documents:
    • 40% of the data set.
    • Suppressed.
The documents marked as “might be relevant” should then go through an eyes-on (first-pass) review in which the potential relevance of each document is weighed. Consider leveraging technology that allows reviewers to highlight the specific language within each document that made it relevant. This lets you audit reviewer decisions and adjust a reviewer's understanding of the matter in real time, as well as “bulk tag” other documents containing similar language, reducing the remaining document collection by up to 50%. Further, make sure you sample the documents previously marked as “not relevant” to gain some measurable certainty that no potentially relevant documents were left behind. With the right methodology and testing, you can achieve a level of assurance of up to 99.9% that all responsive documents have been identified.
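The sampling step can be made concrete with a standard zero-defect sample-size calculation. This is an illustrative statistical sketch, not a description of any specific tool's validation method: if you randomly draw n suppressed documents and find none relevant, you can assert, at a chosen confidence level, that the miss rate is below a chosen threshold.

```python
# Zero-defect acceptance-sampling sketch (illustrative assumption about
# methodology): n = ln(1 - confidence) / ln(1 - max_miss_rate).
import math

def sample_size(confidence, max_miss_rate):
    """Documents to randomly sample from the suppressed set; if all prove
    not relevant, the miss rate is below max_miss_rate at the stated
    confidence level."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_miss_rate))

# e.g., 95% confidence that fewer than 1% of suppressed docs are relevant
print(sample_size(0.95, 0.01))  # -> 299
```

Tightening the confidence level or the miss-rate threshold grows the sample, which is why very high assurance figures require a disciplined, documented sampling protocol rather than a spot check.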

Finally, a clear and unique benefit of the language-based approach is the reusability of the work product. Particularly for those in highly regulated or litigious industries, make sure you don't approach each new matter as if it is your first. Save your work product from previous cases and build on it to drive even greater savings into future cases.

Regardless of which approach you choose, remember that implementing review acceleration technology and managing a case from beginning to end can be a difficult process, and it may require resources that you do not have on staff. Consider retaining a technology and legal workflow expert to help you choose the right approach and to supplement your on-staff strengths. This will improve the chances that the workflow is designed to meet the specific needs of your case and optimize the results.

Conclusion

As we have learned from the opinions spinning out of the Da Silva and Kleen matters, technology-assisted review is fast becoming a standard in document review. Of course, not all TAR approaches are appropriate for all cases: there is no one-size-fits-all solution. Be sure to select the right methodology for each unique problem, and take proactive steps to ensure that you achieve optimal results from your selected approach.


Hope Swancy-Haslam is Director of Analytics Market Development for RenewData. She has been delivering technology solutions to the legal market since 1992, including roles as Director of Electronic Discovery Services for a regional consulting firm based in Dallas, TX, as well as similar positions with Merrill Corporation and Engenium Corporation. She can be reached at [email protected].
