The Challenge of e-Discovery Search

By Tom Gelbmann, Venkat Rangan and Karen Williams
October 28, 2009

To many, the mention of search in the context of finding electronic records in response to litigation conjures up thoughts of legal research or searching the Web. True, you would not look for responsive documents in those places, but it is tempting to use the same search techniques for locating electronic evidence. Actually, constructing searches for finding electronic evidence is a lot harder. Many factors contribute to the complexity of e-discovery search.

e-Discovery Search Challenges

In the context of e-discovery searching, the challenges are considerable:

  • Locating a complete set of results, not just the top few, and focusing on the most relevant documents first;
  • Providing a defensible trail, documenting and reporting all searches;
  • Accounting for various file types and their metadata;
  • Effective capture of hidden information, such as information in spreadsheet comments, formulas or metadata;
  • Accounting for text embellishments, such as bolding and highlighting, since they signal the importance of the text to the participants in a decision-making event;
  • Inclusion of word variants, jargon, and exclusive lingo for a project, team or a group of individuals;
  • Establishing document relationships, such as e-mail responses in a thread, or documents enclosed or contained in another document. This includes preserving metadata as documents are processed in the context of other threads;
  • Appropriate handling of options such as tokenization and word split characters; and
  • Effective search through multi-lingual documents.
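The effect of tokenization and word-split characters, one of the challenges listed above, can be seen in a minimal Python sketch. The `tokenize` helper, the two sets of split characters and the sample text are assumptions made for illustration; they do not describe the behavior of any particular e-discovery product.

```python
import re


def tokenize(text: str, split_chars: str) -> list[str]:
    """Lowercase the text and split it on any character in split_chars."""
    pattern = "[" + re.escape(split_chars) + "]+"
    return [tok for tok in re.split(pattern, text.lower()) if tok]


text = "Re: e-mail from j.smith@example.com about the non-compete"

# Tokenizer A treats only spaces and colons as word boundaries.
tokens_a = tokenize(text, " :")
# Tokenizer B also splits on hyphens, periods and the @ sign.
tokens_b = tokenize(text, " :-.@")

print("e-mail" in tokens_a)  # True
print("e-mail" in tokens_b)  # False
print("mail" in tokens_b)    # True
```

The same keyword can therefore hit or miss the same document depending on a setting that is rarely disclosed, which is one reason these options need to be documented along with the search itself.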

In addition to basic searching, there are several project management and critical evidence assessment decisions that can impact the outcome of a case.

EDRM Initiative

Since its launch, the EDRM Project has focused on the goal of making an impact in the electronic discovery landscape by providing practical solutions to the community. Its past and current projects have defined a reference model for various tasks involved in electronic discovery, established standards for measurement, and specified data exchange formats such as EDRM XML.

Continuing on this theme of contributing items of significance to e-discovery practitioners, EDRM has taken on the task of simplifying and de-mystifying e-discovery search.

Search is an important aspect of e-discovery. Any large-scale e-discovery undertaking has search as a significant activity, as part of Identification, Collection, Preservation, Review or Analysis. Proper recording of searches and communicating these searches from one party to another is an e-discovery requirement. In most legal cases, a requesting party specifies search requests and the producing party responds with the results of an automated search. In addition to the responsive document collections, the producing party is often required to document the actual searches that were performed, the number of document hits for each search, the number of items that were produced as responsive and the number of items that were not produced because of privilege.
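The documentation requirement described above can be pictured as a simple search log. The following Python sketch is one possible shape for such a record; the class name, field names and figures are hypothetical and are not taken from any EDRM specification.

```python
from dataclasses import dataclass


@dataclass
class SearchRecord:
    """One executed search and its outcome counts (field names are hypothetical)."""
    query: str
    document_hits: int
    produced: int
    withheld_privileged: int

    def not_produced_other(self) -> int:
        """Hits neither produced nor withheld for privilege (for example, non-responsive)."""
        return self.document_hits - self.produced - self.withheld_privileged


def format_report(records: list[SearchRecord]) -> str:
    """Render the search log as a plain-text table."""
    lines = ["query | hits | produced | privileged | other"]
    for r in records:
        lines.append(
            f"{r.query} | {r.document_hits} | {r.produced} | "
            f"{r.withheld_privileged} | {r.not_produced_other()}"
        )
    return "\n".join(lines)


# Made-up numbers, for illustration only.
log = [
    SearchRecord("merger AND price", document_hits=120, produced=80, withheld_privileged=15),
    SearchRecord("rebate*", document_hits=40, produced=32, withheld_privileged=0),
]
print(format_report(log))
```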

Increasingly, the success of an e-discovery project depends on search being applied correctly and on search queries and results being recorded and communicated accurately. Given this large role of search, we think there is a significant gap in standardization that EDRM can look to address.

EDRM proposes a search standardization framework and specification that covers the many aspects of a search query:

  • Vocabulary for defining various search types and documenting their expected behavior. This covers types of automated search such as keyword search, fuzzy search, wildcard search, Boolean search, concept search and related words search.
  • Parameters and definitions for each search type. For example, keyword search would be further characterized as stemmed or unstemmed, signifying, for example, whether all forms of a verb would be included in the search.
  • Expression language specification for Boolean search, including operators such as "and," "not" and "or."
  • Specification of the language used for describing and communicating search definitions.
  • Specification of aspects such as the character set and language in which the keyword is encoded.
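To illustrate why parameters such as stemming need to be stated explicitly, here is a minimal Python sketch of a keyword search run stemmed and unstemmed, followed by a Boolean combination. The `crude_stem` function is a deliberately rough suffix stripper written for this example (it is not a real stemming algorithm), and the documents are invented.

```python
def crude_stem(word: str) -> str:
    """Very rough suffix stripper, for illustration only."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_search(docs: dict[str, str], keyword: str, stemmed: bool) -> set[str]:
    """Return the ids of documents containing the keyword as a whole token."""
    norm = crude_stem if stemmed else (lambda w: w)
    target = norm(keyword.lower())
    return {
        doc_id
        for doc_id, text in docs.items()
        if any(norm(tok) == target for tok in text.lower().split())
    }


docs = {
    "d1": "counsel reviewed the contract",
    "d2": "please review the contracts",
    "d3": "reviewing is delayed",
}

unstemmed = keyword_search(docs, "review", stemmed=False)
stemmed = keyword_search(docs, "review", stemmed=True)
print(sorted(unstemmed))  # ['d2']
print(sorted(stemmed))    # ['d1', 'd2', 'd3']

# Boolean "review AND NOT contract", expressed with set operations.
review_not_contract = stemmed - keyword_search(docs, "contract", stemmed=True)
print(sorted(review_not_contract))  # ['d3']
```

The same keyword returns one document or three depending on a single parameter, so two parties who agree on the term "review" but not on stemming have not actually agreed on the search.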

Additional candidates for standardization based on community interest include:

  • A framework for representing results;
  • A vocabulary for specifying results;
  • Identifying important aspects of results, such as term hits and document hits;
  • Identifying objects and developing an object model that can be used to describe results; and
  • Documenting the results of searches, such as those performed with and without stemming, with and without a wildcard character, with a specified proximity between search terms, and under other search specifications.
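The distinction between term hits and document hits mentioned in the list above is easy to blur when results are reported. A small Python sketch, using invented documents and a hypothetical `count_hits` helper, shows the two counts side by side.

```python
def count_hits(docs: dict[str, str], term: str) -> tuple[int, int]:
    """Return (term_hits, document_hits) for an exact, whole-token match."""
    term = term.lower()
    per_doc = {doc_id: text.lower().split().count(term) for doc_id, text in docs.items()}
    term_hits = sum(per_doc.values())                           # every occurrence
    document_hits = sum(1 for n in per_doc.values() if n > 0)   # documents with at least one
    return term_hits, document_hits


sample_docs = {
    "d1": "merger talks and merger terms",
    "d2": "no relevant text",
    "d3": "merger closed",
}

term_hits, document_hits = count_hits(sample_docs, "merger")
print(term_hits, document_hits)  # 3 2
```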

EDRM Work Product and Status

The EDRM Search Group completed a comprehensive 85-page EDRM Search Guide that covers many aspects of e-discovery search. Contributors drawn from the industry, including lawyers, litigation software and service providers, consultants, vendors and e-discovery practitioners, collaborated to produce the Search Guide. First released at Legal Tech New York 2009, this guide is available for public review and comment and forms the foundation of further work in the EDRM Search Group (see http://edrm.net).

The EDRM Search Guide provides information for attorneys, judges and paralegals, as well as for litigation support professionals of all types. In addition to definitions and examples of each search type, the Search Guide provides a search framework for litigation and a comprehensive use case for the litigation workflow. Considerations for designing and validating a comprehensive search strategy are identified from both a workflow and a technical approach.

Also, the EDRM Technology Subgroup completed an initial version of a formal XML specification for describing an e-discovery search. This XML specification captures several aspects of e-discovery search and is intended as a comprehensive way to record its nuances.
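To give a sense of what an XML description of a search might look like, the following Python sketch serializes one search definition and reads it back. The element and attribute names (`Search`, `Expression`, `Parameter`) are hypothetical and are not taken from the EDRM Search XML specification; only the general idea of recording the search type, expression and parameters comes from the discussion above.

```python
import xml.etree.ElementTree as ET


def build_search_xml(search_id: str, search_type: str, expression: str,
                     stemmed: bool, language: str, charset: str) -> str:
    """Serialize one search definition to an XML string (hypothetical schema)."""
    root = ET.Element("Search", attrib={"id": search_id, "type": search_type})
    ET.SubElement(root, "Expression").text = expression
    for name, value in (
        ("stemmed", str(stemmed).lower()),
        ("language", language),
        ("charset", charset),
    ):
        ET.SubElement(root, "Parameter", attrib={"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")


xml_text = build_search_xml("S-001", "boolean", "merger AND NOT draft", True, "en", "UTF-8")

# The receiving party can parse the same text and recover every setting.
parsed = ET.fromstring(xml_text)
params = {p.get("name"): p.get("value") for p in parsed.findall("Parameter")}
print(parsed.get("type"), parsed.find("Expression").text, params["stemmed"])
```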

Goals and Deliverables for 2009-2010

Building on the comprehensive Search Guide, the EDRM Search Working Group is focused on identifying various practical situations that call for a specific search strategy. The result is a set of problem profiles, each describing a particular litigation type and the search that needs to be performed.

The problem profiles are organized around particular matters, and provide information on search challenges, search objectives and search strategy. Each problem profile is a comprehensive example that illustrates the definitions and discussions provided in the Search Guide, as well as pointing back to the corresponding discussion areas in the Search Guide.

Besides the problem profiles, the technology sub-group within EDRM Search is continuing to evolve the Search XML specification. EDRM Search Group expects to finalize this by the end of the current project year (May 2009 to May 2010).

Additionally, the Search Technology subgroup is taking on the task of defining various search metrics to measure and document the performance and effectiveness of e-discovery search. Besides traditional information retrieval measures such as precision and recall, guidance on related topics of interest, such as sampling methodology and sample sizes, will be provided.
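For readers unfamiliar with these measures, precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The Python sketch below computes both, and adds a textbook sample-size formula for estimating a proportion from a simple random sample of a large population. The formula and its defaults (95% confidence, worst-case proportion of 0.5) are standard statistical assumptions chosen for this example, not a methodology prescribed by EDRM.

```python
import math


def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Precision = TP / retrieved; recall = TP / relevant."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def sample_size(margin_of_error: float, z: float = 1.96, p: float = 0.5) -> int:
    """n = z^2 * p * (1 - p) / e^2, rounded up; assumes a large population."""
    return math.ceil(z * z * p * (1 - p) / margin_of_error ** 2)


# Invented document ids: the search retrieved 1-4, reviewers found 2, 3 and 5 relevant.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 3, 5})
print(precision)           # 0.5
print(round(recall, 3))    # 0.667
print(sample_size(0.05))   # 385
```

A search can score well on one measure and poorly on the other, which is why both are usually reported together.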

The Search Technology subgroup is collaborating with the EDRM Data Set group to identify various searches and expected results as a template for e-discovery practitioners and vendors to communicate the inner workings of various searches.

Case Law

Several important recent cases show failed or incomplete searches contributing to sanctions and adverse motions. The most notable cases involving a problematic approach to e-discovery search include:

  • Seroquel Products Liability Litigation Case No. 6;
  • AIU Ins. Co. v. TIG Ins. Co.;
  • Peskoff v. Faber;
  • Victor Stanley, Inc. v. Creative Pipe, Inc. (250 F.R.D. 251 (D. Md. 2008) (Judge Grimm));
  • S.E.C. v. Collins & Aikman Corp. (2009 WL 94311 (S.D.N.Y. Jan. 13, 2009)); and
  • William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Company.

These cases involved a wide variety of serious issues, including an overly narrow or incomplete selection of custodians and repositories, inadequate treatment of a data format, de-duplication algorithms that are difficult to explain and understand, poor sampling and quality control measures, searches executed on document titles rather than content, and incomplete documentation of keywords and the scope of production.

Conclusion

The challenges facing legal teams related to effective search activities in electronic discovery are considerable. If not designed and executed properly, responses to these challenges can place the successful outcome of the matter in jeopardy. Guidelines and standards related to electronic discovery search processes can contribute to increased recognition of potential problems and inform alternative approaches to manage risks and increase chances of success. The mission of the EDRM Search project is to contribute to the development of guidelines and standards for electronic discovery search. The EDRM Search Guide is a first step toward fulfilling this mission by providing a framework to deliver practical approaches to address the challenges of electronic discovery search.


Tom Gelbmann is the co-founder of the Electronic Discovery Reference Model project and co-publisher of the Socha-Gelbmann Electronic Discovery Survey. A member of this newsletter's Board of Editors, Gelbmann can be reached at [email protected] or 651-483-0022. Karen Williams is a product manager for CT Summation, and is a co-leader of the EDRM Search project. She can be reached at [email protected]. Venkat Rangan is Co-Founder and Chief Technology Officer at Clearwell Systems, Inc. He is a co-leader of the EDRM Search project, and is a member of several Sedona Conference working groups and a participant of the TREC Legal Track. Venkat can be reached at [email protected].
