e-Discovery Compliance: Using Technology for Keyword Transparency and Defensibility

By Todd M. Haley
November 25, 2008

As a consultant and solutions provider, I see keyword searching become a bigger problem for lawyers and legal professionals each day. With the recent decision in Victor Stanley v. Creative Pipe, 2008 WL 2221841 (D. Md. May 29, 2008), corporations and law firms need to be more concerned than ever about ensuring that electronically stored information (“ESI”) is searched properly. (For more details on the Victor Stanley decision, see, “The Future of Search in e-Discovery: What IT Needs to Know About the State of the Law,” in the Sept. 2008 issue of LJN's Legal Tech Newsletter, available at www.ljnonline.com/issues/ljn_legaltech/26_6/news/150932-1.html.) However, most e-discovery software today, whether built for processing or for review, was designed for enterprise search rather than specifically as an electronic discovery search tool.

The Evolution of Keyword Searching

Keyword searching and its defensibility are among the hottest topics for law firms and corporations. With Victor Stanley and other decisions now being handed down by the courts, lawyers must be able to analyze data quickly and efficiently. They also need a way to sample, test and confirm their selection of keywords.

Keyword searching has evolved along with technology. Originally, keyword searching was a straightforward search of the text of documents, returning to the searcher a hit list that assisted in parsing document collections and determining relevance. Over time, keyword searching permitted the grouping of documents using various criteria, such as relevance based on keyword counts and proximity to other keywords. Even with these technological advances, keyword searching remains a tool that is largely used after the collection of data and relatively late in the discovery process. As courts become more knowledgeable about technology and require attorneys to become more savvy about its use, keyword searching needs to move earlier in the discovery process, preferably before the initial Federal Rules of Civil Procedure (“FRCP”) Rule 26(f) “meet and confer” session.

In preparing for the Rule 26(f) session, lawyers understand that they need to immediately identify relevant custodians, meet with those custodians to ensure that litigation hold requirements are being followed, make sure that all appropriate destruction processes have been suspended, and preserve potentially relevant evidence for collection. Lawyers should also do an early case assessment on the data collection to determine exactly what they have, analyze any pitfalls within their data, and use their knowledge of the data set to determine specific production requests.

When collecting and reviewing data, the Sedona Conference Working Group on Best Practices for Document Retention and Production, Search & Retrieval Sciences Special Project Team, recommends that search terms be tested, sampled and documented to ensure better search processes. (See, The Sedona Conference Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, A Project of The Sedona Conference Working Group on Best Practices for Document Retention and Production (WG1), Search & Retrieval Sciences Special Project Team, August 2007 Public Comment Version). Many times, lawyers will determine a set of keywords based on the business models and practices of their clients, but not take the time to determine if that set of keywords is truly representative of the dataset that they have collected. There are now tools developed, and in development, that allow keywords to be tested, reviewed and extrapolated in order to provide better insight and more defensibility in the overall search process.
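
The testing and sampling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document IDs, the sample size of 100 and the reviewer decisions are hypothetical, and the margin of error uses a simple normal approximation rather than any method prescribed by the Sedona commentary.

```python
import math
import random

def sample_hits(hit_ids, sample_size, seed=0):
    """Draw a reproducible random sample of documents returned by a keyword search."""
    rng = random.Random(seed)  # fixed seed so the sample can be documented and repeated
    return rng.sample(hit_ids, min(sample_size, len(hit_ids)))

def estimate_precision(reviewed, z=1.96):
    """Estimate the share of relevant hits from reviewer decisions.

    reviewed: list of booleans, True where the reviewer marked the document relevant.
    Returns (estimate, margin_of_error) using a normal approximation (z=1.96 ~ 95%).
    """
    n = len(reviewed)
    if n == 0:
        raise ValueError("no reviewed documents")
    p = sum(reviewed) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin

# Hypothetical hit list for one keyword: 500 document IDs.
hit_ids = [f"DOC-{i:04d}" for i in range(1, 501)]
sample = sample_hits(hit_ids, 100)

# Hypothetical reviewer calls on the 100 sampled documents: 80 relevant, 20 not.
reviewed = [True] * 80 + [False] * 20
precision, margin = estimate_precision(reviewed)
print(f"Estimated precision: {precision:.0%} +/- {margin:.1%}")
```

Recording the seed, the sample size and the reviewer decisions alongside the keyword is what turns an informal spot check into documentation that can be produced later.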

The State of Software

With this in mind, how does current e-discovery software manage these keyword issues? When software developers first built enterprise search, the requirements were only that users be able to enter keywords or Boolean or Bayesian phrases and receive the documents found by those searches. There was no need to demonstrate the defensibility of the algorithm used or to produce reports on how the selected set of documents compared to the specific search terms. These “black box” searches may no longer be sufficient, since lawyers need to be able to document and defend their search methodologies and how the software performed the searches. In addition, these straight searches carry a few inherent risks, any of which can make a search under-inclusive:

  • Unknown keywords. Lawyers and other business stakeholders are not always able to come up with all of the right search words. Keywords are often better identified through review of the data; however, ESI collections are typically too large to review quickly for the purpose of building keyword lists;
  • Word variations. A search for a particular word, such as “earn,” does not locate other variations, such as “earnings,” “earned” or “earns”;
  • Similar keywords. A search for a particular word also does not include words that have a similar meaning. In the example above, “earn” would not find “savings,” “financials” and “pay,” which all could have a similar meaning in a particular set of data; and
  • Misspellings. In a standard search, misspellings can easily cause a relevant document to be missed. With the “black box” approach, there is no easy way to locate and identify these misspellings, such as “earnins.”
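
The under-inclusion risks listed above are easy to reproduce. The following Python sketch runs an exact-word search for “earn” over four hypothetical documents; the document text and the `exact_search` helper are invented for illustration and do not represent any vendor's search engine.

```python
import re

# Hypothetical mini-collection; each document illustrates one risk from the list above.
docs = {
    "doc1": "We expect to earn more next quarter.",   # exact match
    "doc2": "Quarterly earnings were restated.",      # word variation
    "doc3": "Please review the earnins forecast.",    # misspelling
    "doc4": "Total pay and savings are attached.",    # similar keywords
}

def tokens(text):
    """Lower-case the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def exact_search(docs, term):
    """Return the IDs of documents containing `term` as a whole word."""
    return sorted(doc_id for doc_id, text in docs.items() if term in tokens(text))

hits = exact_search(docs, "earn")
missed = sorted(set(docs) - set(hits))
print("Hits:  ", hits)    # only doc1
print("Missed:", missed)  # doc2, doc3 and doc4 are never shown to the searcher
```

Three of the four documents are potentially relevant yet invisible, and nothing in the result tells the searcher that they exist.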

Finding a Defensible Solution

At the most recent International Legal Technology Association conference, many search solutions were showcased that attempt to address these concepts. However, in analyzing multiple products, one seemed to provide a true solution to the issues surrounding keyword defensibility, especially when viewed through the prism of usefulness during early case assessment and preparation for the “meet and confer.” In its most recent release, Clearwell Systems unveiled “Transparent Search,” which features advanced keyword preview, keyword filters and reporting. The elements of this new search are discussed below to show how new technology can mitigate the risks associated with keyword culling and filtering.

In its advanced search form, the first component lets the user enter a keyword, such as “earn,” and then click on an icon to bring up all variations located within the dataset. This list might include the following: “earnest,” “earnestly,” “earnie,” “earned,” “earnings,” “earns” and “earnins.” Once the user pulls up the variation list, he or she is able to uncheck those variations that do not fit the specific concept that he or she is looking for. In the example above, the user may be looking for “financial earnings,” so he or she would uncheck the words “earnest,” “earnestly” and “earnie” to better target the search. The user may leave “earnins” checked because it is an obvious misspelling. The generated keyword list also gives the user an idea of what other words might be relevant in this set, thus suggesting additional keywords that were not originally considered. Once the search is run, the set of documents is returned and a filter is generated that shows the user how many documents matched each specific variation and allows the user to filter the new data selection by any subset of those variations.
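
The preview-and-filter workflow described above can be approximated as follows. This is a rough sketch, not Clearwell's implementation: it assumes that “variations” are simply the words in the dataset that begin with the entered keyword, and the documents and function names are hypothetical.

```python
import re
from collections import Counter

def words_in(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def variation_counts(docs, stem):
    """Count how many documents contain each word that starts with `stem`."""
    counts = Counter()
    for text in docs.values():
        counts.update(w for w in words_in(text) if w.startswith(stem))
    return counts

def search_selected(docs, selected):
    """For each selected variation, list the documents that contain it."""
    result = {term: [] for term in selected}
    for doc_id, text in docs.items():
        present = words_in(text)
        for term in selected:
            if term in present:
                result[term].append(doc_id)
    return result

# Hypothetical dataset.
docs = {
    "a": "She made an earnest effort to earn trust.",
    "b": "Earnings rose; earned income fell.",
    "c": "The earnins report is late.",
    "d": "Earnie sent the earnings memo.",
}

counts = variation_counts(docs, "earn")   # the "preview" list with per-term document counts
unchecked = {"earnest", "earnie"}         # variations the user decides not to search
selected = set(counts) - unchecked        # "earnins" stays checked as a likely misspelling
per_term_hits = search_selected(docs, selected)

for term in sorted(per_term_hits):
    print(term, per_term_hits[term])
```

The per-term hit lists play the role of the filters described above: the user can narrow the result set to any subset of the variations that were actually searched.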

The second component of the new feature is the one that was found to be most compelling. After the search is performed, an automated report is generated that provides the exact terms searched and their specific counts, as well as the exact terms that were not searched. This report provides a very specific, defensible way to provide the opposing side with the search methodology, term lists and results. At present, there are very few search tools that provide a list of potential keywords that were not searched, a feature that allows lawyers and legal professionals to transparently provide those terms that were not searched within a dataset even though these terms did appear in the dataset.
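
To make the idea of such a report concrete, here is a minimal Python sketch that lists the terms searched and the terms that appeared in the dataset but were not searched, each with a document count. The counts, the report layout and the function names are assumptions for illustration; the actual report format of any product may differ.

```python
def search_report(variation_counts, selected):
    """Split the variation list into terms that were searched and terms that were not."""
    ordered = sorted(variation_counts.items())
    return {
        "searched": {t: n for t, n in ordered if t in selected},
        "not_searched": {t: n for t, n in ordered if t not in selected},
    }

def format_report(report):
    """Render the report as plain text suitable for sharing with the other side."""
    lines = []
    for section in ("searched", "not_searched"):
        lines.append(section.replace("_", " ").upper())
        for term, n in report[section].items():
            lines.append(f"  {term}: {n} document(s)")
    return "\n".join(lines)

# Hypothetical per-term document counts from a variation preview of "earn".
variation_counts = {
    "earn": 1, "earned": 1, "earnest": 1,
    "earnie": 1, "earnings": 2, "earnins": 1,
}
selected = {"earn", "earned", "earnings", "earnins"}

report = search_report(variation_counts, selected)
print(format_report(report))
```

Listing the excluded terms next to the included ones is what makes the methodology transparent: the opposing side can see that “earnest” and “earnie” appeared in the data and were deliberately left out.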

By reviewing their own clients' data using the transparent search solution, lawyers will have a better feel for the data that they are reviewing. In using early assessment tools, lawyers can quickly analyze their data, determine the true keywords that will assist in identifying relevant data and be better prepared to ask the right questions to get the relevant data from the opposing side.

Other Considerations

Of course, no single technology solution handles every phase of e-discovery today. The reality is that several different technologies must be utilized, each deployed at different phases. While vendors continually improve their ability to integrate, there are still challenges. However, many of these challenges can be mitigated using expert third-party project management and consulting services. Below are several items to keep in mind when inserting Clearwell in the e-discovery process:

  • Integration with Data Collection Tools. Clearwell uses an indexing system to quickly index large amounts of data. This index captures all of the full text within the documents as well as a subset of the metadata fields found in those documents. While it captures most of the relevant metadata fields, lawyers and other users should understand specifically what it does and does not search. Forensic tools, such as EnCase and AccessData FTK, may still provide a better way to initially search documents and metadata during data collection, especially when working with extremely large network systems and/or data drives. Clearwell becomes useful after an initial system file filtering has been implemented and interactive human review is needed to reduce the large amount of remaining data; with the introduction of new search capabilities in Clearwell, metadata and full text searching have expanded to mitigate some of the inherent concerns in previous products. Moving case data from one e-discovery phase to another should be done carefully and in a forensically sound and auditable way, especially as the potential for Daubert challenges becomes more prevalent. Expert project management can help here by mandating best practices, documenting the steps and helping mitigate risk.
  • Integration with Review Tools. Clearwell is a culling, filtering, processing and first pass review tool that quickly analyzes the keywords that are being considered and provides additional insight into the keywords being used. The new version of Clearwell now allows for export to review applications, such as Concordance. However, lawyers and legal professionals need to understand what they are trying to achieve when using such a product and consider that additional processing tools may be necessary to achieve specific load files or image formats. Transparent keyword searching allows for the data to be culled and filtered quickly and efficiently, but it is also critical to plan carefully for how the targeted documents will flow downstream from Clearwell through the rest of the electronic discovery process.
  • First-pass Review. When discussing keyword filtering with clients, many of them immediately begin talking about keywords in the context of a full review. Clearwell, to its credit, does not sell its software as a full review platform; however, while many firms try to use Clearwell in all levels of review, it may be best suited for first pass review. If the documents that are being reviewed and/or the keywords being searched are in a foreign language, the ability to translate these documents and search these documents would not be available because Clearwell is currently not Unicode compliant. Clearwell is good for an initial native review, a primary privilege review and the initial culling of documents based on keywords and concepts. Once these documents are culled down, however, lawyers and legal professionals should look at the entire spectrum of electronic review platforms, especially if items such as rules-based batch processing and management, redaction, foreign language review, advanced privilege review and advanced issue coding are needed. Though the transparent keyword functionality allows for defensible searching, culling and filtering, the leap from this keyword filtering to advanced review technologies should not be made without analyzing specific requirements.

As Clearwell evolves, the expectation is that these challenges will be addressed and its searching, culling, processing, and first pass review tool will develop into a fuller, more robust review tool. While the above challenges should be analyzed, it is best for lawyers and legal professionals to understand what these challenges are, ask questions about the specifics of the product, and enlist independent technology partners to assist in analyzing the best approach for a discovery project. In every project, the three project management variables (time, cost and quality) need to be considered. With an early case assessment tool and the right project management partner, all three of these components can be enhanced and the right solution determined based on the goal of a particular case.

Keywords will remain an integral part of determining discovery processes for many years to come. With new advances in technology and greater understanding of these technologies from the bench, most of the electronic discovery software will evolve to integrate sooner in the litigation lifecycle and allow for more transparency and defensibility. However, at this point, it seems that Clearwell has taken the lead in showing what is possible.


Todd M. Haley is the Vice President of E-Discovery at EPIC Legal Document Solutions (www.epiclds.com). Haley consults on e-discovery matters and his company provides e-discovery services throughout the entire EDRM model, including litigation support services. In his current position, as well as in his previous experience as the Chief Technology Officer of a litigation law firm, Haley develops strategies, protocols and project management models to help his clients use the technology available to complete filings successfully and win cases.
