Last week, I was pleased to help lead the discussion at The Cowen Group’s Leadership Breakfast in Manhattan. I’ve been spending a lot of time thinking and writing about Big Data lately, and jumped at the chance to hear what this community was thinking about it.
It was a great group of breakfasters—predominantly law firm attendees (partners, associates and directors), with a mix of in-house lawyers, consultants, and at least one journalist. The discussion was a fast ride through a landscape of emotional responses to Big Data: excitement, skepticism, curiosity, confusion, optimism, confusion, and ennui.
Just like every other discussion I have had about Big Data.
We spent a lot of time talking about what, exactly, Big Data is. The problem with this discussion is that Big Data, like most technology marketing terms, can mean something or nothing at all.
How can a bunch of smart people having breakfast in the same room one morning be expected to define Big Data when the people who are paid to create such definitions leave us feeling . . . confused?
Here’s how Gartner defines big data:
Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Here’s how McKinsey defines it:
‘Big data’ refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective . . .
Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.
Huh? No wonder we were confused as we ate our delicious bacon and eggs.
Big Data is a squishy term, and for lawyers without a serious technology or data science background it is even squishier.
The concepts behind it are not new. However, there are new elements. One is the focus on unstructured data (e.g., documents, email messages, social media) instead of data stored in enterprise databases (the traditional focus of “Business Intelligence”). Two is the set of technologies that store, manage, and process data in ways that are not just incrementally better, bigger, or faster, but profoundly different (new file systems; aggregating massive pools of unstructured data instead of databases; storage on cheap connected hard drives, etc.). Three is newly commercialized tools and methods for performing analysis on these pools of unstructured data (even data that you don’t own) to draw business conclusions.
There is a lot of skepticism about the third point—specifically about the ease with which truly insightful and accurate predictions can be generated from Big Data. Even Nate Silver—famous for accurately predicting the outcome of the 2012 US Presidential Election with data—cautions that even though data is growing exponentially, the “amount of useful information almost certainly isn’t.”
Big Data is many things to many people. But what is it to eDiscovery professionals?
I think there are three pieces to the Big Data discussion that are relevant for this community.
- Is Data Good or Bad? In the world of Big Data, all data is good and more data is better. A well-known data scientist was recently quoted in the New York Times as saying, “Storing things is cheap. I’ve tended to take the attitude, ‘Don’t throw electronic things away.’” To a data scientist this makes sense. After all, statistical analysis gets better with more data. However, eDiscovery professionals know that storage is not cheap once its full potential lifecycle cost is calculated. Consider a company spending “$900,000 to produce an amount of data that would consume less than one-quarter of the available capacity of an ordinary DVD.” Data itself is of course neither good nor bad, but eDiscovery professionals need to help Big Data proponents understand that data most definitely can have a downside. I wrote about this tension extensively here.
- Data Analytics for eDiscovery. Though not often talked about, I believe there is serious potential for some parties in the eDiscovery process to analyze the data flowing through their processes and to monetize that analysis. What correlations could a smart data scientist investigate between the nature of the data collected and produced across multiple cases and the outcomes and costs of those cases? Could useful predictions be made? Could eDiscovery processes be improved and routinized? I have some ideas, but no firm answers. We should dig into this further as a community.
- Privacy and Accessibility. What does “readily available” mean in our age—an age where a huge chunk of all human knowledge can be accessed in seconds using a device you carry around in your pocket? Does better access to information simply offer speed and convenience, or does it offer something more profound? When a local newspaper posted the names and addresses of gun permit holders on an interactive map in the wake of the Sandy Hook Elementary School shooting, there was a huge outcry—despite the fact that this information is publicly available, by law. This is a critical emerging issue as the pressure to consolidate and mine unstructured information to gain business insight collides with expectations of privacy and confidentiality.
Simply put, legal and eDiscovery professionals need to be at the table when Big Data discussions are happening. YOU bring a critical perspective that no one else offers.
Barclay Blair is a consultant to Fortune 500 companies, software and hardware vendors, and government institutions, and is an author, speaker, and internationally recognized authority on a broad range of policy, compliance, and management issues related to information governance and information technology. Barclay has led several high-profile consulting engagements at the world’s leading institutions to help them globally transform the way they manage information.
ViaLumina has a range of additional consultants and project managers with deep experience in the IT, Legal, Records Management, and Business aspects of information governance.
For more information about attending or sponsoring a TCG Event, contact The Cowen Group at 212-661-0025 or firstname.lastname@example.org.