Last week, I was pleased to help lead the dis­cus­sion at The Cowen Group’s Lead­er­ship Break­fast in Man­hat­tan. I’ve been spend­ing a lot of time think­ing and writ­ing about Big Data lately, and jumped at the chance to hear what this com­mu­nity was think­ing about it.

It was a great group of break­fasters—pre­dom­i­nantly law firm at­ten­dees (part­ners, as­so­ci­ates and di­rec­tors), with a mix of in-house lawyers, con­sul­tants, and at least one jour­nal­ist. The dis­cus­sion was a fast ride through a land­scape of emo­tional re­sponses to Big Data: ex­cite­ment, skep­ti­cism, cu­rios­ity, con­fu­sion, op­ti­mism, con­fu­sion, and ennui.

Just like every other dis­cus­sion I have had about Big Data.

We spent a lot of time talk­ing about what, ex­actly, Big Data is. The prob­lem with this dis­cus­sion is that, like most tech­nol­ogy mar­ket­ing terms, it can mean some­thing or noth­ing at all.

How can a bunch of smart peo­ple hav­ing break­fast in the same room one morn­ing be ex­pected to de­fine Big Data when the peo­ple who are paid to cre­ate such de­f­i­n­i­tions leave us feel­ing . . . con­fused?

Here’s how Gart­ner de­fines big data:

Big data is high-vol­ume, high-ve­loc­ity and high-va­ri­ety in­for­ma­tion as­sets that de­mand cost-ef­fec­tive, in­no­v­a­tive forms of in­for­ma­tion pro­cess­ing for en­hanced in­sight and de­ci­sion mak­ing.

Here’s how McK­in­sey de­fines it:

‘Big data’ refers to datasets whose size is be­yond the abil­ity of typ­i­cal data­base soft­ware tools to cap­ture, store, man­age, and an­a­lyze. This de­f­i­n­i­tion is in­ten­tion­ally sub­jec­tive . . .


Here’s how Forrester defines it:

Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.

Huh? No won­der we were con­fused as we ate our de­li­cious bacon and eggs.

Big Data is a squishy term, and for lawyers with­out a se­ri­ous tech­nol­ogy or data sci­ence back­ground it is even squishier.

The concepts behind it are not new. However, there are new elements. The first is the focus on unstructured data (e.g., documents, email messages, social media) instead of data stored in enterprise databases (the traditional focus of “Business Intelligence”). The second is a set of technologies that store, manage, and process data in ways that are not just incrementally better, bigger, or faster, but profoundly different (new file systems; aggregating massive pools of unstructured data instead of databases; storage on cheap connected hard drives, etc.). The third is newly commercialized tools and methods for performing analysis on these pools of unstructured data (even data that you don’t own) to draw business conclusions.

There is a lot of skepticism about the third point, specifically about the ease with which truly insightful and accurate predictions can be generated from Big Data. Even Nate Silver, famous for accurately predicting the outcome of the 2012 US Presidential Election with data, cautions that although data is growing exponentially, the “amount of useful information almost certainly isn’t.”

Big Data is many things to many peo­ple. But what is it to eDis­cov­ery pro­fes­sion­als?

I think there are three pieces to the Big Data dis­cus­sion that are rel­e­vant for this com­mu­nity.

  1. Is Data Good or Bad? In the world of Big Data, all data is good and more data is better. A well-known data scientist was recently quoted in the New York Times as saying, “Storing things is cheap. I’ve tended to take the attitude, ‘Don’t throw electronic things away.’” To a data scientist this makes sense. After all, statistical analysis gets better with more data. However, eDiscovery professionals know that storage is not cheap once its full potential lifecycle cost is calculated, as when a company spends “$900,000 to produce an amount of data that would consume less than one-quarter of the available capacity of an ordinary DVD.” Data itself is of course neither good nor bad, but eDiscovery professionals need to help Big Data proponents understand that data most definitely can have a downside. I wrote about this tension extensively here.
  2. Data Analytics for eDiscovery. Though it is not often talked about, I believe there is serious potential for some parties in the eDiscovery process to analyze the data flowing through that process and to monetize that analysis. What correlations could a smart data scientist investigate between the nature of the data collected and produced across multiple cases and their outcomes and costs? Could useful predictions be made? Could eDiscovery processes be improved and routinized? I have some ideas, but no firm answers. We should dig into this further as a community.
  3. Pri­vacy and Ac­ces­si­bil­ity. What does “read­ily avail­able” mean in our age—an age where a huge chunk of all human knowl­edge can be ac­cessed in sec­onds using a de­vice you carry around in your pocket? Does bet­ter ac­cess to in­for­ma­tion sim­ply offer speed and con­ve­nience, or does it offer some­thing more pro­found? When a local news­pa­per posted the names and ad­dresses of gun per­mit hold­ers on an in­ter­ac­tive map in the wake of the Sandy Hook El­e­men­tary School shoot­ing, there was a huge out­cry—de­spite the fact that this in­for­ma­tion is pub­licly avail­able, by law. This is a crit­i­cal emerg­ing issue as the pres­sure to con­sol­i­date and mine un­struc­tured in­for­ma­tion to gain busi­ness in­sight col­lides with ex­pec­ta­tions of pri­vacy and con­fi­den­tial­ity.

Simply put, legal and eDiscovery professionals need to be at the table when Big Data discussions are happening. YOU bring a critical perspective that no one else offers.

