Lentis/"Data is the new oil"

"Data is the new oil!" was the title of a talk given by data scientist Clive Humby in 2006 at an Association of National Advertisers conference. In his talk, Humby claimed that raw data has to be processed just like crude oil has to be refined in order to have value.[1] Since then, the phrase has entered mass circulation to illustrate a myriad of other parallels between data and oil. Just as the Second Industrial Revolution of the late 1800s and early 1900s was fueled by oil, the digital revolution today is being powered by data.

Processing and Refinement

Clive Humby initially made the analogy to demonstrate that data and oil in their raw forms need an intermediate processing step before they become sellable products. Just as oligopolistic oil companies almost always own their own refineries, oligopolistic data companies often process their data in-house. These data companies can be described as data-driven companies, because they rely heavily on processed data for revenue.


A 1904 political cartoon depicting Standard Oil's strategy to conquer the U.S. oil market. It appears to hint that Standard Oil is using regulatory capture of Congress and possibly the White House in order to advance its economic agenda. Overall, the connotation is negative, similar to how some view the modern data collection giants (e.g. Facebook).

Like the oil industry, the data collection industry is dominated by a few large multinational companies. Countless smaller companies contribute in some way to the supply chains of data and oil, but the large companies dominate in terms of market share and overall sociotechnical influence. Examples of these larger companies in the oil industry include ExxonMobil, Chevron, BP, and Shell; their analogs in the data industry include Amazon, Google, Facebook, and YouTube (a Google subsidiary). The world's first billionaire,[2] John D. Rockefeller, led the Standard Oil Company (the company that would eventually be split into companies such as ExxonMobil and Chevron), and the world's first centi-billionaire,[3] Jeff P. Bezos, is the CEO of Amazon (currently the world's largest data-driven e-commerce platform). The two industries are alike in the amount of wealth and power concentrated within them.

Data-Driven Companies

The business model of these companies is to collect data from their consumers and use it to guide internal decision-making, improving sales by better targeting products to individual consumers' preferences. A newer variant of this business model, made possible by the Internet, uses advertising as the sole source of revenue: in exchange for users' data, data-driven companies offer their services to users for free.

A schematic of the difference in workflow between data-driven and data-brokering companies that shows how each type of company generates revenue from data.

Data-Brokering Companies

These companies profit almost exclusively by collecting and selling data to third-party companies and entities. They usually gather data by buying it, mining public records, and/or creating applications that collect user data. There are three main types: 1) companies that create an online marketplace where individual users can pay for information about other individuals (e.g. PeopleFinders, White Pages), 2) companies that sell data to entities that will use the data for marketing decisions (e.g. Acxiom and Cambridge Analytica), and 3) companies that sell data to entities using it for risk mitigation and identity verification (e.g. ID Analytics).[4]

Leaks and Spills

Oil spills cause lasting damage, especially to marine ecosystems, and usually result in massive backlash from environmentalists and erosion of public trust in oil companies.[5] Notable examples in the United States include the 1989 Exxon Valdez and 2010 Deepwater Horizon oil spills. Similarly, data leaks result in backlash from privacy advocates and erosion of public trust in the companies collecting consumer data.[6]

Major Data Leaks

Countless data breaches have resulted in the non-consensual publishing of consumer personal data. Reasons for breaches include hacking, accidental uploading, and intentional leaking. Breaches affect both the private and public sectors and encompass a wide swath of data sets ranging from electronic medical records to location data to friends lists on social networking sites.

Facebook and Cambridge Analytica

Aleksandr Kogan, a researcher at Cambridge University, claimed to be conducting research and received Facebook approval for a personality quiz that thousands of Facebook users installed on their accounts. Through the quiz, the data of as many as 87 million Facebook users was passed to Cambridge Analytica, a political data-brokering company hired by Donald Trump's 2016 presidential campaign.[7][8] Facebook's terms of service allowed researchers to use the data for academic purposes but prohibited selling it.[9] Some claim that this scandal should prompt researchers to be more careful in how they distribute personal data; the principle of beneficence states that researchers should place the well-being of participants above all.[10]


LocationSmart is a company that works with U.S. wireless carriers to sell people's location data.[11] Up until 2018, any LocationSmart user was able to find the real-time location of any phone in the United States. An anonymous hacker also breached the website of LocationSmart client Securus and accessed confidential law enforcement information.[12]


In the United States, domestic offshore oil drilling and oil pipeline transport are subject to regulation by the Environmental Protection Agency (EPA). Just as oil spills have provoked public demands for stricter regulation of oil companies, data leaks over the past twenty years have spurred data privacy regulations that are just beginning to emerge.

California Consumer Privacy Act

The California Consumer Privacy Act (CCPA) was passed in June 2018 and signed by then-Governor Jerry Brown.[13] The law gives Californians the rights to 1) access the data collected on them, 2) opt out of the sale of their data, and 3) delete their data.[14] The law is one of the first examples of comprehensive state legislation on consumer data protection.

“Data Dividend”

Similar to how Alaskans receive a Permanent Fund dividend in part for oil drilling in their home state, some politicians (e.g. California Governor Gavin Newsom and presidential candidate Andrew Yang) have proposed that consumers receive a dividend for the use or sale of their online data.[15][16] Critics argue this proposal is impractical. Former Facebook executive Antonio Martínez contends that Amazon, Google, and Facebook do not believe they owe their consumers anything, because they provide their services for free in exchange for user data that can be used to generate advertising revenue.[17]

General Data Protection Regulation

The General Data Protection Regulation (GDPR) is a European Union law passed in April 2016. Like California's CCPA, it mandates transparency in data collection with a focus on user control over data. It is viewed as an exemplary law in global data protection policy.[18] The Electronic Privacy Information Center (EPIC), a data privacy watchdog, contends that the U.S. needs a law similar to the GDPR or a dedicated regulatory body (e.g. a Data Protection Agency).[19]

Social Effects

Both data and oil affect billions of lives on a daily basis. It is impossible to understand their technological impacts without first considering their social effects.

A Useful Commodity

Both oil and data have long been valuable resources for humanity. Trade and harvest data have been collected for thousands of years. One of the first examples of big data being used to drastically improve human life was in 1663, when John Graunt used mortality data to make predictions about impending bubonic plague outbreaks.[20] Governments have long collected and used census data to allocate funding properly and help ensure democracy.[21] Businesses later realized that collected data could be used to improve their efficiency and customer experience, with Clive Humby's consulting firm Dunnhumby and British grocer Tesco among the pioneers. In the twenty-first century, once data's worth became apparent, the data business exploded. Like the collection of data, oil use has a long history, dating back to the Sumerians' efforts to make fire. Modern use of oil started in the mid-1850s, when a significant quantity of oil was discovered in the U.S.[22] Oil's utility began as a light and heat source, but it went on to power transportation and an economic revolution. Companies in the industry became very wealthy as a result.[23]


Google Books Ngram Viewer frequency of the phrase "data science" in the English corpus from the years 1800-2000.

Automobiles that run on petroleum are the primary mode of transportation in most countries. The modern economy, especially with the advent of e-commerce, depends on the shipping industry, and thus on petroleum-fueled vehicles. Many plastics are petroleum-based, and plastics lie at the heart of packaging and retail. Our society is reliant on oil today, and some contend that it will remain so,[24] but a new dependence may be emerging: data. According to Google's Ngram Viewer, the phrase "data science" entered the vernacular sometime during the early 1900s.[25] Internal data helps companies optimize supply chain operations, and consumer data helps companies make decisions to improve customer experience. In other words, data collection does not always have to be negative or privacy-invading. Despite myriad opinions regarding acceptable data collection practices, data collection lies at the heart of our digital economy.


We can surmise that data collection rose concurrently with the Internet, as one notices a steep climb in the "data science" Ngram curve around 1990.[25] One may hypothesize this was the result of the Internet's streamlining of data collection. At the touch of a button, it is now possible to collect data across different locations, socioeconomic strata, and cultures. More diverse social groups have their data represented than ever before; data may act as a great equalizer. Similarly, with cheap oil and refining methods came the freedom for many more people to drive.

Accuracy of the Analogy

Overall, data has both similarities to and differences from oil, but the phrase "data is the new oil" exerts a social effect of its own. It communicates data's power and lucrativeness in the Digital Age. It also hearkens back to the troubled history of oil, which is perhaps appropriate as we grapple with the privacy threats of data collection. The phrase teaches us that words have power, since the phrase itself has dissenters and disciples. We also see that emergent technologies, especially disruptive ones like the Internet and mass data collection, raise many unanswered economic, regulatory, and ethical questions. Data science is still in its infancy, but we already see data's massive impact on our modern world.


  1. Humby, C.; Palmer, M. (2006, November 3). Data is the New Oil. https://ana.blogs.com/maestros/2006/11/data_is_the_new.html (accessed December 9, 2019).
  2. Simpson, S. (2019, November 9). Who Will Be The World's First Trillionaire? https://www.investopedia.com/financial-edge/0211/the-first-trillionaire.aspx (accessed December 9, 2019).
  3. Au-Yeung, A. (2019, October 24). Jeff Bezos Is No Longer The Richest Person In The World After Amazon Stock Plunges. https://www.forbes.com/sites/angelauyeung/2019/10/24/jeff-bezos-is-no-longer-the-richest-person-in-the-world/#4dfb029b67ae (accessed December 9, 2019).
  4. Pasternack, A.; Melendez, S. (2019, May 28). Here are the data brokers quietly buying and selling your personal information. https://www.fastcompany.com/90310803/here-are-the-data-brokers-quietly-buying-and-selling-your-personal-information (accessed December 2, 2019).
  5. Walsh, B. (2010, July 7). The Oil Spill and the Perils of Losing Trust. http://science.time.com/2010/07/07/the-oil-spill-and-the-perils-of-losing-trust/ (accessed December 10, 2019).
  6. Weisbaum, H. (2018, April 18). Trust in Facebook has dropped by 66 percent since the Cambridge Analytica scandal. https://www.nbcnews.com/business/consumer/trust-facebook-has-dropped-51-percent-cambridge-analytica-scandal-n867011 (accessed December 9, 2019).
  7. Meyer, R. (2018, October 26). The Cambridge Analytica Scandal, in 3 Quick Paragraphs. https://www.theatlantic.com/technology/archive/2018/03/the-cambridge-analytica-scandal-in-three-paragraphs/556046/ (accessed November 30, 2019).
  8. Granville, K. (2018, March 19). Facebook and Cambridge Analytica: What You Need to Know as Fallout Widens. https://www.nytimes.com/2018/03/19/technology/facebook-cambridge-analytica-explained.html (accessed November 30, 2019).
  9. What Are Terms of Service: Everything You Need to Know. (n.d.). https://www.upcounsel.com/what-are-terms-of-service (accessed November 30, 2019).
  10. Nature Editorials. (2018, March 27). Cambridge Analytica controversy must spur researchers to update data ethics. https://www.nature.com/articles/d41586-018-03856-4 (accessed December 10, 2019).
  11. Oremus, W. (2018, May 21). The Privacy Scandal That Should Be Bigger Than Cambridge Analytica. https://slate.com/technology/2018/05/the-locationsmart-scandal-is-bigger-than-cambridge-analytica-heres-why-no-one-is-talking-about-it.html (accessed November 30, 2019).
  12. The critical security crisis nobody's talking about. (2018, May 22). https://nordvpn.com/blog/securus-locationsmart-phone-tracking/ (accessed November 30, 2019).
  13. California State Legislature. (2018). California Consumer Privacy Act of 2018. https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201720180AB375 (accessed November 28, 2019).
  14. Californians for Consumer Privacy. (2019). About the California Consumer Privacy Act. https://www.caprivacy.org/about (accessed November 28, 2019).
  15. Clifford, C. (2019). Andrew Yang: You should get a check in the mail from Facebook, Amazon, Google for your data. https://www.cnbc.com/2019/10/17/andrew-yang-facebook-amazon-google-should-pay-for-users-data.html (accessed November 28, 2019).
  16. Daniels, J. (2019). California governor proposes ‘new data dividend’ that could call on Facebook and Google to pay users. https://www.cnbc.com/2019/02/12/california-gov-newsom-calls-for-new-data-dividend-for-consumers.html (accessed November 28, 2019).
  17. Martínez, A. (2019). No, Data Is Not the New Oil. https://www.wired.com/story/no-data-is-not-the-new-oil/ (accessed November 28, 2019).
  18. European Union. (2016). General Data Protection Regulation (GDPR). https://gdpr-info.eu/ (accessed November 28, 2019).
  19. EPIC. (2019). Data Protection Agency. https://epic.org/dpa/ (accessed December 9, 2019).
  20. Morabia, A. (2013). Epidemiology's 350th Anniversary: 1662-2012. Epidemiology (Cambridge, Mass.), 24(2), 179–183. doi:10.1097/EDE.0b013e31827b5359
  21. Barazesh, S. (2009, July 27). Probing Question: Why is the census important? Penn State News. https://news.psu.edu/story/141197/2009/07/27/research/probing-question-why-census-important
  22. Business and Research Economic Advisor. (2006). The Oil & Gas Industry. https://www.loc.gov/rr/business/BERA/issue5/history.html
  23. History.com Editors. (2010, April 8). Oil Industry. https://www.history.com/topics/industrial-revolution/oil-industry
  24. Clemente, J. (2015). Three Reasons Oil Will Continue to Run the World. https://www.forbes.com/sites/judeclemente/2015/04/19/three-reasons-oil-will-continue-to-run-the-world/#793fb55843f9 (accessed December 9, 2019).
  25. Google Books Ngram Viewer. data science. https://books.google.com/ngrams/graph?content=data+science&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cdata%20science%3B%2Cc0#t1%3B%2Cdata%20science%3B%2Cc0 (accessed December 9, 2019).