Data Hoarding

Update: 2021-10-29

Description

In this episode of Beneficial Intelligence, I discuss data hoarding. Gathering too much data costs money and doesn't add value. We think we need all this data to train our AI, but hoarding data is the wrong place to start.

Some say that "data is the new oil." That is a dangerous, counterproductive metaphor with no fewer than four problems:

  • First, data is not fungible the way oil is. One barrel of oil is just as valuable as the next, but one data record does not have the same value as another.
  • Second, data hoarding shows diminishing returns. The value of 100 million barrels of oil is 100 times the value of 1 million barrels, but the value of 100 million transaction records is not 100 times the value of 1 million transaction records.
  • Third, the process of refining data into valuable business insight is not repeatable. Anybody can build an oil refinery; that's just a question of money. But extracting value from data is more art than science, and even with the best data scientists, you might not be able to extract any value from your data.
  • Fourth, the value density in data is very low. Everything in a barrel of oil becomes a useful product. But most data records do not provide any business insight. 

Gathering data in the hope of extracting value is putting the cart before the horse. The right way to work with data is to start with a business goal and a hypothesis about which data might provide insight. Then gather that data, run the experiment, and evaluate the results. Don't just hoard data.

Beneficial Intelligence is a bi-weekly podcast with stories and pragmatic advice for CIOs, CTOs, and other IT leaders. To get in touch, please contact me at sten@vesterli.com

Sten Vesterli