Acquiring and sharing high-quality data

Update: 2019-07-18

Description

In this episode of the Data Show, I spoke with Roger Chen, co-founder and CEO of Computable Labs, a startup focused on building tools for the creation of data networks and data exchanges. Chen has also served as co-chair of O’Reilly’s Artificial Intelligence Conference since its inception in 2016. This conversation took place the day after Chen and his collaborators released an interesting new white paper, “Fair value and decentralized governance of data.”

Current-generation AI and machine learning technologies rely on large amounts of data, and to the extent they can use their large user bases to create “data silos,” large companies in large countries (like the U.S. and China) enjoy a competitive advantage. With that said, we are awash in articles about the dangers posed by these data silos. Privacy and security, disinformation, bias, and a lack of transparency and control are just some of the issues that have plagued the perceived owners of “data monopolies.”

In recent years, researchers and practitioners have begun building tools focused on helping organizations acquire, build, and share high-quality data. Chen and his collaborators are doing some of the most interesting work in this space, and I recommend their new white paper and accompanying open source projects.

Sequence of basic market transactions in the Computable Labs protocol. Source: Roger Chen, used with permission.

We had a great conversation spanning many topics, including:

Why he chose to focus on data governance and data markets.

The unique and fundamental challenges in accurately pricing data.

The importance of data lineage and provenance, and the approach they took in their proposed protocol.

What cooperative governance is and why it’s necessary.

How their protocol discourages an unscrupulous user from just scraping all data available in a data market.

Related resources:

Roger Chen: “Data liquidity in the age of inference”

Ihab Ilyas and Ben Lorica on “The quest for high-quality data”

Chris Ré: “Software 2.0 and Snorkel”

Alex Ratner on “Creating large training data sets quickly”

Jeff Jonas on “Real-time entity resolution made accessible”

“Data collection and data markets in the age of privacy and machine learning”

Guillaume Chaslot on “The importance of transparency and user control in machine learning”

O'Reilly Media