
Workday: Detecting and Redacting Identifiers in Datasets


Update: 2025-04-07


Description

This episode covers data privacy and identifier redaction in datasets, drawing on Workday's methodology as described in its engineering blog. It first reviews the categories of identifiers that need protection (personal, sensitive, and financial information), then examines Workday's identifier detection framework, which combines machine learning, natural language processing, and custom regular expressions. It goes on to outline Workday's scalable redaction tooling, built on Apache Spark and integrated with AWS S3, in which configuration files define the scrubbing specifications. Finally, it discusses the challenges and best practices of accurate redaction and looks ahead to future trends in data privacy and redaction technologies.
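To make the idea of configuration-driven scrubbing concrete, here is a minimal sketch in plain Python. It is not Workday's implementation or configuration format: the `SCRUB_SPEC` layout, the detector names, and the two regular expressions are all hypothetical, and it covers only the regular-expression part of detection, not the machine learning or NLP components or the Spark and S3 integration mentioned above.

```python
import re

# Hypothetical scrubbing specification: column name -> detectors to apply.
# In a real pipeline this would likely be loaded from a configuration file.
SCRUB_SPEC = {
    "notes": ["email", "us_ssn"],
    "contact": ["email"],
}

# Illustrative patterns only; production detectors need far broader coverage.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_value(text, detector_names):
    """Replace every match of each named detector with a [NAME] placeholder."""
    for name in detector_names:
        text = DETECTORS[name].sub(f"[{name.upper()}]", text)
    return text


def redact_record(record, spec=SCRUB_SPEC):
    """Redact only the string columns listed in the spec; pass others through."""
    return {
        column: redact_value(value, spec[column])
        if column in spec and isinstance(value, str)
        else value
        for column, value in record.items()
    }


if __name__ == "__main__":
    row = {
        "id": 7,
        "notes": "Reach jane.doe@example.com, SSN 123-45-6789.",
        "contact": "bob@example.org",
    }
    print(redact_record(row))
```

Keeping the specification separate from the detectors means a new dataset can be covered by editing configuration rather than code, which is the design choice the description attributes to the configuration files.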


In Channel

MultiOn.ai: Autonomous Web Interaction and Industry Applications

2025-04-04 (21:01)

Databricks for Machine Learning: An End-to-End Guide

2025-04-03 (29:49)

mFlow: Python Module for ML Experimentation Workflows

2025-04-02 (23:01)

Vertex AI: Google Cloud's Unified AI/ML Platform

2025-04-02 (25:36)

AWS SageMaker: Machine Learning on AWS

2025-04-02 (32:53)

CrewAI: An Overview of Multi-Agent AI Systems

2025-03-31 (34:12)

Workday: Detecting and Redacting Identifiers in Datasets

2025-04-07 (27:26)

Retrieval-Augmented Generation @ Workday

2025-04-07 (20:29)

GenAI Unit Cost Analysis: Workday's Measurement Approach

2025-04-07 (18:00)

Workday's LLM for Skill Inference: Analysis and Impact

2025-04-07 (12:15)

Workday's Aviato: Platform for Efficient LLM Development

2025-04-07 (28:21)

Named Entity Recognition (NER)

2025-04-03 (15:24)

Vector Databases and Large Language Models

2025-04-02 (23:24)

Concept Drift in Machine Learning: Understanding and Addressing Change

2025-04-02 (22:49)

Common Crawl: Archiving the Web for AI and Research

2025-04-01 (21:02)

Navigating the California Consumer Privacy Rights Act: Implications for SaaS and AI Providers

2025-04-03 (30:29)

ISO 42001: The Global AI Management Standard

2025-03-31 (26:25)

Illinois Human Rights Act Amendments: Employment & AI

2025-03-29 (15:21)

DORA: The Digital Operational Resilience of the Financial Sector, Legislative Text from the European Union

2025-03-13 (31:03)

AI Accountability: Impact Assessments, Audits, and Conformity

2025-03-05 (19:34)


Benjamin Alloul 🗪 🅽🅾🆃🅴🅱🅾🅾🅺🅻🅼