DEW #133: How to Implement Write-Audit-Publish (WAP), Vector Database - Concepts and examples & Data Warehouse Testing Strategies for Better Data Quality

Update: 2023-07-05

Description

Welcome to another episode of Data Engineering Weekly. Aswin and I select 3 to 4 articles from each edition of Data Engineering Weekly and discuss them from the author’s and our perspectives.


On DEW #133, we selected the following articles:




LakeFs: How to Implement Write-Audit-Publish (WAP)


I wrote extensively about the WAP pattern in my latest article, An Engineering Guide to Data Quality - A Data Contract Perspective. I am super excited to see a complete guide on implementing the WAP pattern in Iceberg, Hudi, and, of course, with LakeFs.


https://lakefs.io/blog/how-to-implement-write-audit-publish/
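For readers new to the pattern, here is a minimal sketch of the three WAP steps. It uses an in-memory SQLite database and none of the Iceberg, Hudi, or LakeFs APIs covered in the linked guide. The table names `orders` and `orders_staging`, the audit rules, and the function `write_audit_publish` are all hypothetical and chosen only for illustration.

```python
import sqlite3


def write_audit_publish(conn, rows):
    """Stage a batch, audit it, and publish it only if the audit passes."""
    # Write: land the new batch in a staging table that consumers never read.
    with conn:
        conn.execute("DELETE FROM orders_staging")
        conn.executemany(
            "INSERT INTO orders_staging (order_id, amount) VALUES (?, ?)", rows
        )

    # Audit: run data quality checks against the staged batch only.
    # These two rules (non-empty batch, no NULL or negative values) are assumptions.
    total = conn.execute("SELECT COUNT(*) FROM orders_staging").fetchone()[0]
    bad = conn.execute(
        "SELECT COUNT(*) FROM orders_staging "
        "WHERE order_id IS NULL OR amount IS NULL OR amount < 0"
    ).fetchone()[0]
    if total == 0 or bad > 0:
        return False  # the published table is left untouched

    # Publish: replace the consumer-facing table in a single transaction.
    with conn:
        conn.execute("DELETE FROM orders")
        conn.execute(
            "INSERT INTO orders SELECT order_id, amount FROM orders_staging"
        )
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE orders_staging (order_id INTEGER, amount REAL)")

good_published = write_audit_publish(conn, [(1, 10.0), (2, 25.5)])
bad_published = write_audit_publish(conn, [(3, None), (4, 5.0)])
published_ids = [
    row[0] for row in conn.execute("SELECT order_id FROM orders ORDER BY order_id")
]
```

The second batch contains a NULL amount, so it fails the audit and never reaches `orders`. The key property is that consumers only ever see data that has passed the audit step.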




Jatin Solanki: Vector Database - Concepts and examples


On the subject of vector search, a new class of vector databases is emerging in the market to improve semantic search experiences. The author writes an excellent introduction to vector databases and their applications.


https://blog.devgenius.io/vector-database-concepts-and-examples-f73d7e683d3e
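The core operation behind semantic search is finding the stored vectors closest to a query vector. Below is a toy sketch using brute-force cosine similarity in plain Python. The three-dimensional embeddings and the names `cosine_similarity` and `top_k` are made up for illustration; real vector databases typically use approximate nearest neighbor indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings; a real system would get these from an embedding model.
index = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
nearest = top_k([1.0, 0.0, 0.0], index, k=2)
```

With these toy vectors, the query lands closest to "cat" and "kitten" and furthest from "car".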




Policy Genius: Data Warehouse Testing Strategies for Better Data Quality


Data Testing and Data Observability are widely discussed topics in Data Engineering Weekly. However, both techniques run their checks only after the transformation task has completed. Can we test SQL business logic during the development phase itself? Perhaps unit test the pipeline?


The author writes an exciting article about adopting unit testing in the data pipeline by producing sample tables during development. We will see more tools around unit test frameworks for data pipelines soon. I don’t think testing data quality on every PR against the production database is a cost-effective solution. We can do better than that, tbh.


https://medium.com/policygenius-stories/data-warehouse-testing-strategies-for-better-data-quality-d5514f6a0dc9
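To make the idea concrete, here is a small sketch of unit testing SQL business logic against a hand-built sample table instead of the production database. It is not the approach from the linked article; the query `REVENUE_SQL`, the `orders` schema, and the helper `run_on_sample` are hypothetical, and SQLite stands in for the warehouse, so dialect differences are ignored.

```python
import sqlite3

# Hypothetical business logic under test: revenue per customer, excluding refunds.
REVENUE_SQL = """
SELECT customer_id, SUM(amount) AS revenue
FROM orders
WHERE status != 'refunded'
GROUP BY customer_id
ORDER BY customer_id
"""


def run_on_sample(sql, sample_rows):
    """Run the query against a throwaway in-memory table built from sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", sample_rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Sample table covering the edge case we care about: a refunded order.
sample = [
    (1, 10.0, "paid"),
    (1, 5.0, "refunded"),
    (2, 7.5, "paid"),
    (2, 2.5, "paid"),
]
result = run_on_sample(REVENUE_SQL, sample)
assert result == [(1, 10.0), (2, 10.0)]
```

Because the sample table is tiny and lives in memory, a test like this can run on every PR without touching production data.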


