How to Make More Reliable Predictions in Machine Learning with Brian Lucena
Description
In this episode, we sit down with Brian Lucena, Principal at Numeristical and an experienced educator, consultant, and open-source contributor. Brian has advised companies of all sizes on applying modern machine learning techniques and is the creator of popular Python packages like StructureBoost, ML-Insights, and SplineCalib. He has also taught at UC Berkeley, Brown University, and USF, bringing a unique mix of academic depth and real-world ML expertise.
Today, we explore how to make more reliable predictions in machine learning. From the dominance of gradient boosting for tabular data to the power of probabilistic regression and uncertainty quantification, Brian shares expert insights into building trustworthy ML models. We also dive into probability calibration, model drift, and best practices for ensuring model reliability in production.
Whether you're an ML engineer, data scientist, or business leader looking to improve your AI models, this episode is packed with practical takeaways you won’t want to miss.
Key Topics Covered
✅ Gradient Boosting vs. Deep Learning – Why decision trees still dominate tabular data and structured business problems.
✅ Probabilistic Regression – Moving beyond point estimates to provide probability distributions and confidence intervals.
✅ Uncertainty Quantification – Understanding the limits of machine learning predictions and why it matters.
✅ Probability Calibration – How to ensure your model’s confidence scores are truly reliable.
✅ Handling Model Drift – Strategies to maintain model performance in a changing world.
✅ Real-World Use Cases – Applications in finance, healthcare, risk modeling, and business decision-making.
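The probability calibration topic above can be sketched in a few lines. This is an illustrative stand-in using scikit-learn's `CalibratedClassifierCV` and `calibration_curve` (an assumption on our part; the episode's own tool for this is Brian's SplineCalib package, which is linked below):

```python
# Sketch: calibrating a gradient boosting model's probabilities, then
# checking calibration by comparing predicted probabilities to observed
# outcome frequencies. Uses scikit-learn as a stand-in for SplineCalib.
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit gradient boosting, then calibrate its scores with isotonic
# regression learned on held-out cross-validation folds.
base = GradientBoostingClassifier(random_state=0)
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=3)
calibrated.fit(X_train, y_train)

probs = calibrated.predict_proba(X_test)[:, 1]

# calibration_curve bins the predictions and reports, per bin, the mean
# predicted probability vs. the observed fraction of positives. For a
# well-calibrated model the two track each other closely.
frac_pos, mean_pred = calibration_curve(y_test, probs, n_bins=5)
```

A reliability plot of `mean_pred` against `frac_pos` makes the "are the confidence scores truly reliable?" question visual: points on the diagonal mean the scores can be trusted as probabilities.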
Resources & Tools Mentioned
- Brian’s YouTube Channel: https://www.youtube.com/c/numeristical
- Brian’s LinkedIn: https://www.linkedin.com/in/brianlucena/
- 🛠️ StructureBoost – Brian’s open-source package for structured categorical variables in gradient boosting: https://github.com/numeristical/structureboost
- 📦 ML-Insights – Tools for better understanding ML models: https://github.com/numeristical/introspective
- 📦 SplineCalib – A library for improving probability calibration: https://github.com/numeristical/splinecalib
- 📌 NGBoost – A gradient boosting approach for probabilistic regression: https://stanfordmlgroup.github.io/projects/ngboost/
- 🔗 GitHub for NGBoost: https://github.com/stanfordmlgroup/ngboost
- 📌 XGBoost – A powerful gradient boosting framework: https://github.com/dmlc/xgboost
- 📌 CatBoost – Gradient boosting with native support for categorical features: https://github.com/catboost/catboost
- 📌 LightGBM – A fast, efficient gradient boosting library: https://github.com/microsoft/LightGBM
- 📊 PyMC – A Bayesian probabilistic programming library for uncertainty modeling: https://github.com/pymc-devs/pymc
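To make the probabilistic regression idea from the discussion concrete without assuming any one of the libraries above, here is a minimal sketch using scikit-learn's quantile loss in `GradientBoostingRegressor` to produce a prediction interval rather than a single point estimate (NGBoost, linked above, takes the richer route of predicting a full distribution):

```python
# Sketch: quantile gradient boosting as a simple form of probabilistic
# regression. Three models trained at the 10th, 50th, and 90th percentile
# give a median prediction plus a prediction interval around it.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=500)

# One model per target quantile, all on the same training data.
models = {
    alpha: GradientBoostingRegressor(
        loss="quantile", alpha=alpha, random_state=0
    ).fit(X, y)
    for alpha in (0.1, 0.5, 0.9)
}

X_new = np.array([[5.0]])
lo = models[0.1].predict(X_new)[0]
med = models[0.5].predict(X_new)[0]
hi = models[0.9].predict(X_new)[0]
# (lo, hi) is roughly an 80% prediction interval around the median —
# a range of outcomes, not just a number.
```

This is the "businesses need a range of possible outcomes" point in miniature: the caller gets `lo`, `med`, and `hi` instead of one opaque prediction.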
Memorable Quotes
💬 "Businesses don’t just want a number—they need to understand a range of possible outcomes. That’s where probabilistic regression makes all the difference."
💬 "One of the biggest challenges in real-world ML is that the world doesn’t stay the same—models can drift, and retraining isn’t always the best solution."
💬 "Gradient boosting still outperforms deep learning for structured data because it handles sharp decision boundaries better."
This episode was sponsored by:
🎤 ODSC East 2025 – The Leading AI Builders Conference – https://odsc.com/boston/ – Join us from May 13th to 15th in Boston for hands-on workshops, training sessions, and cutting-edge AI talks covering generative AI, LLMOps, and AI-driven automation.