Learning Machines 101

85 Episodes

Reverse

LM101-086: Ch8: How to Learn the Probability of Infinitely Many Outcomes

2021-07-2035:29

This 86th episode of Learning Machines 101 discusses the problem of assigning probabilities to a possibly infinite set of outcomes in a space-time continuum which characterizes our physical world. Such a set is called an “environmental event”. The machine learning algorithm uses information about the frequency of environmental events to support learning. If we want to study statistical machine learning, then we must be able to discuss how to represent and compute the probability of an environmental event. It is essential that we have methods for communicating probability concepts to other researchers, methods for calculating probabilities, and methods for calculating the expectation of specific environmental events. This episode discusses the challenges of assigning probabilities to events when we allow for the case of events comprised of an infinite number of outcomes. Along the way we introduce essential concepts for representing and computing probabilities using measure theory mathematical tools such as sigma fields, and the Radon-Nikodym probability density function. Near the end we also briefly discuss the intriguing Banach-Tarski paradox and how it motivates the development of some of these special mathematical tools. Check out: www.learningmachines101.com and www.statisticalmachinelearning.com for more information!!!

LM101-085:Ch7:How to Guarantee your Batch Learning Algorithm Converges

2021-05-2130:51

This 85th episode of Learning Machines 101 discusses formal convergence guarantees for a broad class of machine learning algorithms designed to minimize smooth non-convex objective functions using batch learning methods. In particular, a broad class of unsupervised, supervised, and reinforcement machine learning algorithms which iteratively update their parameter vector by adding a perturbation based upon all of the training data. This process is repeated, making a perturbation of the parameter vector based upon all of the training data until a parameter vector is generated which exhibits improved predictive performance. The magnitude of the perturbation at each learning iteration is called the “stepsize” or “learning rate” and the identity of the perturbation vector is called the “search direction”. Simple mathematical formulas are presented based upon research from the late 1960s by Philip Wolfe and G. Zoutendijk that ensure convergence of the generated sequence of parameter vectors. These formulas may be used as the basis for the design of artificially intelligent smart automatic learning rate selection algorithms. The material in this podcast is designed to provide an overview of Chapter 7 of my new book “Statistical Machine Learning” and is based upon material originally presented in Episode 68 of Learning Machines 101! Check out: www.learningmachines101.com for the show notes!!!

LM101-084: Ch6: How to Analyze the Behavior of Smart Dynamical Systems

2021-01-0533:13

In this episode of Learning Machines 101, we review Chapter 6 of my book “Statistical Machine Learning” which introduces methods for analyzing the behavior of machine inference algorithms and machine learning algorithms as dynamical systems. We show that when dynamical systems can be viewed as special types of optimization algorithms, the behavior of those systems even when they are highly nonlinear and high-dimensional can be analyzed. Learn more by visiting: www.learningmachines101.com and www.statisticalmachinelearning.com .

LM101-083: Ch5: How to Use Calculus to Design Learning Machines

2020-08-2934:22

This particular podcast covers the material from Chapter 5 of my new book “Statistical Machine Learning: A unified framework” which is now available! The book chapter shows how matrix calculus is very useful for the analysis and design of both linear and nonlinear learning machines with lots of examples. We discuss how to use the matrix chain rule for deriving deep learning descent algorithms and how it is relevant to software implementations of deep learning algorithms. We also discuss how matrix Taylor series expansions are relevant to machine learning algorithm design and the analysis of generalization performance!! For additional details check out: www.learningmachines101.com and www.statisticalmachinelearning.com

LM101-082: Ch4: How to Analyze and Design Linear Machines

2020-07-2329:05

The main focus of this particular episode covers the material in Chapter 4 of my new forthcoming book titled “Statistical Machine Learning: A unified framework.” Chapter 4 is titled “Linear Algebra for Machine Learning. Many important and widely used machine learning algorithms may be interpreted as linear machines and this chapter shows how to use linear algebra to analyze and design such machines. In addition, these same techniques are fundamentally important for the development of techniques for the analysis and design of nonlinear machines. This podcast provides a brief overview of Linear Algebra for Machine Learning for the general public as well as information for students and instructors regarding the contents of Chapter 4 of Statistical Machine Learning. For more details, check out: www.statisticalmachinelearning.com

LM101-081: Ch3: How to Define Machine Learning (or at Least Try)

2020-04-0937:20

This particular podcast covers the material in Chapter 3 of my new book “Statistical Machine Learning: A unified framework” with expected publication date May 2020. In this episode we discuss Chapter 3 of my new book which discusses how to formally define machine learning algorithms. Briefly, a learning machine is viewed as a dynamical system that is minimizing an objective function. In addition, the knowledge structure of the learning machine is interpreted as a preference relation graph which is implicitly specified by the objective function. In addition, this week we include in our book review section a new book titled “The Practioner’s Guide to Graph Data” by Denise Gosnell and Matthias Broecheler. To find out more information visit the website: www.learningmachines101.com .

LM101-080: Ch2: How to Represent Knowledge using Set Theory

2020-02-2931:431

This particular podcast covers the material in Chapter 2 of my new book “Statistical Machine Learning: A unified framework” with expected publication date May 2020. In this episode we discuss Chapter 2 of my new book, which discusses how to represent knowledge using set theory notation. Chapter 2 is titled “Set Theory for Concept Modeling”.

LM101-079: Ch1: How to View Learning as Risk Minimization

2019-12-2426:07

This particular podcast covers the material in Chapter 1 of my new (unpublished) book “Statistical Machine Learning: A unified framework”. In this episode we discuss Chapter 1 of my new book, which shows how supervised, unsupervised, and reinforcement learning algorithms can be viewed as special cases of a general empirical risk minimization framework. This is useful because it provides a framework for not only understanding existing algorithms but also for suggesting new algorithms for specific applications.

LM101-078: Ch0: How to Become a Machine Learning Expert

2019-10-2439:181

This particular podcast (Episode 78 of Learning Machines 101) is the initial episode in a new special series of episodes designed to provide commentary on a new book that I am in the process of writing. In this episode we discuss books, software, courses, and podcasts designed to help you become a machine learning expert! For more information, check out: www.learningmachines101.com

LM101-077: How to Choose the Best Model using BIC

2019-05-0224:15

In this 77th episode of www.learningmachines101.com , we explain the proper semantic interpretation of the Bayesian Information Criterion (BIC) and emphasize how this semantic interpretation is fundamentally different from AIC (Akaike Information Criterion) model selection methods. Briefly, BIC is used to estimate the probability of the training data given the probability model, while AIC is used to estimate out-of-sample prediction error. The probability of the training data given the model is called the “marginal likelihood”. Using the marginal likelihood, one can calculate the probability of a model given the training data and then use this analysis to support selecting the most probable model, selecting a model that minimizes expected risk, and support Bayesian model averaging. The assumptions which are required for BIC to be a valid approximation for the probability of the training data given the probability model are also discussed.

LM101-076: How to Choose the Best Model using AIC and GAIC

2019-01-2328:17

In this episode, we explain the proper semantic interpretation of the Akaike Information Criterion (AIC) and the Generalized Akaike Information Criterion (GAIC) for the purpose of picking the best model for a given set of training data. The precise semantic interpretation of these model selection criteria is provided, explicit assumptions are provided for the AIC and GAIC to be valid, and explicit formulas are provided for the AIC and GAIC so they can be used in practice. Briefly, AIC and GAIC provide a way of estimating the average prediction error of your learning machine on test data without using test data or cross-validation methods. The GAIC is also called the Takeuchi Information Criterion (TIC).

LM101-075: Can computers think? A Mathematician's Response (remix)

2018-12-1236:26

In this episode, we explore the question of what can computers do as well as what computers can’t do using the Turing Machine argument. Specifically, we discuss the computational limits of computers and raise the question of whether such limits pertain to biological brains and other non-standard computing machines. This episode is dedicated to the memory of my mom, Sandy Golden. To learn more about Turing Machines, SuperTuring Machines, Hypercomputation, and my Mom, check out: www.learningmachines101.com

LM101-074: How to Represent Knowledge using Logical Rules (remix)

2018-06-3019:22

In this episode we will learn how to use “rules” to represent knowledge. We discuss how this works in practice and we explain how these ideas are implemented in a special architecture called the production system. The challenges of representing knowledge using rules are also discussed. Specifically, these challenges include: issues of feature representation, having an adequate number of rules, obtaining rules that are not inconsistent, and having rules that handle special cases and situations. To learn more, visit: www.learningmachines101.com

LM101-073: How to Build a Machine that Learns to Play Checkers (remix)

2018-04-2524:58

This is a remix of the original second episode Learning Machines 101 which describes in a little more detail how the computer program that Arthur Samuel developed in 1959 learned to play checkers by itself without human intervention using a mixture of classical artificial intelligence search methods and artificial neural network learning algorithms. The podcast ends with a book review of Professor Nilsson’s book: “The Quest for Artificial Intelligence: A History of Ideas and Achievements”. For more information, check out: www.learningmachines101.com

LM101-072: Welcome to the Big Artificial Intelligence Magic Show! (Remix of LM101-001 and LM101-002)

2018-03-3122:07

This podcast is basically a remix of the first and second episodes of Learning Machines 101 and is intended to serve as the new introduction to the Learning Machines 101 podcast series. The search for common organizing principles which could support the foundations of machine learning and artificial intelligence is discussed and the concept of the Big Artificial Intelligence Magic Show is introduced. At the end of the podcast, the book After Digital: Computation as Done by Brains and Machines by Professor James A. Anderson is briefly reviewed. For more information, please visit: www.learningmachines101.com

LM101-071: How to Model Common Sense Knowledge using First-Order Logic and Markov Logic Nets

2018-02-2331:40

In this podcast, we provide some insights into the complexity of common sense. First, we discuss the importance of building common sense into learning machines. Second, we discuss how first-order logic can be used to represent common sense knowledge. Third, we describe a large database of common sense knowledge where the knowledge is represented using first-order logic which is free for researchers in machine learning. We provide a hyperlink to this free database of common sense knowledge. Fourth, we discuss some problems of first-order logic and explain how these problems can be resolved by transforming logical rules into probabilistic rules using Markov Logic Nets. And finally, we have another book review of the book “Markov Logic: An Interface Layer for Artificial Intelligence” by Pedro Domingos and Daniel Lowd which provides further discussion of the issues in this podcast. In this book review, we cover some additional important applications of Markov Logic Nets not covered in detail in this podcast such as: object labeling, social network link analysis, information extraction, and helping support robot navigation. Finally, at the end of the podcast we provide information about a free software program which you can use to build and evaluate your own Markov Logic Net! For more information check out: www.learningmachines101.com

LM101-070: How to Identify Facial Emotion Expressions in Images Using Stochastic Neighborhood Embedding

2018-01-3132:04

This 70th episode of Learning Machines 101 we discuss how to identify facial emotion expressions in images using an advanced clustering technique called Stochastic Neighborhood Embedding. We discuss the concept of recognizing facial emotions in images including applications to problems such as: improving online communication quality, identifying suspicious individuals such as terrorists using video cameras, improving lie detector tests, improving athletic performance by providing emotion feedback, and designing smart advertising which can look at the customer’s face to determine if they are bored or interested and dynamically adapt the advertising accordingly. To address this problem we review clustering algorithm methods including K-means clustering, Linear Discriminant Analysis, Spectral Clustering, and the relatively new technique of Stochastic Neighborhood Embedding (SNE) clustering. At the end of this podcast we provide a brief review of the classic machine learning text by Christopher Bishop titled “Pattern Recognition and Machine Learning”. Make sure to visit: www.learningmachines101.com to obtain free transcripts of this podcast and important supplemental reference materials!

LM101-069: What Happened at the 2017 Neural Information Processing Systems Conference?

2017-12-1623:20

This 69th episode of Learning Machines 101 provides a short overview of the 2017 Neural Information Processing Systems conference with a focus on the development of methods for teaching learning machines rather than simply training them on examples. In addition, a book review of the book “Deep Learning” is provided. #nips2017

LM101-068: How to Design Automatic Learning Rate Selection for Gradient Descent Type Machine Learning Algorithms

2017-09-2621:49

This 68th episode of Learning Machines 101 discusses a broad class of unsupervised, supervised, and reinforcement machine learning algorithms which iteratively update their parameter vector by adding a perturbation based upon all of the training data. This process is repeated, making a perturbation of the parameter vector based upon all of the training data until a parameter vector is generated which exhibits improved predictive performance. The magnitude of the perturbation at each learning iteration is called the “stepsize” or “learning rate” and the identity of the perturbation vector is called the “search direction”. Simple mathematical formulas are presented based upon research from the late 1960s by Philip Wolfe and G. Zoutendijk that ensure convergence of the generated sequence of parameter vectors. These formulas may be used as the basis for the design of artificially intelligent smart automatic learning rate selection algorithms. For more information, please visit the official website: www.learningmachines101.com

LM101-067: How to use Expectation Maximization to Learn Constraint Satisfaction Solutions (Rerun)

2017-08-2125:40

In this episode we discuss how to learn to solve constraint satisfaction inference problems. The goal of the inference process is to infer the most probable values for unobservable variables. These constraints, however, can be learned from experience. Specifically, the important machine learning method for handling unobservable components of the data using Expectation Maximization is introduced. Check it out at: www.learningmachines101.com

#box-pro-ellipsis-171460146019939{-webkit-line-clamp:2;}Learning Machines 101

Bhaskar Reddy