# Learning Machines 101

Author: Richard M. Golden, Ph.D., M.S.E.E., B.S.E.E.

Copyright (c) 2014-2017 by Richard M. Golden. All rights reserved.

Description

Smart machines based upon the principles of artificial intelligence and machine learning are now prevalent in our everyday lives. For example, artificially intelligent systems recognize our voices, sort our pictures, make purchasing suggestions, and can automatically fly planes and drive cars. In this podcast series, we examine questions such as: How do these devices work? Where do they come from? And how can we make them even smarter and more human-like? These are the questions that will be addressed in this podcast series!

76 Episodes

In this 77th episode of www.learningmachines101.com, we explain the proper semantic interpretation of the Bayesian Information Criterion (BIC) and emphasize how this semantic interpretation is fundamentally different from that of AIC (Akaike Information Criterion) model selection methods. Briefly, BIC is used to estimate the probability of the training data given the probability model, while AIC is used to estimate out-of-sample prediction error. The probability of the training data given the model is called the “marginal likelihood”. Using the marginal likelihood, one can calculate the probability of a model given the training data and then use this analysis to select the most probable model, select a model that minimizes expected risk, or perform Bayesian model averaging. The assumptions required for BIC to be a valid approximation of the probability of the training data given the probability model are also discussed.
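As a rough illustration of how BIC scores can be turned into model probabilities, here is a minimal Python sketch. It assumes the common textbook form BIC = -2 log L + k log n, treats exp(-BIC/2) as an approximation to the marginal likelihood, and assumes every candidate model has the same prior probability; the formulas presented in the episode may be scaled differently. The function names and the two fitted models below are hypothetical.

```python
import math

def bic(log_likelihood, num_params, num_samples):
    """Textbook BIC: -2 * maximized log-likelihood + k * log(n)."""
    return -2.0 * log_likelihood + num_params * math.log(num_samples)

def model_probabilities(bic_values):
    """Approximate p(model | data) from BIC scores.

    Assumes equal prior probability for every model and that
    exp(-BIC / 2) approximates each model's marginal likelihood.
    """
    best = min(bic_values)  # subtract the minimum for numerical stability
    weights = [math.exp(-0.5 * (b - best)) for b in bic_values]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical fitted models: (maximized log-likelihood, free parameters)
n = 200
candidates = [(-310.0, 3), (-305.0, 6)]
scores = [bic(ll, k, n) for ll, k in candidates]
probs = model_probabilities(scores)
print(scores, probs)
```

The resulting probabilities could be used to pick the most probable model or as weights for Bayesian model averaging.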

In this episode, we explain the proper semantic interpretation of the Akaike Information Criterion (AIC) and the Generalized Akaike Information Criterion (GAIC) for the purpose of picking the best model for a given set of training data. The precise semantic interpretation of these model selection criteria is provided, explicit assumptions are provided for the AIC and GAIC to be valid, and explicit formulas are provided for the AIC and GAIC so they can be used in practice. Briefly, AIC and GAIC provide a way of estimating the average prediction error of your learning machine on test data without using test data or cross-validation methods. The GAIC is also called the Takeuchi Information Criterion (TIC).
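For readers who want to try this, here is a minimal Python sketch of AIC-based model selection. It assumes the common textbook form AIC = -2 log L + 2k, where L is the maximized likelihood and k is the number of free parameters; the episode's own formulas may use a different scaling, and the GAIC is not shown here. The function name and the two fitted models are hypothetical.

```python
def aic(log_likelihood, num_params):
    """Textbook AIC: -2 * maximized log-likelihood + 2 * k."""
    return -2.0 * log_likelihood + 2.0 * num_params

# Hypothetical fitted models: (maximized log-likelihood, free parameters)
candidates = [(-310.0, 3), (-305.0, 6)]
scores = [aic(ll, k) for ll, k in candidates]

# The model with the smallest AIC has the lowest estimated
# out-of-sample prediction error under the AIC assumptions.
best_index = min(range(len(scores)), key=lambda i: scores[i])
print(scores, best_index)
```

Note that only training-data quantities appear in the calculation, which is the point of the criterion: no held-out test set or cross-validation loop is needed.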

In this episode, we explore the question of what computers can and cannot do using the Turing Machine argument. Specifically, we discuss the computational limits of computers and raise the question of whether such limits pertain to biological brains and other non-standard computing machines. This episode is dedicated to the memory of my mom, Sandy Golden. To learn more about Turing Machines, SuperTuring Machines, Hypercomputation, and my Mom, check out: www.learningmachines101.com

In this episode we will learn how to use “rules” to represent knowledge. We discuss how this works in practice and we explain how these ideas are implemented in a special architecture called the production system. The challenges of representing knowledge using rules are also discussed. Specifically, these challenges include: choosing a feature representation, having an adequate number of rules, obtaining rules that are mutually consistent, and having rules that handle special cases and situations. To learn more, visit: www.learningmachines101.com
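To make the idea of a production system concrete, here is a minimal Python sketch of a forward-chaining rule interpreter. It is an illustrative simplification, not the architecture described in the episode: each rule is assumed to be a set of condition facts plus a single conclusion, and the rule and fact names are hypothetical.

```python
def run_production_system(rules, facts, max_cycles=100):
    """Fire rules repeatedly until no rule adds a new fact."""
    memory = set(facts)  # working memory of known facts
    for _ in range(max_cycles):
        fired = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are in working memory
            if conditions <= memory and conclusion not in memory:
                memory.add(conclusion)
                fired = True
        if not fired:
            break  # no rule produced anything new
    return memory

# Hypothetical rules: (conditions, conclusion)
rules = [
    (frozenset({"has_feathers"}), "is_bird"),
    (frozenset({"is_bird", "can_swim", "cannot_fly"}), "is_penguin"),
    (frozenset({"is_bird"}), "lays_eggs"),
]
result = run_production_system(rules, {"has_feathers", "can_swim", "cannot_fly"})
print(sorted(result))
```

Even this toy version hints at the challenges listed above: the conclusions depend entirely on which features are represented as facts, and a missing or contradictory rule silently changes what the system can infer.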

This is a remix of the original second episode of Learning Machines 101, which describes in a little more detail how the computer program that Arthur Samuel developed in 1959 learned to play checkers by itself without human intervention using a mixture of classical artificial intelligence search methods and artificial neural network learning algorithms. The podcast ends with a book review of Professor Nilsson’s book: “The Quest for Artificial Intelligence: A History of Ideas and Achievements”. For more information, check out: www.learningmachines101.com

This podcast is basically a remix of the first and second episodes of Learning Machines 101 and is intended to serve as the new introduction to the Learning Machines 101 podcast series. The search for common organizing principles which could support the foundations of machine learning and artificial intelligence is discussed and the concept of the Big Artificial Intelligence Magic Show is introduced. At the end of the podcast, the book After Digital: Computation as Done by Brains and Machines by Professor James A. Anderson is briefly reviewed. For more information, please visit: www.learningmachines101.com

In this podcast, we provide some insights into the complexity of common sense. First, we discuss the importance of building common sense into learning machines. Second, we discuss how first-order logic can be used to represent common sense knowledge. Third, we describe a large database of common sense knowledge, represented using first-order logic, which is free for researchers in machine learning, and we provide a hyperlink to this database. Fourth, we discuss some problems of first-order logic and explain how these problems can be resolved by transforming logical rules into probabilistic rules using Markov Logic Nets. Fifth, we review the book “Markov Logic: An Interface Layer for Artificial Intelligence” by Pedro Domingos and Daniel Lowd, which provides further discussion of the issues in this podcast. In this book review, we cover some additional important applications of Markov Logic Nets not covered in detail in this podcast, such as: object labeling, social network link analysis, information extraction, and robot navigation support. Finally, at the end of the podcast we provide information about a free software program which you can use to build and evaluate your own Markov Logic Net! For more information check out: www.learningmachines101.com

In this 70th episode of Learning Machines 101, we discuss how to identify facial emotion expressions in images using an advanced clustering technique called Stochastic Neighborhood Embedding. We discuss the concept of recognizing facial emotions in images, including applications to problems such as: improving online communication quality, identifying suspicious individuals such as terrorists using video cameras, improving lie detector tests, improving athletic performance by providing emotion feedback, and designing smart advertising which can look at the customer’s face to determine if they are bored or interested and dynamically adapt the advertising accordingly. To address this problem, we review clustering methods including K-means clustering, Linear Discriminant Analysis, Spectral Clustering, and the relatively new technique of Stochastic Neighborhood Embedding (SNE) clustering. At the end of this podcast we provide a brief review of the classic machine learning text by Christopher Bishop titled “Pattern Recognition and Machine Learning”. Make sure to visit: www.learningmachines101.com to obtain free transcripts of this podcast and important supplemental reference materials!

This 69th episode of Learning Machines 101 provides a short overview of the 2017 Neural Information Processing Systems conference with a focus on the development of methods for teaching learning machines rather than simply training them on examples. In addition, a book review of the book “Deep Learning” is provided. #nips2017

This 68th episode of Learning Machines 101 discusses a broad class of unsupervised, supervised, and reinforcement machine learning algorithms which iteratively update their parameter vector by adding a perturbation based upon all of the training data. This process is repeated, making a perturbation of the parameter vector based upon all of the training data until a parameter vector is generated which exhibits improved predictive performance. The magnitude of the perturbation at each learning iteration is called the “stepsize” or “learning rate” and the identity of the perturbation vector is called the “search direction”. Simple mathematical formulas are presented based upon research from the late 1960s by Philip Wolfe and G. Zoutendijk that ensure convergence of the generated sequence of parameter vectors. These formulas may be used as the basis for the design of artificially intelligent smart automatic learning rate selection algorithms. For more information, please visit the official website: www.learningmachines101.com
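The stepsize and search direction ideas can be sketched in Python as follows. This is an illustrative example only: it assumes the search direction is the negative gradient and uses the widely cited (weak) Wolfe conditions with a simple bisection search as one possible automatic stepsize rule. The formulas presented in the episode may differ, and the objective function, constants `c1` and `c2`, and function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def wolfe_stepsize(f, grad, x, d, c1=1e-4, c2=0.9, max_iter=50):
    """Bisection search for a stepsize satisfying the weak Wolfe conditions."""
    lo, hi = 0.0, float("inf")
    alpha = 1.0
    f0 = f(x)
    slope0 = dot(grad(x), d)  # directional derivative at the current point
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) > f0 + c1 * alpha * slope0:
            hi = alpha  # not enough decrease: step is too long
        elif dot(grad(x_new), d) < c2 * slope0:
            lo = alpha  # slope still steep: step is too short
        else:
            return alpha  # both conditions hold
        alpha = 0.5 * (lo + hi) if hi < float("inf") else 2.0 * lo
    return alpha

def batch_descent(f, grad, x, max_steps=500, tol=1e-8):
    """Repeatedly perturb x by stepsize * search direction."""
    for _ in range(max_steps):
        g = grad(x)
        if dot(g, g) < tol * tol:
            break  # gradient is essentially zero
        d = [-gi for gi in g]  # search direction: steepest descent
        alpha = wolfe_stepsize(f, grad, x, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

# Hypothetical objective standing in for a training-data error function
def f(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

solution = batch_descent(f, grad, [0.0, 0.0])
print(solution)
```

In a real learning machine, `f` and `grad` would be computed from all of the training data at each iteration, which is what makes this a batch method.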