DataCafé

Author: Jason & Jeremy
Description

Welcome to the DataCafé: a special-interest Data Science podcast with Dr Jason Byrne and Dr Jeremy Bradley, interviewing leading data science researchers and domain experts in all things business, stats, maths, science and tech.
26 Episodes
Culture is a key enabler of innovation in an organisation. Culture underpins the values that are important to people and the motivations for their behaviours. When these values and behaviours align with the goals of innovation, it can lead to high performance across teams that are tasked with the challenge of leading, inspiring and delivering innovation. Many scientists and researchers are faced with these challenges in various scenarios, yet may be unaware of the level of influence that comes from the culture they are part of.

In this episode we talk about what it means to design and embed a culture of innovation. We outline some of our findings in the literature about the levels of culture that may be invisible or difficult to measure. Assessing culture helps us understand the ways it can empower people to experiment and take risks, and the importance this has for innovation. And where a culture is deemed to be limiting innovation, action can be taken to motivate the right culture and steer the organisation towards a better chance of success.

Further Reading
Paper: Hogan & Coote (2014) Organizational Culture, Innovation and Performance (via www.researchgate.net)
Book: Johnson & Scholes (1999) Exploring Corporate Strategy: Text and Cases
Article: Understanding Organisational Culture - Checklist by CMI (via www.managers.org.uk)
Article: The Cultural Web (via www.mindtools.com)
Paper: Mossop et al. (2013) Analysing the hidden curriculum: use of a cultural web (via www.ncbi.nlm.nih.gov)
Book: Bruch & Vogel (2011) Fully Charged: How Great Leaders Boost Their Organization’s Energy and Ignite High Performance (via reading.ac.uk)
Webinar: Bruch (2012) Fully Charged: How Great Leaders Boost Their Organization’s Energy and Ignite High Performance (via hbr.org)
Article: Pisano (2019) The Hard Truth About Innovative Cultures (via hbr.org)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 12 Aug 2022

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
Scaling the Internet

2022-07-30 (45:24)

Do you have multiple devices connected to your internet fighting for your bandwidth? Are you asking your children (or even neighbours!) to get off the network so you can finish an important call? Recent lockdowns caused huge network contention as everyone moved to online meetings and virtual classrooms. This is an optimisation challenge that requires advanced modelling and simulation to tackle. How can a network provider know how much bandwidth to provision to a town or a city to cope with peak demands? That's where agent-based simulations come in: they allow network designers to anticipate, and then plan for, high-demand events, applications and trends.

In this episode of the DataCafé we hear from Dr. Lucy Gullon, AI and Optimisation Research Specialist at Applied Research, BT. She tells us about the efforts underway to assess the need for bandwidth across different households and locations, and the work they lead to model, simulate, and optimise the provision of that bandwidth across the network of the UK. We hear how planning for peak use, where, say, the nation is streaming a football match, is an important consideration. At the same time, reacting to times of low throughput can help to switch off unused circuits and equipment and save a lot of energy.

Interview Guest: Dr. Lucy Gullon, AI and Optimisation Research Specialist from Applied Research, BT.

Further reading:
BT Research and Development (https://www.bt.com/about/bt/research-and-development)
AnyLogic agent-based simulator (https://www.anylogic.com/use-of-simulation/agent-based-modeling/)
Article: Agent-based modelling (via Wikipedia)
Article: Prisoner's Dilemma (via Wikipedia)
Article: Crowd Simulation (via Wikipedia)
Book: Science and the City (via Bloomsbury)
Research group: Traffic Modelling (via mit.edu)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 5 May 2022
Interview date: 27 Apr 2022

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
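Agent-based simulation of the kind described in the episode builds a population of individually behaving agents and reads system-level demand off their aggregate. Here is a minimal, hypothetical sketch in Python; the households, demand figures and evening peak are invented for illustration, and real network models (e.g. those built in AnyLogic) are far richer:

```python
import random

class Household:
    """An agent: a network user whose bandwidth demand varies with the hour of day."""
    def __init__(self, base_mbps, rng):
        self.base = base_mbps
        self.rng = rng

    def demand(self, hour):
        # Invented evening peak (19:00-22:00), e.g. the nation streaming a match.
        multiplier = 3.0 if 19 <= hour <= 22 else 1.0
        return self.base * multiplier * self.rng.uniform(0.5, 1.5)

def simulate_town(n_households, hours=24, seed=0):
    """Aggregate hourly demand (Mbps) across all household agents."""
    rng = random.Random(seed)
    town = [Household(base_mbps=10, rng=rng) for _ in range(n_households)]
    return [sum(h.demand(t) for h in town) for t in range(hours)]

profile = simulate_town(1000)
peak_hour = max(range(24), key=lambda t: profile[t])
```

A planner would then provision capacity against `max(profile)` rather than the daily average, which is exactly the peak-versus-idle trade-off discussed in the episode.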
Do you ever find yourself wondering what data you used in a project? When it was obtained and where it is stored? Or even just the way to run a piece of code that produced a previous output and needs to be revisited?

Chances are the answer is yes. And it's likely you have been frustrated by not knowing how to reproduce an output or rerun a codebase, or even who to talk to to obtain a refresh of the data - in some way, shape, or form. The problem that a lot of project teams face, and data scientists in particular, is the agreement and effort to document their work in a robust and reliable fashion. Documentation is a broad term and can refer to all manner of project details, from the actions captured in a team meeting to the technical guides for executing an algorithm.

In this bite episode of DataCafé we discuss the challenges around documentation in data science projects (though it applies more broadly). We motivate the need for good documentation through agreement of the responsibilities, expectations, and methods of capturing notes and guides. This can be everything from a summary of the data sources and how to preprocess input data, to project plans and meeting minutes, through to technical details on the dependencies and setups for running code.

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
Are people resistant to change? And if so, how do you manage that when trying to introduce and deliver innovation through Data Science?

In this episode of the DataCafé we discuss the challenges faced when trying to land a data science project. There are a number of potential barriers to success that need to be carefully managed. We talk about "change management" and aspects of employee behaviours and stakeholder management that influence the chances of landing a project. This is especially important for embedding innovation in your company or organisation, and implementing a plan to sustain the changes needed to deliver long-term value.

Further reading & references
Kotter's 8 Step Change Plan
Armenakis, A., Harris, S. & Mossholder, K. (1993) Creating Readiness for Organizational Change. Human Relations, 46, 681-704. doi: 10.1177/001872679304600601
Lewin, K. (1944) Constructs in Field Theory. In D. Cartwright (Ed.) (1952) Field Theory in Social Science: Selected Theoretical Papers by Kurt Lewin. London: Social Science Paperbacks, pp. 30-42
Lewin, K. (1947) 'Frontiers in Group Dynamics: Concept, Method and Reality in Social Science; Social Equilibria and Social Change', Human Relations, 1(1), pp. 5-41. doi: 10.1177/001872674700100103

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 10 February 2022

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
Data scientists usually have to write code to prototype software, be it to preprocess and clean data, engineer features, build a model, or deploy a codebase into a production environment or other use case. The evolution of a codebase is important for a number of reasons, which is where version control can help, such as:

collaborating with other code developers (due diligence in coordination and delegation)
generating backups
recording versions
tracking changes
experimenting and testing
working with agility

In this bite episode of the DataCafé we talk about these motivators for version control and how it can strengthen your code development and teamwork in building a data science model, pipeline or product.

Further reading:
Version control via Wikipedia https://en.wikipedia.org/wiki/Version_control
git-scm via https://git-scm.com/
"Version Control & Git" by Jason Byrne via Slideshare https://www.slideshare.net/JasonByrne6/version-control-git-86928367
"Learn git" via Codecademy https://www.codecademy.com/learn/learn-git
"Become a git guru" via Atlassian https://www.atlassian.com/git/tutorials
Gitflow workflow via Atlassian https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow
"A successful Git branching model" by Vincent Driessen https://nvie.com/posts/a-successful-git-branching-model/
Branching strategies via GitVersion https://gitversion.net/docs/learn/branching-strategies/

Recording date: 21 April 2022

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
We explore one of the key issues around Deep Learning Neural Networks: how can you prove that your neural network will perform correctly? Especially if the neural network in question is at the heart of a mission-critical application, such as making a real-time control decision in an autonomous car. Similarly, how can you establish whether you've trained the neural network at the heart of a loan decision agent with a prebuilt bias? How can you be sure that your black box is going to adapt to critical new situations?

We speak with Prof. Alessio Lomuscio about how Mixed Integer Linear Programs (MILPs) and Symbolic Interval Propagation can be used to capture and solve verification problems in large Neural Networks. Prof. Lomuscio leads the Verification of Autonomous Systems Group in the Dept. of Computing at Imperial College; their results have shown that verification is feasible for models with millions of tunable parameters, which was previously not possible. Tools like VENUS and VeriNet, developed in their lab, can verify key operational properties in Deep Learning Networks, and this has particular relevance for safety-critical applications in e.g. the aviation industry, medical imaging and autonomous transportation.

Particularly importantly, given that neural networks are only as good as the training data that they have learned from, it is also possible to prove that a particular defined bias does or does not exist for a given network. This latter case is, of course, important for many social or industrial applications: being able to show that a decisioning tool treats people of all genders, ethnicities and abilities equitably.

Interview Guest
Our interview guest Alessio Lomuscio is Professor of Safe Artificial Intelligence in the Department of Computing at Imperial College London. Anyone wishing to contact Alessio about his team's verification technology can do so via his Imperial College website, or via the Imperial College London spin-off Safe Intelligence that will be commercialising the AI verification technology in the future.

Further Reading
Publication list for Prof. Alessio Lomuscio (via Imperial College London)
Paper on Formal Analysis of Neural Network-based Systems in the Aircraft Domain using the VENUS tool (via Imperial College London)
Paper on Scalable Complete Verification of ReLU Neural Networks via Dependency-based Branching (via IJCAI.org)
Paper on DEEPSPLIT: An Efficient Splitting Method for Neural Network Verification via Indirect Effect Analysis (via IJCAI.org)
Team: Verification of Autonomous Systems Group, Department of Computing, Imperial College London
Tools: VENUS and VeriNet

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
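The core idea behind interval propagation, one of the techniques mentioned in this episode, can be sketched in a few lines: push a box of possible inputs through each layer, splitting the weight matrix into positive and negative parts so the bounds stay valid, and apply ReLU to the bounds. This toy example uses made-up weights and is only the loosest form of the analysis; tools like VENUS and VeriNet tighten these bounds considerably (e.g. via MILP):

```python
import numpy as np

def affine_bounds(l, u, W, b):
    """Propagate the box [l, u] through x -> W @ x + b.
    Positive weights carry lower bounds to lower bounds; negative weights flip them."""
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    return Wp @ l + Wn @ u + b, Wp @ u + Wn @ l + b

def relu_bounds(l, u):
    """ReLU is monotone, so it maps bounds to bounds directly."""
    return np.maximum(l, 0), np.maximum(u, 0)

# Tiny two-layer network (invented weights) on the input box [-1, 1]^2.
W1, b1 = np.array([[1.0, -1.0], [0.5, 0.5]]), np.zeros(2)
W2, b2 = np.array([[1.0, 1.0]]), np.array([0.0])

l, u = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
l, u = relu_bounds(*affine_bounds(l, u, W1, b1))
l, u = affine_bounds(l, u, W2, b2)
# If, say, a safety property requires the output to stay below some threshold,
# u below that threshold verifies it for EVERY input in the box at once.
```

The point is that a single pass gives a guarantee over infinitely many inputs, which is what sampling-based testing cannot do.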
The grey, green and yellow squares taking over social media in the last few weeks are an example of the fascinating field of study known as Game Theory.

In this bite episode of DataCafé we talk casually about Wordle - the internet phenomenon currently challenging players to guess a new five-letter word each day. Six guesses inform players what letters they have gotten right and if they are in the right place. It's a lovely example of the different ways people approach game strategy through their choice of guesses and ways to use the information presented within the game.

Wordles
Wordle - the original
Absurdle - it's Wordle but it fights you!
Nerdle - Maths Wordle
Quordle - when one Wordle is not enough!
Foclach - Irish Wordle

Analysis
Statistical analysis of hard-mode Wordle with Matlab by Matt Tearle (youtube)
The science behind Wordle by Ido Frizler (medium.com)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 15 Feb 2022
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
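The information-use strategy discussed here can be made concrete in a few lines of Python: score a guess against a hidden answer, then prune the candidate list to words consistent with the feedback. This sketch ignores the double-letter edge cases of real Wordle, and the tiny word list is made up:

```python
def feedback(guess, answer):
    """Wordle-style pattern: 'g' = right letter, right place; 'y' = right letter,
    wrong place; '.' = letter absent. (Simplified: ignores duplicate letters.)"""
    out = []
    for i, c in enumerate(guess):
        if c == answer[i]:
            out.append("g")
        elif c in answer:
            out.append("y")
        else:
            out.append(".")
    return "".join(out)

def consistent(words, guess, pattern):
    """Keep only candidate answers that would have produced this pattern."""
    return [w for w in words if feedback(guess, w) == pattern]

words = ["crane", "slate", "trace", "caper", "react"]
# Suppose the hidden answer is "react" and we open with "crane":
remaining = consistent(words, "crane", feedback("crane", "react"))
# → ["react"]
```

A good guess is one whose possible feedback patterns split the remaining candidates as evenly as possible, which is where the game-theoretic and entropy-based analyses linked above come in.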
Series 2 Introduction

2022-03-14 (05:31)

Looks like we might be about to have a new Series of DataCafé!

Recording date: 15 Feb 2022
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
Data Science in a commercial setting should be a no-brainer, right? Firstly, data is becoming ubiquitous, with gigabytes being generated and collected every second. And secondly, there are new and more powerful data science tools and algorithms being developed and published every week. Surely just bringing the two together will deliver success...

In this episode, we explore why so many Data Science projects fail to live up to their initial potential. A recent Gartner report anticipates that 85% of Data Science projects will fail to deliver the value they should due to "bias in data, algorithms or the teams responsible for managing them". There are many reasons why data science projects stutter even aside from the data, the algorithms and the people. We discuss six key technical reasons why Data Science projects typically don't succeed, based on our experience, and one big non-technical reason!

And being 'on the air' for a year now we'd like to give a big Thank You to all our brilliant guests and listeners - we really could not have done this without you! It's been great getting feedback and comments on episodes. Do get in touch at jeremy@datacafe.uk or jason@datacafe.uk if you would like to tell us your experiences of successful or unsuccessful data science projects and share your ideas for future episodes.

Further Reading and Resources
Article: "Why Big Data Science & Data Analytics Projects Fail" (https://bit.ly/3dfPzoH via Data Science Project Management)
Article: "10 reasons why data science projects fail" (https://bit.ly/3gIuhSL via Fast Data Science)
Press Release: "Gartner Says Nearly Half of CIOs Are Planning to Deploy Artificial Intelligence" (https://gtnr.it/2TTYDZa via Gartner)
Article: "6 Reasons Why Data Science Projects Fail" (https://bit.ly/2TN3sDK via ODSC Open Data Science)
Blog: "Reasons Why Data Projects Fail" (https://bit.ly/3zJrFeA via KDnuggets)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 18 June 2021
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
Data Science for Good

2021-05-31 (36:14)

What's the difference between a commercial data science project and a Data Science project for social benefit? Often so-called Data Science for Good projects involve throwing together many people from different backgrounds under a common motivation to have a positive effect.

We talk to a Data Science team that was formed to tackle the unemployment crisis coming out of the pandemic and help people to find excellent jobs in different industries for which they have a good skills match. We interview Erika Gravina, Rajwinder Bhatoe and Dehaja Senanayake about their story helping to create the Job Finder Machine with the Emergent Alliance, DataSparQ, Reed and Google.

Further Information
Project: Job Finder Machine
Project Group: Emergent Alliance and DataSparQ
Shout out: Code First Girls for fantastic courses, mentoring and support for women in tech and data science

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Interview date: 25 March 2021
Recording date: 13 May 2021
Intro audio by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
The scientific method consists of systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses. But what does this mean in the context of Data Science, where a wealth of unstructured data and a variety of computational models can be used to deduce an insight and inform a stakeholder's decision?

In this bite episode we discuss the importance of the scientific method for data scientists. Data science is, after all, the application of scientific techniques and processes to large data sets to obtain impact in a given application area. So we ask how the scientific method can be harnessed efficiently and effectively when there is so much uncertainty in the design and interpretation of an experiment or model.

Further Reading and Resources
Paper: "Defining the scientific method" via Nature https://www.nature.com/articles/nmeth0409-237
Paper: "Big data: the end of the scientific method" via The Royal Society https://royalsocietypublishing.org/doi/10.1098/rsta.2018.0145
Article: "The Data Scientific Method" via Medium https://towardsdatascience.com/a-data-scientific-method-80caa190dbd4
Article: "The scientific method of machine learning" via Datascience.aero https://datascience.aero/scientific-method-machine-learning/
Article: "Putting the 'Science' Back in Data Science" via KDnuggets https://www.kdnuggets.com/2017/09/science-data-science.html
Podcast: "In Our Time: The Scientific Method" via BBC Radio 4 https://www.bbc.co.uk/programmes/b01b1ljm
Podcast: "The end of the scientific method" via The Economist https://www.economist.com/podcasts/2019/11/27/the-end-of-the-scientific-method
Video: "The Scientific Method" via Coursera https://www.coursera.org/lecture/data-science-fundamentals-for-data-analysts/the-scientific-method-Ha5hq
Cartoon: "Machine Learning" via xkcd https://xkcd.com/1838/

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 30 April 2021
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
Data Science on Mars

2021-04-19 (58:36)

On 30 July 2020 NASA launched the Mars 2020 mission from Earth, carrying a rover called Perseverance and a rotorcraft called Ingenuity, to land on and study Mars. The mission so far has been a resounding success, touching down in Jezero Crater on 18 February 2021, and sending back data and imagery of the Martian landscape since then.

The aim of the mission is to advance NASA's scientific goals of establishing if there was ever life on Mars, what its climate and geology are, and to pave the way for human exploration of the red planet in the future. Ingenuity will also demonstrate the first powered flight on another world, in the low-density atmosphere of Mars, approximately 1% of the density of Earth's atmosphere.

The efforts involved are an impressive demonstration of the advances and expertise of the science, engineering, and project teams. Data from the mission will drive new scientific insights as well as prove the technical abilities demonstrated throughout. Of particular interest is the Terrain Relative Navigation (TRN) system that enables autonomous landing of missions on planetary bodies like Mars, which is so far away that we cannot have ground communications on Earth in the loop.

We talk with Prof. Paul Byrne, a planetary geologist from North Carolina State University, about the advances in planetary science and what the Mars 2020 mission means for him, his field of research, and for humankind.

Further Reading and Resources
Website: Profile page for Prof. Paul Byrne at the Center for Geospatial Analytics at NCSU (https://bit.ly/3gkP4vD via ncsu.edu)
Website: Mars 2020 (https://mars.nasa.gov/mars2020/)
Paper: Mars 2020 Science Definition Team Report (https://go.nasa.gov/3x5d6AF via nasa.gov)
Video: Perseverance Rover's Descent and Touchdown on Mars (https://bit.ly/32o6248 via youtube)
Website: Lunar rocks and soils from Apollo missions (https://curator.jsc.nasa.gov/lunar/)
Article: Terrain Relative Navigation (https://go.nasa.gov/2RMd9RZ via nasa.gov)
Paper: A General Approach to Terrain Relative Navigation for Planetary Landing (https://bit.ly/3mXCN1z via aiaa.org)
Video: Terrain Relative Navigation, NASA JPL (https://bit.ly/2QCcTEB via youtube)
Video: Studying Alien Worlds to Understand Earth (https://bit.ly/3tpZ1f3 via youtube)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Interview date: 25 March 2021
Recording date:

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
Welcome to the first DataCafé Bite: a bite-size episode where Jason and Jeremy drop in for a quick chat about a relevant or newsworthy topic from the world of Data Science. In this episode, we discuss how to hire a great Data Scientist, which is a challenge faced by many companies and is not easy to get right.

From endless coding tests and weird logic puzzles, to personality quizzes and competency-based interviews; there are many examples of how companies try to assess how a candidate handles and reacts to data problems. We share our thoughts and experiences on ways to set yourself up for success in hiring the best person for your team or company.

Have you been asked to complete a week-long data science mini-project for a company, or taken part in a data hackathon? We'd love to hear your experiences of good and bad hiring practice around Data Science. You can email us at jason at datacafe.uk or jeremy at datacafe.uk with your experiences. We'll be sure to revisit this topic as it's such a rich and changing landscape.

Further Reading
Article: Guide to hiring data scientists (https://bit.ly/2OjnALi via kdnuggets.com)
Article: Hiring a data scientist: the good, the bad and the ugly! (https://bit.ly/3cMpLR5 via forbes.com)
Article: How to Hire (https://bit.ly/3dCLTfO via Harvard Business Review)
Podcast: How to start a startup (https://bit.ly/3sOWxGU via Y-Combinator/Stanford University)
Video: Adam Grant: Hire for Culture Fit or Add? (https://bit.ly/3cNGWl3 via YouTube/Stanford eCorner)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 1 April 2021
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
In this episode we talk about all things Bayesian. What is Bayesian inference and why is it the cornerstone of Data Science?

Bayesian statistics embodies the Data Scientist and their role in the data modelling process. A Data Scientist starts with an idea of how to capture a particular phenomenon in a mathematical model - maybe derived from talking to experts in the company. This represents the prior belief about the model. Then the model consumes data around the problem - historical data, real-time data, it doesn't matter. This data is used to update the model, and the result is called the posterior.

Why is this Data Science? Because models that react to data and refine their representation of the world in response to the data they see are what the Data Scientist is all about.

We talk with Dr Joseph Walmswell, Principal Data Scientist at life sciences company Abcam, about his experience with Bayesian modelling.

Further Reading
Publication list for Dr. Joseph Walmswell (https://bit.ly/3s8xluH via researchgate.net)
Blog on Bayesian inference for parameter estimation (https://bit.ly/2OX46fV via towardsdatascience.com)
Book chapter on Bayesian inference (https://bit.ly/2Pi9Ct9 via cmu.edu)
Article on the Monty Hall problem (https://bit.ly/3f1pefr via Wikipedia)
Podcast on "The truth about obesity and Covid-19", More or Less: Behind the Stats (https://bbc.in/3lBqCGS via bbc.co.uk)
Gov.uk guidance:
Article on "Understanding lateral flow antigen testing for people without symptoms" (https://bit.ly/313JDs9)
Article on "Households and bubbles of pupils, students and staff of schools, nurseries and colleges: get rapid lateral flow tests" (https://bit.ly/3c5ZXih)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 16 March 2021
Interview date: 26 February 2021

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
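The prior-to-posterior update described above has a particularly clean form when the prior is conjugate to the likelihood. A minimal Python illustration with a Beta prior on a success rate and invented data:

```python
# Prior belief about a success rate, encoded as a Beta(a, b) distribution.
# (All numbers here are invented for illustration.)
a, b = 2.0, 2.0                    # weakly informative prior centred on 0.5
data = [1, 0, 1, 1, 0, 1, 1, 1]   # observed successes (1) and failures (0)

# Conjugate update: each observation nudges the prior towards the data.
for x in data:
    a += x          # successes accumulate in a
    b += 1 - x      # failures accumulate in b

posterior_mean = a / (a + b)       # point estimate after seeing the data
# Posterior is Beta(8, 4), mean 2/3: between the prior's 0.5 and the data's 0.75.
```

The posterior mean landing between the prior mean and the raw data frequency is exactly the "model refining its representation in response to data" behaviour described above.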
Have you ever come home from the supermarket to discover one of the apples you bought is rotten? It's likely your trust in that grocer was diminished, or you might stop buying that particular brand of apples altogether. In this episode, we discuss how the quality controls in a production line need to use smart sampling methods in order to avoid sending bad products to the customer, which could ruin the reputation of both the brand and seller.

To do this we describe a thought experiment called Apple Tasting. This allows us to demonstrate the concepts of regret and reward in a sampling process, giving rise to the use of Contextual Bandit Algorithms. Contextual Bandits come from the field of Reinforcement Learning, which is a form of Machine Learning where an agent performs an action and tries to maximise the cumulative reward from its environment over time. Standard bandit algorithms simply choose between a number of actions and measure the reward in order to determine the average reward of each action. But a Contextual Bandit also uses information from its environment to inform both the likely reward and regret of subsequent actions. This is particularly useful in personalised product recommendation engines, where the bandit algorithm is given some contextual information about the user.

Back to Apple Tasting and product quality control. The contextual bandit in this scenario consumes a signal from a benign test that is indicative, but not conclusive, of there being a fault, and then makes the decision to perform a more in-depth test or not. So the answer for when you should discard or test your product depends on the relative costs of making the right decision (reward) or wrong decision (regret) and how your experience of the environment affected these in the past.

We speak with Prof. David Leslie about how this logic can be applied to any manufacturing pipeline where there is a downside risk of not quality checking the product but a cost in a false positive detection of a bad product. Other areas of application include:

Anomalous behaviour in a jet engine, e.g. low fuel efficiency, which could be nothing or could be serious, so it might be worth taking the plane in for repair.
Changepoints in network data time series - does it mean there's a fault on the line or does it mean the next series of The Queen's Gambit has just been released? Should we send an engineer out?

With interview guest David Leslie, Professor of Statistical Learning in the Department of Mathematics and Statistics at Lancaster University.

Further Reading
Publication list for Prof. David Leslie (http://bitly.ws/bQ4a via Lancaster University)
Paper on "Selecting Multiple Web Adverts - a Contextual Multi-armed Bandit with State Uncertainty" in the Journal of the ORS (http://bitly.ws/bQ3X via Lancaster University)
Paper on "Apple tasting" (http://bitly.ws/bQeW via ScienceDirect)
Paper by Google Inc. on "AutoML for Contextual Bandits" (https://arxiv.org/abs/1909.03212 via arXiv)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is.

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
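As a rough illustration of the apple-tasting setup, here is a toy epsilon-greedy contextual bandit in Python. The fault rate, signal accuracy and costs are all invented, and this is a simple sketch rather than the algorithms discussed in the interview: the agent only sees the cheap, noisy signal (its context) and learns from rewards when the expensive in-depth test is worth paying for:

```python
import random

def run_bandit(n_rounds=5000, eps=0.1, seed=1):
    """Epsilon-greedy contextual bandit for 'apple tasting'.
    Context: a cheap, noisy fault signal (0 or 1).
    Actions: 'ship' the item as-is, or pay for an in-depth 'test'.
    Rewards (invented costs): shipping a faulty item is very expensive (regret);
    testing costs a little regardless; shipping a good item earns a small reward."""
    rng = random.Random(seed)
    actions = ("ship", "test")
    est = {(c, a): 0.0 for c in (0, 1) for a in actions}  # running mean reward
    cnt = {k: 0 for k in est}
    for _ in range(n_rounds):
        faulty = rng.random() < 0.2                       # hidden true state
        signal = faulty if rng.random() < 0.8 else not faulty  # 80%-accurate signal
        c = int(signal)
        if rng.random() < eps:                            # explore
            a = rng.choice(actions)
        else:                                             # exploit current estimates
            a = max(actions, key=lambda x: est[(c, x)])
        reward = (-10.0 if faulty else 1.0) if a == "ship" else -1.0
        cnt[(c, a)] += 1
        est[(c, a)] += (reward - est[(c, a)]) / cnt[(c, a)]
    # Learned policy: best-estimated action for each context.
    return {c: max(actions, key=lambda a: est[(c, a)]) for c in (0, 1)}

policy = run_bandit()
```

With these numbers the agent should learn to ship when the signal is quiet and pay for the in-depth test when it fires, mirroring the reward/regret trade-off described above.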
Optimising the Future

2021-01-04 (35:54)

As we look ahead to a new year, and reflect on the last, we consider how data science can be used to optimise the future. But to what degree can we trust past experiences and observations, essentially relying on historical data to predict the future? And with what level of accuracy? In this episode of the DataCafé we ask: how can we optimise our predictions of future scenarios to maximise the benefit we can obtain from them while minimising the risk of unknowns?Data Science is made up of many diverse technical disciplines that can help to answer these questions. Two among them are mathematical optimisation and machine learning. We explore how these two fascinating areas interact and how they can both help to turbo charge the other's cutting edge in the future.We speak with Dimitrios Letsios from King's College London about his work in optimisation and what he sees as exciting new developments in the field by working together with the field of machine learning.With interview guest Dr. Dimitrios Letsios, lecturer (assistant professor) in the Department of Informatics at King's College London and a member of the Algorithms and Data Analysis Group.Further readingDimirios Letsios' publication list (https://bit.ly/35vHirH via King's College London)Paper on taking into account uncertainty in an optimisation model: Approximating Bounded Job Start Scheduling with Application in Royal Mail Deliveries under Uncertainty (https://bit.ly/3pLHICV via King's College London)Paper on lexicographic optimisation: Exact Lexicographic Scheduling and Approximate Rescheduling (https://bit.ly/3rS8Xxk via arXiv)Paper on combination of AI and Optimisation: Argumentation for Explainable Scheduling (https://bit.ly/3oobgGF via AAAI Conference on Artificial Intelligence)Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. 
Often free versions of papers are available and we would encourage you to investigate.

Recording date: 23 October 2020
Interview date: 21 February 2020
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
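Lexicographic optimisation, mentioned in the reading list above, means optimising objectives in strict priority order: optimise the most important objective first, then optimise the next objective only among the solutions that are optimal for the first. A toy sketch (the candidate schedules and their numbers are invented for illustration):

```python
# Hypothetical candidate schedules with a primary objective (lateness)
# and a secondary objective (cost).
schedules = [
    {"name": "A", "lateness": 2, "cost": 10},
    {"name": "B", "lateness": 0, "cost": 14},
    {"name": "C", "lateness": 0, "cost": 11},
    {"name": "D", "lateness": 1, "cost": 9},
]

def lexicographic_min(options, objectives):
    """Optimise objectives in priority order: each stage keeps only the
    options that are optimal for the current objective, then ties are
    broken by the next objective."""
    remaining = list(options)
    for obj in objectives:
        best = min(obj(o) for o in remaining)
        remaining = [o for o in remaining if obj(o) == best]
    return remaining

winners = lexicographic_min(
    schedules,
    [lambda s: s["lateness"], lambda s: s["cost"]],
)
# Stage 1 keeps B and C (lateness 0); stage 2 keeps C (cost 11).
```

Real lexicographic scheduling works over continuous decision variables with a solver rather than a finite list, but the priority-ordered tie-breaking is the same idea.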
US Election Special

2020-11-01 (31:54)

What exciting data science problems emerge when you try to forecast an election? Many, it turns out!

We're very excited to turn our DataCafé lens on the current Presidential race in the US as an exemplar of statistical modelling right now. Typically, state election polls ask around 1,000 people in a state of maybe 12 million how they will vote (or even whether they have voted already) and return a predictive result with an estimated polling error of about 4%.

In this episode, we look at polling as a data science activity and discuss how issues of sampling bias can have dramatic impacts on the outcome of a given poll. Elections are a fantastic use case for Bayesian modelling, where pollsters have to tackle questions like "What's the probability that a voter in Florida will vote for President Trump, given that they are white, over 60 and college educated?"

There are many such questions, as each electorate feature (gender, age, race, education, and so on) potentially adds another multiplicative factor to the size of demographic sample needed to get a meaningful result out of an election poll.

Finally, we even hazard a quick piece of psephological analysis ourselves and show how some naive Bayes techniques can at least get a foot in the door of these complex forecasting problems. (Caveat: correlation is still very important and can be a source of error if not treated appropriately!)

Further reading
Article: Ensemble Learning to Improve Machine Learning Results (https://bit.ly/34MW3HO via statsbot.co)
Paper: Combining Forecasts: An Application to Elections (https://bit.ly/3efx5nm via researchgate.net)
Interactive map: Explore The Ways Trump Or Biden Could Win The Election (https://53eig.ht/2TIlAvh via fivethirtyeight.com)
Podcast: 538 Politics Podcast (https://53eig.ht/2HSkwCA via fivethirtyeight.com)
Updated US polling map: Consensus Forecast Electoral Map (https://bit.ly/2HY1FWk via 270towin.com)

Some links above may require payment or login.
We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 30 October 2020
Intro music by Music 4 Video Library (Patreon supporter)

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
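The quoted polling error can be sanity-checked with the textbook margin-of-error formula for a simple random sample of size n. The sketch below uses the worst case p = 0.5; it gives roughly ±3.1% for n = 1,000, and real polls quote nearer 4% because weighting, design effects and nonresponse widen the interval beyond this idealised lower bound.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion estimated from a simple
    random sample of size n. p = 0.5 is the worst case (widest interval)."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(1000)  # about +/- 3.1 percentage points
```

Note the n in the denominator under the square root: halving the error requires quadrupling the sample, which is why slicing a 1,000-person poll by several demographic features at once quickly runs out of statistical power.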
What are solar storms? How are they caused? And how can we use data science to forecast them?

In this episode of DataCafé we talk about the Sun and how it drives space weather, and the efforts to forecast solar radiation storms that can have a massive impact here on Earth. On a regular day, the Sun has a constant stream of charged particles, or plasma, coming off its surface into the solar system, known as the solar wind. But in times of high activity it can undergo much more explosive phenomena, two of these being solar flares and coronal mass ejections (CMEs). These eruptions on the Sun launch energetic particles into space in the form of plasma and magnetic field that can reach us here on Earth and cause radiation storms and/or geomagnetic storms. These storms can degrade satellites, affect telecommunications and power grids, and disrupt space exploration and aviation.

Although we can be glad the strongest events are rare, this rarity makes them hard to predict, because of the difficulties in observing, studying and classifying them. So the challenge then becomes: how can we forecast them?

To answer this we speak to Dr. Hazel Bain, a research scientist specializing in the development of tools for operational space weather forecasting. She tells us about her efforts to bring together physics-based models with machine learning in order to improve solar storm forecasts and provide alerts to customers in industries like aviation, agriculture and space exploration.

With special guest Dr. Hazel M. Bain, Research Scientist at the Cooperative Institute for Research in Environmental Sciences (CIRES) at the University of Colorado, Boulder and NOAA's Space Weather Prediction Center (SWPC).

Further reading
Online Presentation: Solar Radiation Storms by Dr. Hazel Bain (HAO colloquium via YouTube https://bit.ly/3k8WuBc)
Article: NASA Space Weather (via NASA https://go.nasa.gov/2T3v5VG)
Algorithm: AdaBoost (via scikit-learn https://bit.ly/35bkfSU)
Press Release: New Space Weather Advisories Serve Aviation (via CIRES https://bit.ly/3dyqDHI)
Paper: Shock Connectivity in the 2010 August and 2012 July Solar Energetic Particle Events Inferred from Observations and ENLIL Modeling (via IOP https://bit.ly/2IEtGTs)
Paper: Diagnostics of Space Weather Drivers Enabled by Radio Observations (via arXiv https://arxiv.org/abs/1904.05817)
Paper: Bridging EUV and White-Light Observations to Inspect the Initiation Phase of a "Two-Stage" Solar Eruptive Event (via Springer or arXiv https://arxiv.org/abs/1406.4919)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is.

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
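Operational forecasters typically judge an event-alert system (storm alert issued vs. storm observed) with standard 2x2 contingency-table scores. A minimal sketch of three of the most common scores; the counts below are invented purely for illustration.

```python
def verification_scores(hits, misses, false_alarms):
    """Standard 2x2 contingency-table forecast verification scores.

    hits         - events that were forecast and occurred
    misses       - events that occurred but were not forecast
    false_alarms - alerts issued for events that did not occur
    """
    pod = hits / (hits + misses)                  # probability of detection
    far = false_alarms / (hits + false_alarms)    # false alarm ratio
    csi = hits / (hits + misses + false_alarms)   # critical success index
    return pod, far, csi

# Hypothetical season of alerts: 18 hits, 2 misses, 12 false alarms
pod, far, csi = verification_scores(hits=18, misses=2, false_alarms=12)
```

The tension between these scores mirrors the episode's theme: issuing alerts more readily raises the probability of detection but also the false alarm ratio, and rare, high-impact events make that trade-off especially hard to tune.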
How do you get your latest and greatest data science tool to make an impact? How can you avoid wasting time building a supposedly great data product, only to see it fall flat on launch?

In this episode, we discuss why you need to start with the idea before you get to a data product. As all good entrepreneurs know, if you can't sell the idea, you're certainly not going to be able to sell the product. We take inspiration from a particular way of thinking about software engineering called Lean Startup, and learn how it can be applied to data science projects and to startups in general.

We are lucky enough to talk with Freddie Odukomaiya, CTO of a startup that is aiming to revolutionise commercial property decision-making. He tells us about his entrepreneurial journey creating an innovative data tech company, and we learn how Lean Startup has influenced the way he has approached developing his business.

With interview guest Freddie Odukomaiya, CTO and Co-founder of GeoHood.

Further reading
Article: The Lean Startup Methodology by Eric Ries (article via theleanstartup.com)
Article: Data science and entrepreneurship: Business models for data science (blogpost via thedatascientist.com)
Article: A Lean Start-up Approach to Data Science by Ben Dias (article via LinkedIn)
Podcast: Linear Digressions with Katie and Ben (via lineardigressions.com)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 11 September 2020
Interview date: 16 June 2020

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
What is a virus? How can we spot human viruses in danger of becoming pandemics? How can we use statistics to understand their origins and transmission?

This turns out to be a hard problem, not least because there can be many hundreds or thousands of slightly modified strains of a virus in a small sample of blood. It matters greatly which version of a virus will become a pandemic in a population and which will merely peter out.

Viral geneticists have to be expert statisticians to be able to disentangle this story. Fundamentally, if we can use statistical techniques to understand which versions of a virus are prevalent and where they originated from, we can start to design countermeasures to defeat the further spread of the virus.

We speak to statistician and data scientist Dr. Kat James about her DPhil and post-doctoral work on the statistical genetics of animal-human viruses, in particular HIV-2, at the Nuffield Department of Medicine and the Wellcome Trust Centre for Human Genetics, University of Oxford. She is now Head of Data Science at Royal Mail and has some valuable insights on the crossover between statistical genetics and data science.

As we discover, the current coronavirus pandemic is a so-called zoonotic virus, which means it transitioned from animals to humans at some point and has become a very successful virus in the human population. COVID-19 has similarities to influenza, HIV-1 and HIV-2, MERS and SARS, as we will discover in this episode, and Kat gives us some interesting lessons to learn from previous pandemics.

Background on HIV
HIV-1 is one of the major viral pandemics of the 20th century. Untreated, it has a greater than 95% probability of death, and it has killed 33 million people (it still accounts for 750,000 deaths per year). Using statistical genetics, researchers have been able to identify three spillover events into humans for HIV-1.
Human viruses often interact with developments in human geography as part of the infection dynamics, and this is certainly true of HIV-1 over the course of its emergence as a pandemic virus.

HIV-2 is a distinct but similar virus to HIV-1, and people who are infected with HIV-2 often demonstrate resistance to HIV-1. Eight spillover events from Mangabey monkeys have been identified for HIV-2.

With interview guest Dr. Kat James, Head of Data Science at Royal Mail.

Further reading
Paper: Low-Bias RNA Sequencing of the HIV-2 Genome from Blood Plasma (via Journal of Virology, American Society for Microbiology)
Article: Introduction to PCR amplification (via Khan Academy)
Article: Tracking COVID19: Coronavirus came to UK 'on at least 1,300 separate occasions' (reporting on work by Universities of Birmingham and Oxford via BBC News website)

Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.

Recording date: 7 July 2020
Interview date: 9 June 2020

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.