Tagging.tech interview with Matthew Zeiler
Description
Tagging.tech presents an audio interview with Matthew Zeiler about image recognition
Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.
Transcript:
Henrik de Gyor: [00:02 ] This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Matthew Zeiler.
Matthew, how are you?
Matthew Zeiler: [00:06 ] Good. How are you?
Henrik: [00:07 ] Good. Matthew, who are you, and what do you do?
Matthew: [00:12 ] I am a founder and CEO of Clarifai. We are a technology company in New York that has technology that lets the computer see automatically. You can send us an image or a video, and we’ll tell you exactly what’s in it. That means all the objects, like car, dog, tree, mountain.
[00:32 ] Even descriptive words like love, romance, togetherness, are understood automatically by our technology. We make this technology available to enterprises and developers, through very simple APIs. You can literally send an image with about three lines of code, and we’ll tell you a whole list of objects.
[00:53 ] As well as how confident we are that those objects appear within the image or video.
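Editor's note: as a rough illustration of the "list of objects plus confidence" idea described above, here is a minimal Python sketch. It does not call Clarifai's service. The response body, its field names (`tags`, `name`, `confidence`), and the `confident_tags` helper are all assumptions made up for this example, not the actual API.

```python
import json

# Hypothetical response body. The field names are assumptions for
# illustration only, not Clarifai's actual API schema.
SAMPLE_RESPONSE = """
{
  "tags": [
    {"name": "dog", "confidence": 0.98},
    {"name": "tree", "confidence": 0.91},
    {"name": "togetherness", "confidence": 0.74},
    {"name": "car", "confidence": 0.12}
  ]
}
"""


def confident_tags(response_text, threshold=0.5):
    """Return (name, confidence) pairs at or above threshold, highest first."""
    tags = json.loads(response_text)["tags"]
    kept = [(t["name"], t["confidence"]) for t in tags if t["confidence"] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, confidence in confident_tags(SAMPLE_RESPONSE):
        print(f"{name}: {confidence:.2f}")
```

A caller would typically pick the threshold per use case: a low one for broad search, a higher one where a wrong tag is costly.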
Henrik: [00:58 ] Matthew, what are the biggest challenges and successes you’ve seen with image and video recognition?
Matthew: [01:03 ] It’s really exciting. We started this company about two years ago, in November 2013. We scaled it up to now over 30 people. Since the beginning, we kicked it off by winning this competition, called ImageNet. This competition is held every year. An international competition where researchers submit, and the largest companies submit, and we won the top five places.
[01:27 ] That was key in order to get recognition. Both in the research community, but even more importantly in [the] enterprise community. Since then we’ve had [a] tremendous amount of inbound across a wide variety of verticals. We’ve seen the problems in [the] wedding domain, travel, real estate, asset management. In consumer photos, social media.
[01:50 ] Every possible vertical and domain you can think of that has image or video content. We have paying customers. We’re solving problems that range from organizing the photos inside your pocket…we actually launched our own consumer app for this in December [2015], called Forevery, which is really exciting. Anyone with an iPhone [could] check it out.
[02:10 ] All the way to media companies, being able to tag their content for internal use. The tagging is very broad, to understand every possible aspect in the world. We can also get really fine‑grained. Even down to the terms and conditions that you put up for your users to upload content to your products.
[02:33 ] We can tailor our recognition system to help you moderate that content, and filter out the unwanted content before it reaches your live site. Lots of really exciting applications, and huge successes for both image and video.
[02:48 ] I think one of the early challenges, when we started two years ago, was really demonstrating the value that this technology can provide to an enterprise, and explaining what the technology is. A lot of people heard about image recognition, or heard the phrase at least, for decades.
[03:06 ] It’s because it’s been in research for decades. People have been trying to solve this problem, in making computers see. Not until very recently has this happened. Now they’re seeing this technology actually work in real applications. Not just on the demo that you can see at clarifai.com, where you can throw in your own image.
[03:26 ] You see it happen in real‑time, but in actual products that people use every day. From customers like Vimeo to improve their video search, or Style Me Pretty to improve their management of all of their wedding albums. Or Trivago, to improve search over hotel listings.
[03:43 ] When you start seeing these experiences be improved, Clarifai is at the forefront there, of integrating with these leading companies across these different verticals. It went from this challenge of educating the community and enterprises about what this technology does to now finding the best ways to integrate it.
Henrik: [04:03 ] As of early March 2016, how do you see image and video recognition changing?
Matthew: [04:09 ] When I started the company about two years ago, a general model that could recognize 1,000 concepts was pretty much state of the art. That’s what won ImageNet, when we kicked off the company. Now, we’ve extended that to over 11,000 different concepts that we can recognize and evolved it to recognize things beyond just objects, like I mentioned.
[04:33 ] Now, you can see these descriptive words, like idyllic, which will bring up beach photos. Or scenic, which will bring up nice mountain shots. Or nice weather shots, where it’s snowing, and snow on the trees. Just beautiful stuff like that. People would describe images in this way, but we’ve taught machines to do the same thing.
[04:56 ] I think, going forward, you’ll see a lot more of this expansion in the capability of the machine learning technology that we use. Also a whole personalization of it. What we’ve seen with the expansion of concepts is, it’s never going to be enough. You want to give the functionality to your users, to let them customize it in the way they talk about the world.
[05:21 ] There’s a few concrete examples here. In stock media, we sit at the upload process of a lot of stock media sites. A pro photographer might upload an image, and they used to have to manually tag it, but this is a very slow process. We do it in real‑time. We give them the ability to remove some tags, and add some tags, and then it’s uploaded to the site.
[05:45 ] What this does for the stock media company is give a much more consistent experience for buyers. If you let different people who don’t know each other, and grew up in different backgrounds, in different parts of the world, all tag their own content, they all talk with different vocabularies.
[06:01 ] When a buyer comes and talks with their vocabulary, and searches on the site, they get pretty much random results. It’s not the ideal and optimal results. Whereas using Clarifai, you’ll get a consistent view of all of your data, and it’s tagged in the same way. It’s much better for the buyer experience as well.
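Editor's note: the vocabulary point above can be made concrete with a small Python sketch. All image ids and tags below are invented for illustration, and the exact-match `search` function is a deliberately naive stand-in for a real stock site's search, not a description of how any particular site or Clarifai works.

```python
def search(catalog, query):
    """Return the ids of images whose tag set contains the query term exactly."""
    return sorted(image_id for image_id, tags in catalog.items() if query in tags)


# Three similar beach photos, each tagged by hand by a different photographer.
# All tags here are invented for illustration.
manual_tags = {
    "img1": {"beach", "sunset"},
    "img2": {"seaside", "dusk"},
    "img3": {"shore", "evening sky"},
}

# The same three photos tagged by one automatic tagger using a single vocabulary.
automatic_tags = {
    "img1": {"beach", "sunset"},
    "img2": {"beach", "sunset"},
    "img3": {"beach", "sunset"},
}

if __name__ == "__main__":
    # With hand-applied tags, only the photo whose author happened to say "beach" matches.
    print(search(manual_tags, "beach"))
    # With one consistent vocabulary, the same query finds all three photos.
    print(search(automatic_tags, "beach"))
```

With hand-applied tags, what a buyer finds depends on which word each photographer happened to choose; with one tagger, the same query reaches every matching image.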
[06:19 ] Another example is, in our app Forevery, we’ve baked in some new technology, that’s coming later this year to our enterprise customers, which is the ability to really personalize it to you. This is showing in two different parts of the application. One is around people, where you can actually teach the app your friends and family.
[06:42 ] The other is around things. You can teach it anything in the world. Whether it’s the name of your specific dog, or it’s the Eiffel Tower, or any of your favorite sports cars. Something like that. You can customize it. It actually is training a model on the phone to be able to predict these things.
[07:01 ] I think, the future of machine learning and image and video recognition is this personalization. Because it becomes more emotionally connected to you, and more powerful. It’s the way you speak about the world and see the world. We’re really excited about that evolving.
Henrik: [07:17 ] As of March 2016, how much of the image and video recognition is done by people versus machines?
Matthew: [07:24 ] That’s a great question. I don’t know the concrete