Tagging.tech interview with Martin Wilson
Description
Tagging.tech presents an audio interview with Martin Wilson about image recognition.
Listen and subscribe to Tagging.tech on Apple Podcasts, AudioBoom, CastBox, Google Play, RadioPublic or TuneIn.
Keywording Now: Practical Advice on using Image Recognition and Keywording Services
Now available
Transcript:
Henrik de Gyor: This is Tagging.tech. I’m Henrik de Gyor. Today I’m speaking with Martin Wilson. Martin, how are you?
Martin Wilson: I’m very well, thank you. How are you?
Henrik: Good. Martin, who are you and what do you do?
Martin: I am a director at Asset Bank. Being a director, I’ve done an awful lot of different things over the years. I have done some development on our product, Asset Bank. I’ve done sales and I’ve done consultancy while rolling out the product.
Just to explain a little bit about what Asset Bank is as a product: it is a digital asset management solution. Digital asset management is often shortened to DAM. A DAM solution helps clients and their users to organize the digital assets that almost every organization owns and makes use of nowadays.
By digital asset, we mean primarily files: things like images, videos and documents. A digital asset has an awful lot of value to an organization, and it’s very important that people can find them easily, that they don’t waste money recreating digital assets that they already have, and that the assets themselves are used properly, in a way that’s consistent with the brand of the organization.
Henrik: Martin, what are the biggest challenges and successes you’ve seen with image and video recognition?
Martin: Let me first start by saying how I think image recognition has the potential to have a really big impact on my industry, digital asset management. Digital asset management is all about being able to find images and then use them properly. That’s the purpose of the DAM system. There’s an old adage that a DAM system is only as good as the metadata associated with its assets. The reason for that is that if you have a million images in any system, it’s almost impossible to find the image you want without some sort of search and/or browse function. Those search and browse functions at the moment rely on what we call metadata that is associated with the assets. That metadata is things like the title or caption of an image, a description, perhaps some keywords that have been put in, maybe some information about how the image can be used.
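As a rough illustration of how that metadata drives search, here is a minimal sketch in Python. It is a generic pattern, not Asset Bank’s actual implementation, and the field names ("title", "description", "keywords") are hypothetical.

```python
# A minimal sketch of metadata-driven asset search.
# Field names are hypothetical, not Asset Bank's schema.
assets = [
    {"title": "Alpine lake at dawn",
     "description": "Mountain scenery for the spring campaign",
     "keywords": ["landscape", "lake", "mountains"]},
]

def search(assets, term):
    """Return assets whose metadata mentions the search term."""
    term = term.lower()
    return [a for a in assets
            if term in a["title"].lower()
            or term in a["description"].lower()
            or any(term == k.lower() for k in a["keywords"])]

print(search(assets, "mountains"))  # matches only because someone typed that keyword
```

The point of the sketch is simply that an asset with empty metadata can never match a search, which is why the manual tagging work Martin describes next matters so much.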
The result of this is that people, humans, spend an awful lot of time entering the metadata that is associated with digital assets. Usually, within an organization, the processes and workflows associated with using a DAM application involve uploading one or many digital assets, typically images or videos, and then manually entering the data: for example, looking at the image, seeing what it’s about, what the subject is, who’s in it if it’s of people, and then actually typing in that data.
As you can imagine, that takes a lot of time. It’s also considered quite boring by most people. For that reason, it’s often skipped or not done really well. If it’s not done really well, the data associated with the assets is incomplete, and therefore it’s very hard for those assets to turn up in the right searches.
The idea that this process could be automated, having a computer work out what’s in the image and tag the digital assets appropriately, is enormous. It’s almost like the Holy Grail of the upload process for DAM systems.
There was an awful lot of excitement when, for example, Google Cloud Vision came out with their service. It’s what’s called an API, which enables other applications to make use of the image recognition functionality. There are a lot of other services as well that have come out in the last couple of years; Clarifai is another one.
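To make the API idea concrete, here is a minimal sketch of calling Google Cloud Vision’s label detection from Python. It assumes the google-cloud-vision client library is installed and Google Cloud credentials are configured; the image file name is hypothetical.

```python
# pip install google-cloud-vision
from google.cloud import vision

client = vision.ImageAnnotatorClient()  # picks up credentials from the environment

# Hypothetical image file to be auto-tagged.
with open("landscape.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    # Each label comes back with a description and a confidence score.
    print(label.description, round(label.score, 2))
```

A DAM application can call an endpoint like this at upload time and store the returned labels as suggested keywords, which is essentially the auto-tagging component Martin describes next.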
When they came out, lots of DAM vendors got very excited and rushed to add the functionality into their own applications. We did the same. About a year ago we started a project with the objective of developing a component that could be used with Asset Bank in order to add auto-tagging capabilities to it.
Let me just describe some of the challenges we found in doing that, and, when we rolled it out to some of our clients, the challenges they found. One of the challenges, which is like an umbrella challenge over all of it, is people’s expectations.
Humans are very good at looking at images and working out what’s in them. They’ve also got a lot of domain knowledge. Usually, they understand, for example, their products. They can look at a product shot and say, “Yeah, that’s product F-567”, or whatever the code is. It’s actually very hard for computers to do that well. That problem hasn’t been solved that well yet.
What we found is that, when compared with how humans tag images, the results coming from the auto-tagging software or APIs were, to be frank, not of good enough quality for most cases. That’s the second specific challenge, really: the quality of the raw results coming back from the software. The visual recognition software was not quite good enough for use in most organizations, especially in a commercial sense.
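One common way integrations cope with uneven raw results, as a generic pattern rather than necessarily what Asset Bank does, is to use the confidence score each API returns: auto-accept only high-confidence tags and route the rest to a human for review. A sketch, with a hypothetical threshold:

```python
THRESHOLD = 0.80  # hypothetical cut-off; in practice tuned per client

def triage_tags(label_annotations, threshold=THRESHOLD):
    """Split machine-suggested tags into auto-accepted and needs-review lists."""
    accepted, review = [], []
    for label in label_annotations:
        if label.score >= threshold:
            accepted.append(label.description)
        else:
            review.append(label.description)
    return accepted, review

# Usage with the Cloud Vision response from the earlier sketch:
# accepted, review = triage_tags(response.label_annotations)
```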
That’s not to say that it’s not useful. I’ll come on to that in a bit. Moving on to the successes: what we found was that for certain clients who had more generic or general images, the results were much better. We’ve got some clients who are tourist boards. They’ve got images of landscapes and scenery. Most of the image recognition software is quite good at finding the subjects and suggesting keywords for those types of images.
One of the reasons for that is that most of them have been trained on image data sets of images found on the internet, for example. Of course those are going to be generic. At the other end of the spectrum, where we found it didn’t work that well, was for clients that have quite bespoke business or subject domains, with images of their own product range. It’s very hard for these fairly generic image recognition APIs to come up with the right keywords for those sorts of images.
That’s possibly where there are still gaps. That might be something we’ll talk about in a minute when we get to the future: the inability of a lot of this tagging software to learn from bespoke data sets.
Henrik: Martin, as of December 2016, how do you see image and video recognition changing?
Martin: I think it’s fair to say that it’s in its infancy at the moment. It’s only since it’s become available through online cloud services or web services that people have found it very easy to start using this technology in their own applications. It’s only in the last couple of years that it has really taken off as something that can be openly or easily used.
Now I think the vendors of this sort of software are learning very quickly from real use cases. I think it’s quite exciting to see where the commercial or non-commercial application of this software can go. If we first focus a little bit more on the current problems, that gives some insight into where the software might go, what direction it might take.
I was just talking about one of the problems being that it’s very generic at the moment: the tags that you get back from the online services are going to be fairly generic. That’s obviously the case if you understand how they work and how they learn. I think very quickly we’re going to see these services, and I know some already do, offering the ability for you to train them with your own data sets. That then opens up the applications a lot more widely.
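As an illustration of what training on your own data set can look like underneath, here is a generic transfer-learning sketch in Python with TensorFlow/Keras. It is not any particular vendor’s service; the directory name and training settings are hypothetical, and it assumes images are organized one folder per tag.

```python
import tensorflow as tf

# Hypothetical folder of bespoke images, one sub-folder per tag
# (e.g. product codes such as "F-567").
train_ds = tf.keras.utils.image_dataset_from_directory(
    "product_shots/", image_size=(224, 224), batch_size=32)
num_tags = len(train_ds.class_names)

# Start from a model pre-trained on generic internet-style images...
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # ...keep its generic visual features frozen...

# ...and learn only a new tagging layer for the bespoke domain.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_tags, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```

This is the same idea behind the trainable cloud services Martin mentions: the generic model supplies the broad visual knowledge, and a relatively small bespoke data set teaches it the client’s own vocabulary.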
One of the things that challenges image recognition and artificial intelligence, in general, is the context in which they’re operating. It’s much e