The ability to recognise objects in an image or photograph is a technology that every major tech company, including Google and Facebook, is experimenting with. The idea is simple: you supply an image to the image recognition software, and it tells you what is in the photograph. According to this Google research update, Google’s AI can recognise objects in a photograph with 93.9% accuracy. Google has now made the code open source, so if you want to add accurate object recognition to your own application, you can use this source code.
As you can see in the above image, the descriptions become more detailed as the image recognition technology improves. For example, in the images on the left, the first caption reads, “A brown bear is swimming in the water,” but the second caption is uncannily accurate: “Two brown bears sitting on top of rocks.” Similarly, on the right-hand side, the upper caption reads, “A train that is sitting on the tracks,” and the lower caption reads, “A blue and yellow train travelling down train tracks.”
Soon, the image recognition AI may also be able to tell how many people are standing on the platform, how many of them are men, women or children, and whether there are dogs on the platform or a bird sitting on the cable above the train. Really.
The basic question, which the above blog post also addresses, is whether the image recognition software can recognise images it has never seen, or whether it needs to be pre-fed data so that it simply recognises something a human has already identified and fed into the AI database. Google claims that it can also identify unique images. So what happens with an image that has never been recognised before? Another example from Google:
Multiple images, recognised and described by humans, are fed into the system. The Google AI processes various images of dogs in various conditions and then uses this information to recognise a totally new image containing the same animals and objects. For example, the three images on the left all have dogs, two show sand and a beach, and two show one or two dogs sitting. So for the image on the right, the Google AI automatically produces the caption “two dogs sitting next to each other on the beach.”
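To make the idea of learning from human-described examples concrete, here is a toy sketch in Python. It is not Google's model, which is a neural network trained on images; it only counts which concepts recur across a few captions. The captions, the stopword list and the helper names (`normalise`, `shared_concepts`) are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical human-written captions for three training images
# (illustrative only, not taken from Google's data).
TRAINING_CAPTIONS = [
    "a dog running along the sand on the beach",
    "two dogs sitting in a park",
    "a dog sitting on the sand at the beach",
]

# Small, hand-picked list of filler words to ignore (an assumption).
STOPWORDS = {"a", "the", "on", "at", "in"}


def normalise(word):
    # Crude singularisation so that "dogs" and "dog" count as one concept.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def shared_concepts(captions, min_count=2):
    # Count in how many captions each concept appears, then keep the
    # concepts that recur in at least `min_count` captions.
    counts = Counter()
    for caption in captions:
        words = {normalise(w) for w in caption.lower().split() if w not in STOPWORDS}
        counts.update(words)
    return sorted(w for w, c in counts.items() if c >= min_count)


print(shared_concepts(TRAINING_CAPTIONS))
```

The recurring concepts are the raw material for a new description. A real captioning system learns such associations from the pixels themselves rather than from word counts, which is what lets it describe a photograph it has never seen.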
But what if two dogs are sitting in the foreground while, in the background, another dog is trying to catch a frisbee and a small child is trying to hold that dog back? Will the Google AI be able to recognise such complex images?