Google has been investing heavily in machine learning and intelligent vision capabilities.
Think about how easily humans can identify things in photos, for example a photo of a cat. Computers and computer programs have always had difficulty performing similar tasks.
Now, with advances in the field, computers can do this quite effectively: in some cases better than humans, in other cases worse.
Google is opening up its capabilities in vision and machine learning and making them accessible via an API (an interface that developers can call from their own programs).
How it works: if you feed the Google Vision API a photo, it will tell you what it thinks the photo shows. So if it thinks it’s a cat, it will tell you it’s a cat and provide the associated probability (how likely it thinks it is to be correct).
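To make that flow concrete, here is a minimal sketch of the request side, assuming the public REST endpoint `images:annotate` and the `LABEL_DETECTION` feature. The helper name `build_label_request` and the placeholder image bytes are illustrative assumptions, and the sketch only builds the JSON body rather than sending it.

```python
import base64
import json

# Public REST endpoint for Cloud Vision image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes, max_results=5):
    """Build the JSON body asking the API to label one image.

    Hypothetical helper for illustration; not part of any Google library.
    """
    return {
        "requests": [
            {
                # The image travels as base64-encoded text inside the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # LABEL_DETECTION asks "what does this photo show?"
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


# Placeholder bytes standing in for a real photo file (assumption for the demo).
body = build_label_request(b"not-a-real-image")
print(json.dumps(body, indent=2))
```

In practice this body would be sent as an HTTP POST to `VISION_ENDPOINT` together with an API key or OAuth token, and the reply would list labels, each with a description and a score.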
What can it recognise? Here’s a short list provided by Google:
- Identifying animals such as cats and dogs;
- Identifying landmarks such as the Eiffel Tower;
- Identifying facial expressions, such as smiles and frowns;
- Identifying text in many forms, such as handwriting, and in many languages;
- Identifying safe / unsafe content (e.g. NSFW content);
- Identifying logos.
That’s already a lot, and the list is likely to grow as the technology gets more sophisticated.
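Each capability listed above corresponds to a feature type that a request can ask for. The sketch below maps them, assuming the feature type names used by the Cloud Vision REST API; the dictionary keys and the `features_for` helper are illustrative names, not part of the API.

```python
# Capability from the list above -> Vision API feature type.
FEATURE_TYPES = {
    "animals": "LABEL_DETECTION",
    "landmarks": "LANDMARK_DETECTION",
    "facial expressions": "FACE_DETECTION",
    "text": "TEXT_DETECTION",
    "safe / unsafe content": "SAFE_SEARCH_DETECTION",
    "logos": "LOGO_DETECTION",
}


def features_for(capabilities):
    """Return the 'features' list for a request (hypothetical helper)."""
    return [{"type": FEATURE_TYPES[name]} for name in capabilities]


# One request can ask for several kinds of analysis at once.
print(features_for(["landmarks", "logos"]))
```

Several feature types can be combined in a single request, so one photo can be checked for landmarks and logos in the same call.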
We’ll show you what the API does with some examples from Google.
The image below shows that, according to the Google Vision API, there’s an 83% probability that the photo contains fruit, which it does. The API also returns other labels, such as produce and Baccaurea ramiflora.
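A label result like this one arrives as JSON, with each label carrying a `description` and a `score` between 0 and 1. The sketch below turns such a response into readable percentages. The `describe_labels` helper is a hypothetical name, and the sample response is hand-written: only the 83% for fruit comes from the example above, while the other two scores are invented placeholders.

```python
def describe_labels(response):
    """Format each returned label as 'description: NN%' (hypothetical helper)."""
    labels = response["responses"][0].get("labelAnnotations", [])
    return ["{}: {:.0%}".format(label["description"], label["score"]) for label in labels]


# Hand-written sample in the shape of a label detection response.
sample_response = {
    "responses": [
        {
            "labelAnnotations": [
                {"description": "fruit", "score": 0.83},
                {"description": "produce", "score": 0.80},  # placeholder score
                {"description": "baccaurea ramiflora", "score": 0.75},  # placeholder score
            ]
        }
    ]
}

for line in describe_labels(sample_response):
    print(line)
```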
The image below shows that, according to the Google Vision API, there’s an 88% chance that there’s a rapid in the photo.
The image below shows the text that the Google Vision API found in a photo of a street sign, along with the language it detected, which is French.
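For text detection, the response contains a `textAnnotations` list whose first entry holds the full detected text and a `locale` code for the language. The sketch below reads both. The `extract_text` helper is a hypothetical name, and the street name in the sample is invented for illustration; it is not the sign from the image.

```python
def extract_text(response):
    """Return (language code, full text) from a text detection response.

    Hypothetical helper; returns (None, "") when no text was found.
    """
    annotations = response["responses"][0].get("textAnnotations", [])
    if not annotations:
        return None, ""
    # The first annotation covers the whole image; later ones are single words.
    full = annotations[0]
    return full.get("locale"), full["description"]


# Hand-written sample in the shape of a text detection response (invented text).
sample_response = {
    "responses": [
        {
            "textAnnotations": [
                {"locale": "fr", "description": "RUE DE L'EXEMPLE\n"},
                {"description": "RUE"},
                {"description": "DE"},
                {"description": "L'EXEMPLE"},
            ]
        }
    ]
}

locale, text = extract_text(sample_response)
print(locale, text.strip())
```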
Now it’s up to developers to use this technology to do things we haven’t imagined yet.