Computers need to be trained to recognise images, says Devangshu Datta.
Illustration: Uttam Ghosh/Rediff.com
Net surfers who wander around the darker recesses of the Web often have to prove they are not robots.
Various Web sites use different types of tests to try to ensure they are not being surfed, or scraped, by automated programs.
These can consist of CAPTCHAs -- Completely Automated Public Turing Tests to tell Computers and Humans Apart.
That is usually an alphanumeric sequence where the letters and numbers have been distorted to make it hard for a machine to read.
A more time-intensive and supposedly more foolproof version involves presenting a panel of images to the surfer and asking them to tick the images that contain something specific.
It could be a traffic light, or a parking meter, or a dog, for instance.
This is something a robot will find hard to do, unless it has been specifically taught to identify those categories of images.
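As a rough illustration, here is a minimal sketch, in Python, of how a site might check the surfer's selections in such an image-panel test. The tile numbering and the ground-truth set are hypothetical, not any real CAPTCHA service's API.

```python
# A minimal sketch of checking an image-panel test; the tile numbers
# and ground-truth set are hypothetical, not a real CAPTCHA service's API.
ground_truth = {2, 5, 7}  # tiles that actually contain, say, a traffic light

def verify(selected_tiles: set) -> bool:
    """Pass only if the user ticked exactly the tiles with the target object."""
    return selected_tiles == ground_truth

print(verify({2, 5, 7}))  # True: the right images were ticked
print(verify({2, 5}))     # False: one traffic light was missed
```

A robot passes only if it can itself recognise the target object in each tile, which is precisely what it has not been trained to do.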
Computers need to be trained to recognise images.
Indeed, this is one of the biggest stumbling blocks for machine learning and for many artificial intelligence (AI)-dependent applications, such as self-driving cars or facial recognition programmes.
This is less of a problem in a completely controlled environment, such as a factory floor, but it is a huge barrier to using AI in natural environments.
Any human driver, for example, is used to seeing literally thousands of things on the road.
Apart from other vehicles of various types, one may see a child tying her shoelaces at a school zebra crossing, or an elephant relieving itself if one happens to be driving through a wildlife reserve.
We automatically identify these images, classify them in terms of risk, and take what we consider to be an appropriate action.
A computer has to be trained to recognise such images.
What's more, a computer has to be trained to recognise composites of those images, and sometimes to recognise partial images seen from peculiar angles in uncertain light.
Heading down a highway in Corbett National Park, a driver who sees a raised grey trunk emerging from the foliage usually has the sense to realise that it is attached to a 4,000 kg animal.
Another driver, seeing the scrunched rear-end of a child tying shoelaces, identifies it as a small human kneeling in a bent posture.
Computers don't do this sort of thing easily at all.
One of the nastiest accidents involving self-driving cars occurred when a car tried to go under an advertising billboard.
It had correctly identified the billboard, and it calculated that there was enough clearance under the board for the car to pass.
Where it failed was in not realising that the billboard was attached to the side of a truck.
This difficulty has led to the creation of supervised learning.
An AI is trained to recognise images by throwing databases of millions of related images at it.
By the time it has processed several million related images, taken from different angles and at different levels of fidelity, the hope is that the programme will have learnt enough to recognise those objects if they pop up while it is working.
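In concrete terms, a supervised image classifier looks something like the following minimal sketch, assuming the PyTorch library. The tiny network and the random tensors standing in for a labelled dataset are illustrative only; a real system would train on millions of real, labelled photographs.

```python
# A minimal sketch of supervised learning for images, assuming PyTorch.
# Random tensors stand in for the labelled dataset; real systems train
# on millions of real photographs, each tagged by a human labeller.
import torch
import torch.nn as nn

# Toy "labelled dataset": 64 RGB images, each tagged with one of 3 classes
# (say 0 = dog, 1 = traffic light, 2 = parking meter).
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 3, (64,))

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 3),  # one output score per label category
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)  # compare predictions to the labels
    loss.backward()                        # learn from the mismatch
    optimizer.step()
```

Every value in that `labels` tensor has to come from somewhere, and that is where the humans come in.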
The problem is that all those images need to be labelled.
While this is easy work for humans, it is also mind-numbingly boring.
And it needs to be done at mind-boggling scale, across a multitude of categories, depending on what the programmes are designed to handle.
In terms of scut work, this is becoming the AI generation's equivalent of the call centre.
AI has, to a large extent, taken over the role of the call centre worker and the personal assistant.
Google, Siri, Cortana, Alexa, etc meet most of our PA requirements, and AI works reasonably well at limited tasks, such as providing information about insurance policies and airline schedules.
But putting together databases and labelling them -- 'dog', 'human', 'human face with dilated nostrils', 'cat washing itself', 'car with advertising slogans', 'politician yelling his head off', etc -- is a job that only humans seem to be able to do.
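What the labellers produce is, in effect, a giant mapping from images to tags. Here is a minimal sketch of what that output might look like, using the example labels above; the file names are hypothetical, though real datasets record similar image-to-label mappings at vastly larger scale.

```python
# A minimal sketch of image-labelling output; the file names are hypothetical.
# Real datasets record millions of such image-to-label mappings.
import json

labelled = [
    {"file": "img_000001.jpg", "label": "dog"},
    {"file": "img_000002.jpg", "label": "human"},
    {"file": "img_000003.jpg", "label": "cat washing itself"},
    {"file": "img_000004.jpg", "label": "car with advertising slogans"},
]

# One JSON record per line: the kind of file a training pipeline ingests.
with open("labels.jsonl", "w") as f:
    for record in labelled:
        f.write(json.dumps(record) + "\n")
```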
The IT ant farms of the next decade will be focused on image labelling.
It would be an odd way to make a living: working as a PA to an AI that may itself provide PA services to humans once it has been trained.