How Facebook Is Getting Better at Recognizing Your Photos
Facebook
has long been able to recognize the people in your photos and sort
images by where they were taken. But it hasn't been as precise at
understanding what's actually happening in a photo. That's now beginning
to change, thanks to new developments in the Menlo Park, Calif., company's artificial intelligence software.
Facebook
says the new tech will improve its user experience in two ways. First,
it'll make it possible to search Facebook for photos based on what's in
them, rather than just by date taken, tags, or location. If you're
trying to find a photo of a paella dish you cooked last year, for
example, you'll be able to simply type "paella" in the Facebook search
bar. This, Facebook hopes, will help its users quickly find images
without having to remember when they were taken or how they were tagged.
Second, the upgrade will improve Facebook's automatic alt text feature,
which describes photos aloud to the visually impaired. Before the
update, Facebook could describe a photo's subjects on a rudimentary
level – when describing a concert photo, for instance, Facebook might
say the shot contains a person, a stage, and a guitar. After the update,
Facebook will be able to tell users the specific action that's
occurring in a scene, like "this is a picture of a person playing guitar
on stage." That might seem like a minor upgrade, but it's a big step
forward for image-identification software.
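To make the search idea concrete, here is a minimal sketch of how content-based photo lookup could work once a vision model has assigned labels to each image. It is an illustration only, not a description of Facebook's system: the photo IDs, the labels, and the `build_index` and `search` helpers are all hypothetical.

```python
from collections import defaultdict


def build_index(photo_labels):
    """Map each label to the set of photo IDs whose predicted labels include it."""
    index = defaultdict(set)
    for photo_id, labels in photo_labels.items():
        for label in labels:
            index[label.lower()].add(photo_id)
    return index


def search(index, query):
    """Return sorted photo IDs whose labels match every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # A photo must carry all query words to count as a match.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical labels a vision model might have assigned to three photos.
photo_labels = {
    "img_001": ["paella", "food", "table"],
    "img_002": ["person", "guitar", "stage"],
    "img_003": ["food", "pizza"],
}

index = build_index(photo_labels)
results = search(index, "paella")
```

In this sketch, typing "paella" returns only the photo labeled with that word, with no need to know its date, tags, or location. The hard part, producing accurate labels in the first place, is the job of the vision model.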
Facebook
had previously said it was working on improved photo recognition technology,
but the new search capability has only just begun to launch publicly.
Services from other technology firms, like Google and Apple, also allow users to search their photos by content.
The
special sauce powering Facebook's new technology is its computer vision
engine, called Lumos. Lumos analyzes the troves of images shared to the
social platform each day, giving it plenty of data to crunch and learn
from. Lumos relies on a machine learning technique known as "neural
networks," which are loosely modeled on the behavior of the human brain. Among
many other tasks, neural networks can be trained to recognize specific
pieces of information – a network designed to recognize images, for
example, would learn how to identify a cat after being shown thousands
of photos of different cats.
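The idea of learning from labeled examples can be sketched with a perceptron, a single artificial neuron that is far simpler than the deep networks Lumos uses. Everything here is an assumption made for illustration: each "photo" is reduced to two made-up numeric features, and the label 1 stands for "cat."

```python
def train_perceptron(examples, epochs=20, lr=0.1):
    """Adjust weights a little every time the model misclassifies an example."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            activation = sum(w * f for w, f in zip(weights, features)) + bias
            prediction = 1 if activation > 0 else 0
            error = label - prediction  # 0 when the guess was right
            weights = [w + lr * error * f for w, f in zip(weights, features)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, features):
    """Return 1 ("cat") or 0 ("not a cat") for one feature pair."""
    activation = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


# Hypothetical training data: (feature pair, label), where 1 means "cat".
examples = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.7, 0.95], 1),
    ([0.2, 0.1], 0),
    ([0.1, 0.3], 0),
    ([0.3, 0.2], 0),
]

weights, bias = train_perceptron(examples)
unseen_cat = predict(weights, bias, [0.85, 0.9])
unseen_other = predict(weights, bias, [0.15, 0.2])
```

After a few passes over the data, the model classifies the training examples correctly and also handles the two unseen inputs, because they resemble what it was trained on. A real image recognizer works on raw pixels and needs many layers and vastly more examples.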
Joaquin
Candela, director of Facebook's Applied Machine Learning team, says
that being able to recognize specific actions, like running or jumping,
requires a deeper neural network. But these networks are harder to
train. The deeper the network, the harder it becomes for error signals
-- vital for the software to tell correct answers from incorrect ones -- to reach
every layer of the network.
To
solve this problem, Facebook is using a "residual network," which makes
it possible to send error signals deeper into a network, says Candela.
"By doing that, you open up the possibility to train networks of a depth
that have never been trained before," he adds.
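A toy calculation shows why the shortcut in a residual network helps. The sketch below is not Facebook's implementation. It assumes a stack of one-number "layers" using a tanh function, and it compares how strongly a signal at the output is felt at the input with and without the shortcut that adds each layer's input back to its output. All names and values are hypothetical.

```python
import math


def layer(x, w):
    """A one-number stand-in for a network layer."""
    return math.tanh(w * x)


def layer_grad(x, w):
    """Derivative of layer(x, w) with respect to x."""
    t = math.tanh(w * x)
    return w * (1.0 - t * t)


def input_gradient(x, weights, residual):
    """Chain rule through the stack: how much the output changes per unit of input."""
    grad = 1.0
    for w in weights:
        local = layer_grad(x, w)
        if residual:
            # Residual layer: output = x + layer(x), so its derivative is 1 + local.
            grad *= 1.0 + local
            x = x + layer(x, w)
        else:
            # Plain layer: output = layer(x), so its derivative is just local.
            grad *= local
            x = layer(x, w)
    return grad


weights = [0.5] * 30  # 30 layers, each with the same assumed weight
plain = input_gradient(1.0, weights, residual=False)
skip = input_gradient(1.0, weights, residual=True)
```

In the plain stack, each layer shrinks the signal by at least half, so after 30 layers almost nothing is left for the earliest layers to learn from. With the shortcut, every layer contributes a factor of at least 1, so the signal survives the full depth.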
Neural
networks may be able to process billions of images at lightning-fast
speeds, but they're a long way from understanding single images as well
as humans can. That's largely because their knowledge is limited to the
data on which they were trained. A neural network may have been shown
hundreds of kinds of chairs, for instance, but it might get stuck when
trying to identify a type of chair it's never seen. A person, on the
other hand, would be able to use context clues (e.g., "oh, there's a
person sitting on it") to quickly ID a previously unknown chair.
There's no doubt that artificial intelligence's abilities are progressing faster
than many observers expected. Still, it's unclear when, if ever,
computers will be as good as humans at recognizing images. Candela,
however, remains optimistic that new innovations in AI will enable
further image-recognition developments. "I think it's going to be even
more exciting as we keep making progress on what we call semantic
segmentation, or a semantic understanding of images," he says. "It's not
only detecting the objects and what's going on, but also understanding
the relations between things and bringing common sense into it."