If you want to blame someone for the hoopla around artificial intelligence, Google researcher Geoff Hinton is a good candidate.
Today neural networks transcribe our speech, recognize our pets, and fight our trolls.
But Hinton now belittles the technology he helped bring to the world. “I think the way we’re doing computer vision is just wrong,” he says. “It works better than anything else at present but that doesn’t mean it’s right.”
In its place, Hinton has unveiled another “old” idea that might transform how computers see, and reshape AI. That’s important because computer vision is crucial to ideas such as self-driving cars and software that plays doctor.
Late last week, Hinton released two research papers that he says prove out an idea he’s been mulling for almost 40 years. “It’s made a lot of intuitive sense to me for a very long time, it just hasn’t worked well,” Hinton says. “We’ve finally got something that works well.”
Hinton’s new approach, known as capsule networks, is a twist on neural networks intended to make machines better able to understand the world through images or video. In one of the papers posted last week, Hinton’s capsule networks matched the accuracy of the best previous techniques on a standard test of how well software can learn to recognize handwritten digits.
In the second, capsule networks almost halved the best previous error rate on a test that challenges software to recognize toys such as trucks and cars from different angles.
The Two Capsule Networks Research Papers
1. Dynamic Routing Between Capsules at https://arxiv.org/abs/1710.09829
Abstract: A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or object part. We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. Active capsules at one level make predictions, via transformation matrices, for the instantiation parameters of higher-level capsules. When multiple predictions agree, a higher level capsule becomes active. We show that a discriminatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits. To achieve these results we use an iterative routing-by-agreement mechanism: A lower-level capsule prefers to send its output to higher level capsules whose activity vectors have a big scalar product with the prediction coming from the lower-level capsule. [Sara Sabour, Nicholas Frosst, Geoffrey E Hinton]
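To make the routing-by-agreement idea in this abstract more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the "squash" nonlinearity and the softmax over routing logits follow the paper's description, while the function names (`squash`, `route`, `length`) and the hand-picked `toy_predictions` are hypothetical. In a real capsule network the predictions would come from learned transformation matrices.

```python
import math

def length(v):
    """Euclidean length of a vector; read as the probability that the entity exists."""
    return math.sqrt(sum(x * x for x in v))

def squash(s):
    """Shrink s so its length falls in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]

def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    m = max(logits)
    exps = [math.exp(b - m) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(predictions, iterations=3):
    """predictions[i][j] is lower capsule i's predicted vector for higher capsule j.

    Returns the higher-level output vectors and the coupling coefficients
    used in the final iteration.
    """
    n_low, n_high = len(predictions), len(predictions[0])
    dim = len(predictions[0][0])
    logits = [[0.0] * n_high for _ in range(n_low)]  # start with no preference
    for _ in range(iterations):
        # Each lower capsule splits its output across the higher capsules.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_high):
            s = [sum(coupling[i][j] * predictions[i][j][d] for i in range(n_low))
                 for d in range(dim)]
            outputs.append(squash(s))
        # Agreement step: raise the logit where prediction and output align
        # (a large scalar product), lower it where they point apart.
        for i in range(n_low):
            for j in range(n_high):
                logits[i][j] += sum(p * v for p, v in zip(predictions[i][j], outputs[j]))
    return outputs, coupling

# Hypothetical toy input: three lower capsules, two higher capsules, 2-D vectors.
# All three agree about higher capsule 0 and disagree about higher capsule 1.
toy_predictions = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
]

outputs, coupling = route(toy_predictions)
print([round(length(v), 3) for v in outputs])
print([[round(c, 3) for c in row] for row in coupling])
```

In this toy case the capsule that receives matching predictions ends up with the longer output vector, and every lower capsule sends most of its output there, which is the behavior the abstract describes.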
2. Matrix Capsules with EM Routing at https://openreview.net/forum?id=HJWLfGWRb&noteId=HJWLfGWRb
Abstract: A capsule is a group of neurons whose outputs represent different properties of the same entity. We describe a version of capsules in which each capsule has a logistic unit to represent the presence of an entity and a 4x4 pose matrix which could learn to represent the relationship between that entity and the viewer. A capsule in one layer votes for the pose matrices of many different capsules in the layer above by multiplying its own pose matrix by viewpoint-invariant transformation matrices that could learn to represent part-whole relationships. Each of these votes is weighted by an assignment coefficient. These coefficients are iteratively updated using the EM algorithm such that the output of each capsule is routed to a capsule in the layer above that receives a cluster of similar votes. The whole system is trained discriminatively by unrolling 3 iterations of EM between each pair of adjacent layers. On the smallNORB benchmark, capsules reduce the number of test errors by 45% compared to the state-of-the-art. Capsules also show far more resistance to white box adversarial attack than our baseline convolutional neural network. [Anonymous]
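The voting step in this abstract can be illustrated with a small sketch in plain Python. It covers only how a vote is formed (a part's 4x4 pose matrix multiplied by a part-whole transformation matrix) and why a change of viewpoint leaves agreement between votes intact; it does not implement EM routing. Everything named here is an assumption for illustration: the helpers `matmul`, `translation` and `vote`, the "eye" and "mouth" parts, and the hand-picked matrices, which in the paper's setup would be learned rather than written by hand.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]

def translation(dx, dy, dz):
    """4x4 homogeneous matrix that shifts a coordinate frame by (dx, dy, dz)."""
    return [[1.0, 0.0, 0.0, dx],
            [0.0, 1.0, 0.0, dy],
            [0.0, 0.0, 1.0, dz],
            [0.0, 0.0, 0.0, 1.0]]

def vote(pose, transform):
    """A lower capsule's vote for a higher capsule's pose: pose times transform."""
    return matmul(pose, transform)

# Hypothetical poses of two parts relative to the viewer.
left_eye_pose = translation(-1.0, 1.0, 0.0)
mouth_pose = translation(0.0, -1.0, 0.0)

# Hypothetical part-whole transforms: where the face sits relative to each part.
# These do not depend on the viewer, which is what "viewpoint-invariant" means here.
eye_to_face = translation(1.0, -1.0, 0.0)
mouth_to_face = translation(0.0, 1.0, 0.0)

# Both parts vote for the same face pose, so the votes form a tight cluster.
print(vote(left_eye_pose, eye_to_face) == vote(mouth_pose, mouth_to_face))

# Change the viewpoint: rotate everything 90 degrees about the z axis.
viewpoint = [[0.0, -1.0, 0.0, 0.0],
             [1.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]
rotated_eye = matmul(viewpoint, left_eye_pose)
rotated_mouth = matmul(viewpoint, mouth_pose)

# The same transforms still produce agreeing votes; the agreed pose simply
# rotates along with the viewpoint.
print(vote(rotated_eye, eye_to_face) == vote(rotated_mouth, mouth_to_face))
```

Because the viewpoint change multiplies every part's pose on the left, it multiplies every vote on the left as well, so votes that agreed before still agree afterward. That is one way to read the abstract's claim that the transformation matrices can encode part-whole relationships independently of the viewer.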