Gradient Ascent #11
3rd AI wave, Neurosymbolic AI, System 1 and 2 DL
Welcome to the 11th edition of Gradient Ascent. I’m Albert Azout, a prior entrepreneur and Partner at Level Ventures and Venture Partner at Cota Capital. On a regular basis I encounter interesting scientific research, startups tackling important and difficult problems, and technologies that wow me. I am curious and passionate about machine learning, advanced computing, distributed systems, and dev/data/ml-ops. In this newsletter, I aim to share what I see, what it means, and why it’s important. I hope you enjoy my ramblings!
As I learned the painful way 😭, Neurosymbolic AI is a dense and involved topic. I will attempt to give a high-level introduction here and then expand in future newsletters. First, my neurosymbolic AI mind map:
Despite incredible achievements to date, some would argue that Deep Learning is hitting an upper bound on its capacity for intelligence, caused not by the limits of data availability or computational power, but by the increasing levels of abstraction required for higher-order knowledge representation and reasoning. The non-consensus view is that this chasm cannot be crossed by brute force (e.g. increases in computational power, improvements in microarchitecture, availability of even more data, new families of deep neural nets, etc.). Rather, a more fundamental breakthrough is needed.
As always, we can look to our own brains for guidance. We can think about how we think. Even in early infancy, we use higher-order reasoning quite effortlessly to draw conclusions and act in our world. Sure, we extract patterns from the world’s high-dimensional physical stimuli, through processes much akin to neural nets. But this pattern recognition is not an end in itself. Rather, it is aimed at constructing more abstract (symbolic) models of the world, which can then be manipulated. See this great talk by Josh Tenenbaum:
I recently read an interesting paper in ACM that further illuminated Deep Learning’s state of affairs. The authors train networks that perfectly fit a random labeling of the training data while evaluating various optimization approaches. They conclude that deep networks have sufficient expressivity to memorize data sets (given enough model capacity), whether the training data is structured or random. The ability of a network to generalize, they find, is unrelated to what makes optimization of deep networks easy in practice. Rather, a network’s generalization ability follows, as Gary Marcus writes, from an “interpolation within a cloud of points that surround the training examples” (i.e. we are ok so long as our real-world environment resembles our training environment).
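To make the memorization point concrete, here is a minimal sketch. It is not the paper's experiment: it swaps the deep network for a plain linear perceptron, and the sizes (20 training points, 50 dimensions) and all names are assumptions chosen for illustration. With more dimensions than training points, even this simple model has enough capacity to fit coin-flip labels perfectly, while its accuracy on fresh random data stays near chance.

```python
import random

def train_perceptron(points, labels, max_epochs=1000):
    """Classic perceptron; stops once every training point is classified correctly."""
    weights = [0.0] * len(points[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(w * xi for w, xi in zip(weights, x))
            if y * score <= 0:  # misclassified (or on the boundary): nudge toward y
                weights = [w + y * xi for w, xi in zip(weights, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return weights

def accuracy(weights, points, labels):
    correct = 0
    for x, y in zip(points, labels):
        score = sum(w * xi for w, xi in zip(weights, x))
        if y * score > 0:
            correct += 1
    return correct / len(points)

def random_dataset(rng, n, dim):
    """Gaussian inputs with coin-flip labels: there is no pattern to learn."""
    points = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    labels = [rng.choice([-1, 1]) for _ in range(n)]
    return points, labels

rng = random.Random(0)
DIM = 50  # assumption: more dimensions than training points
train_x, train_y = random_dataset(rng, 20, DIM)
test_x, test_y = random_dataset(rng, 200, DIM)

weights = train_perceptron(train_x, train_y)
train_acc = accuracy(weights, train_x, train_y)
test_acc = accuracy(weights, test_x, test_y)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A perfect training score here says nothing about generalization; it only shows that the model had enough capacity to memorize.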
We know that deep learning models are very expressive function approximators (layered neurons with non-linear activations) for complex functions that, in the case of supervised classification, map an input volume to a set of discrete classes. The promise of deep learning (and statistical learning generally) is, of course, to generalize to examples not seen during the training phase. In a perfect world, a deep learning model would robustly generalize or transfer to new environments, be resilient to adversarial attacks (e.g. deliberate alterations to pixels in images) or small perturbations in the inputs, and be able to properly explain itself using commonly (human-)understood semantics. Unfortunately, none of this is fully possible in today’s deep learning models for either vision or natural language (examples below, see Gary Marcus’ paper).
“A growing body of evidence shows that state-of-the-art models learn to exploit spurious statistical patterns in datasets…instead of learning meaning in the flexible and generalizable way that humans do.” (Facebook Research)
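The fragility to small input perturbations mentioned above can be sketched with a toy linear classifier. This is an illustration in the spirit of gradient-sign attacks, not a reproduction of any specific published attack: the random weights stand in for a trained model, and the input size and names are assumptions. Because the gradient of a linear score with respect to the input is just the weight vector, a tiny step per coordinate against its sign is enough to flip the prediction.

```python
import random

def sign(value):
    return 1.0 if value >= 0 else -1.0

def score(weights, x):
    """Linear classifier: positive score means class A, negative means class B."""
    return sum(w * xi for w, xi in zip(weights, x))

rng = random.Random(1)
DIM = 1000  # hypothetical input size, e.g. a flattened image
weights = [rng.gauss(0, 1) for _ in range(DIM)]  # stand-in for a trained model
x = [rng.gauss(0, 1) for _ in range(DIM)]

clean_score = score(weights, x)

# Smallest per-coordinate step that pushes the score past zero, plus a 10% margin.
epsilon = 1.1 * abs(clean_score) / sum(abs(w) for w in weights)

# Step every coordinate by epsilon against the sign of the gradient (= weights).
direction = sign(clean_score)
x_adv = [xi - direction * epsilon * sign(w) for xi, w in zip(x, weights)]
adv_score = score(weights, x_adv)

print(f"epsilon per coordinate: {epsilon:.4f}")
print(f"clean score: {clean_score:+.3f}, adversarial score: {adv_score:+.3f}")
```

The higher the input dimension, the smaller the per-coordinate change needed, which is one intuition for why image models can be fooled by alterations that are invisible to people.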
Now enters the 3rd AI wave: neurosymbolic AI.
The bridge to higher-order intelligence, neurosymbolic AI proponents contentiously argue (via various Twitter wars), requires building systems that can absorb, represent, and process knowledge, and then reason over complex models of the world. These models must combine or integrate neural networks with logic/symbolics, closing the gap between knowledge representation (see my last newsletter, Gradient Ascent #10) and deep learning networks (i.e. distributed vs localized representations).
Pɾҽɱ Kυɱαɾ Aραɾαɳʝι 🏡😷🤖💬🦾🎫 (@prem_k): “A very crucial & critical point from Bengio about the criticisms on #DL by @GaryMarcus: as researcher, Bengio's trying to find limitations, build new tools too overcome & thus expand the scope of #AI.” https://t.co/rPEeqo9ssG
Somehow, in our brains, we move from hierarchical distributed representations to symbolic machinery: from physical stimuli, to abstract features, to high-order symbolics and knowledge, to symbolic computations that reason and act in complex environments. The boundary between neural networks and symbolics, where, essentially, brain becomes mind, is not yet fully understood.
How and where symbolics emerge from deep neural networks is the question Neurosymbolic AI aims to answer. I hope to expound on this topic in future newsletters (in the meantime, definitely read the Neurosymbolic AI: The 3rd Wave paper). As for a taxonomy of neurosymbolic AI, Henry Kautz suggests six types:
Type 1: standard deep learning.
Type 2: hybrid systems like DeepMind’s AlphaGo (neural network coupled with symbolic MCTS).
Type 3: focus on one task, where inputs and outputs interact with symbolic systems (e.g. the Neuro-Symbolic Concept Learner).
Type 4: a neural-symbolic system, where symbolic knowledge is compiled into the training set of the neural network.
Type 5: tightly-coupled but distributed neural-symbolic systems where symbolic logic is mapped onto an embedding and acts as a regularizer 😜 (e.g. Logic Tensor Networks).
Type 6: Fully integrated system, with true symbolic reasoning inside a neural engine.
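To give a feel for what "logic acting as a regularizer" (Type 5 above) can mean, here is a minimal sketch. It is not the Logic Tensor Networks API: the rule ("every cat is an animal"), the choice of the Reichenbach fuzzy implication, and all function and key names are assumptions made for illustration. The idea is that a network's soft predictions are scored against a logical rule, and the degree of violation is added to the usual data loss.

```python
def implies(a, b):
    """Reichenbach fuzzy implication: truth of (a -> b) for truth values in [0, 1]."""
    return 1.0 - a + a * b

def rule_penalty(predictions):
    """Average violation of 'is_cat -> is_animal' over a batch; 0.0 means satisfied."""
    violations = [1.0 - implies(p["is_cat"], p["is_animal"]) for p in predictions]
    return sum(violations) / len(violations)

def total_loss(data_loss, predictions, rule_weight=0.5):
    """Usual training loss plus a weighted penalty for breaking the rule."""
    return data_loss + rule_weight * rule_penalty(predictions)

# Hypothetical soft outputs from a network for two inputs each.
consistent = [{"is_cat": 0.9, "is_animal": 0.95}, {"is_cat": 0.1, "is_animal": 0.6}]
inconsistent = [{"is_cat": 0.9, "is_animal": 0.1}, {"is_cat": 0.8, "is_animal": 0.2}]

print(f"penalty when predictions respect the rule: {rule_penalty(consistent):.4f}")
print(f"penalty when predictions break the rule:   {rule_penalty(inconsistent):.4f}")
```

Since the penalty is a smooth function of the predictions, a gradient-based optimizer could in principle push the network toward outputs that agree with the symbolic knowledge.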
Great, so what?
Here are what I believe to be the long-term implications of neurosymbolic AI for applied AI and enterprise infrastructure.
More effort and innovation will go into knowledge extraction, representation, reasoning, and learning on graphs (e.g. node classification, link prediction, graph embedding). Knowledge graphs will become critical enterprise infrastructure, whereas today they are very brittle and difficult to maintain. Graph neural network methods will become more essential.
The dream of the semantic web evolves into complex shared knowledge bases (at various hierarchical levels), extracted using deep learning methods applied to knowledge base construction. These systems will form the next layer of intelligent enterprise infrastructure and become the basis of reasoning engines for modern neurosymbolic AI. After all, societies, organizations, etc. are built on layers of knowledge, and we can’t expect networks to relearn models of the world each and every time.
Enterprise MLOps will have new concerns at the front-end of AI systems: logic and reasoning over knowledge, explainability, etc. vs only large-scale training, optimization, and model serving.
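As a small illustration of the link prediction task mentioned in the list above, here is a sketch using the classic Jaccard (shared-neighbor) heuristic rather than a graph neural network. The toy graph, its node names, and the helper functions are all hypothetical; the point is only to show what "predicting a missing edge" means in practice.

```python
from itertools import combinations

# Hypothetical toy graph: undirected "works with" edges between people.
edges = [
    ("ana", "bo"), ("ana", "cy"), ("ana", "dee"),
    ("bo", "cy"), ("bo", "dee"),
    ("cy", "eli"), ("eli", "fay"),
]

def build_neighbors(edge_list):
    """Map each node to the set of nodes it is connected to."""
    neighbors = {}
    for a, b in edge_list:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors

def jaccard(neighbors, a, b):
    """Shared neighbors divided by all neighbors of either node."""
    union = neighbors[a] | neighbors[b]
    return len(neighbors[a] & neighbors[b]) / len(union) if union else 0.0

def rank_missing_links(edge_list):
    """Score every unconnected pair; higher scores are more plausible new edges."""
    neighbors = build_neighbors(edge_list)
    existing = {frozenset(edge) for edge in edge_list}
    candidates = [
        (jaccard(neighbors, a, b), a, b)
        for a, b in combinations(sorted(neighbors), 2)
        if frozenset((a, b)) not in existing
    ]
    return sorted(candidates, reverse=True)

ranking = rank_missing_links(edges)
for link_score, a, b in ranking[:3]:
    print(f"{a} - {b}: {link_score:.2f}")
```

In this toy graph the top-ranked candidate is the pair "cy" and "dee", who share two neighbors. Learned graph embeddings and graph neural networks replace the hand-written score with one trained from data, but the task has the same shape.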
Check out Yoshua Bengio’s talk From System 1 Deep Learning to System 2 Deep Learning: