Common Sense in AI

Water is wet

Common sense is the background knowledge about the world that we accumulate and hold without being specifically aware of it. These are facts that everyone knows, so restating them would be redundant. Some common sense is universal: gravity causes things to fall, water makes things wet, fire is hot and burns. Two tribes that evolved on two isolated islands would both have this basic knowledge, because they experienced these things in the same way.

When we train a machine learning algorithm, we start with a blank universe and feed it specific data. For example, when we train a face recognition algorithm, we feed it images of faces. Those images contain almost no information about how a face relates to other objects in our physical universe, not even simple facts such as that a face is attached to the body by a neck and sits at the top of the torso.

"The man was hungry. So he went to the kitchen and prepared a sandwich. His mood improved after."

We implicitly know that the man ate the sandwich, because we have all experienced this scenario before. If you asked a chatbot built on GPT-2⁽¹⁾ what the man did, it would be unlikely to give this answer. Gary Marcus famously tested something similar with the following:

LMAO.

and a good example how statistics ≠ understanding. pic.twitter.com/BhKnSaYKkG
— Gary Marcus (@GaryMarcus) October 27, 2019

DeepMind came to prominence with a deep learning bot that could play many Atari games and surpass human-level performance in most of them and, more recently with Agent57⁽²⁾, in all of them. The bot was trained on the pixel frames of the game and learned to play without any external knowledge fed to it. The AI startup Vicarious⁽³⁾ recreated the same bot to play Breakout well, and then introduced perturbations to the environment, such as changing the position of the paddle or adding unbreakable blocks. The bot failed miserably.

The criticism here is that current machine learning systems are pattern matchers and have no means of building reusable concepts for reasoning. Deep learning methods that have had excellent success in very narrow domains, such as computer vision tasks, are notoriously bad at adapting to changes in the input data.

The first to tackle the problem of common sense in AI was John McCarthy, with his Advice Taker in 1959. McCarthy created the Lisp programming language, coined the term Artificial Intelligence, and was one of the founders of AI as a scientific discipline. In a paper called 'Programs with Common Sense'⁽⁴⁾, McCarthy proposed the Advice Taker: a program that would improve its knowledge and understanding simply by acquiring symbolic information about its environment, supplied by an outsider. It was meant to have the ability to deduce the immediate logical consequences of the statements it was given. Acquiring a large number of these logical consequences would constitute a system of common sense.

Real world scenarios mostly present themselves with incomplete information. A system that can make reasonable assumptions from incomplete information is more likely to succeed. This also touches on bounded rationality, where any system is bound by its resources and time when making a decision that improves or continues its existence. The ability to make assumptions and infer further from them is therefore what leads to reasoning about cause and effect. An understanding of cause and effect would be an essential part of a human-level intelligent system. It is quite apparent from observing animals that they all have some level of cause-and-effect understanding that evolved for their survival⁽⁵⁾,⁽⁶⁾.

Since McCarthy's era, the building block of machine-based common sense reasoning has been predicate logic. One of the tools, Prolog, is a programming language built on predicate logic. It is based on facts, rules, constants and variables. This symbolic approach to AI is known as 'Good Old-Fashioned Artificial Intelligence' (GOFAI). Although this approach showed some early promise, it was brittle and did not scale.
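To make facts, rules, constants and variables concrete, here is a minimal sketch in Python rather than Prolog. It uses forward chaining (derive everything that follows from the facts), which is a simplification: Prolog itself answers queries by working backwards from a goal. The predicates and rules are invented for illustration and reuse the sandwich example from earlier.

```python
# Facts are tuples of constants: (predicate, arg1, arg2, ...).
facts = {
    ("hungry", "man"),
    ("prepared", "man", "sandwich"),
    ("food", "sandwich"),
}

# A rule is (list of premise patterns, conclusion pattern).
# Terms starting with an uppercase letter are variables, as in Prolog.
# These rules are hypothetical, written only for this example.
rules = [
    ([("hungry", "X"), ("prepared", "X", "Y"), ("food", "Y")], ("ate", "X", "Y")),
    ([("ate", "X", "Y")], ("mood_improved", "X")),
]


def is_variable(term):
    return term[0].isupper()


def match(pattern, fact, bindings):
    """Match one pattern against one fact; return extended bindings or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_variable(p):
            # Bind the variable, or check it agrees with an earlier binding.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def match_all(premises, known, bindings):
    """Yield every set of bindings that satisfies all premises."""
    if not premises:
        yield bindings
        return
    for fact in known:
        extended = match(premises[0], fact, bindings)
        if extended is not None:
            yield from match_all(premises[1:], known, extended)


def substitute(pattern, bindings):
    return tuple(bindings.get(term, term) for term in pattern)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Materialise the matches first so `known` is not mutated mid-iteration.
            for bindings in list(match_all(premises, known, {})):
                derived_fact = substitute(conclusion, bindings)
                if derived_fact not in known:
                    known.add(derived_fact)
                    changed = True
    return known


derived = forward_chain(facts, rules)
print(sorted(derived - facts))
# [('ate', 'man', 'sandwich'), ('mood_improved', 'man')]
```

The brittleness shows up quickly: remove the single fact ("food", "sandwich") and the program can no longer conclude that the man ate anything, because nothing tells it that a sandwich is food.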

One of the earliest and longest-running common sense projects is Cyc⁽⁷⁾. It was started in 1984 by Douglas Lenat. The aim of the project is to create a knowledge base of common sense concepts and rules, and to generate inferences from them. Although it has acquired a large set of common sense rules and assertions, it is a long way from matching human-level common sense. Acquiring this knowledge requires a vast amount of human labour, which does not scale to a world that is changing quickly.

ConceptNet⁽⁸⁾ is another notable effort, started in 1999 at MIT. It is an open source endeavour whose main purpose is to create a semantic network of words for use in natural language understanding. It is a knowledge graph of common sense knowledge in various languages, motivated by the goal of making common sense programmable. The word embeddings built from ConceptNet are representations of word meanings similar to word2vec or GloVe, and are claimed to be better. The data is mainly contributed by internet users, but it also includes data from DBpedia, Wiktionary, and an ontology imported from OpenCyc.
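A semantic network of this kind can be pictured as a set of (concept, relation, concept) edges. The sketch below is a toy version: the relation names follow the style ConceptNet uses (IsA, UsedFor, AtLocation), but the edges are hand-written for illustration and are not taken from the real ConceptNet data, and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hand-written illustrative edges: (start concept, relation, end concept).
edges = [
    ("sandwich", "IsA", "food"),
    ("food", "UsedFor", "eating"),
    ("sandwich", "AtLocation", "kitchen"),
    ("knife", "AtLocation", "kitchen"),
    ("knife", "UsedFor", "cutting"),
]

# Index the edges by their start concept.
graph = defaultdict(list)
for start, relation, end in edges:
    graph[start].append((relation, end))


def related(concept, relation):
    """Return the concepts directly linked from `concept` by `relation`."""
    return [end for rel, end in graph[concept] if rel == relation]


def inherited(concept, relation):
    """Collect `relation` targets from the concept and everything it IsA."""
    results, seen, stack = [], set(), [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        results.extend(related(current, relation))
        stack.extend(related(current, "IsA"))
    return results


print(related("sandwich", "AtLocation"))  # ['kitchen']
print(inherited("sandwich", "UsedFor"))   # ['eating']
```

Nothing in the edges says directly what a sandwich is used for; the answer comes from following the IsA link to food, which is the kind of inference a knowledge graph makes cheap.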

Mosaic⁽⁹⁾ is part of the Allen Institute for AI, founded by Paul Allen. Its Commonsense Knowledge Graphs project aims to explore semi-structured representations of common sense by way of knowledge graphs of concepts combined with neural networks. Visual Commonsense Reasoning (VCR)⁽¹⁰⁾ is another project, which aims to build a large dataset of visual concepts. Its authors describe it as:

With one glance at an image, we can effortlessly imagine the world beyond the pixels (e.g. that [person1] ordered pancakes). While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world.

The Winograd Schema Challenge⁽¹¹⁾ is a test created to evaluate common sense reasoning in NLP systems. It contains 273 questions, each with a single-word variation. The task is to identify what the pronoun refers to.

The trophy would not fit in the brown suitcase because it was too big (small). What was too big (small)?
Answer 0: the trophy
Answer 1: the suitcase
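The structure of a schema like this is easy to express as data. The sketch below encodes the two variants of the trophy example and scores a deliberately naive resolver. The class name, field names and baseline are hypothetical, chosen only for illustration; they are not the format of any official dataset release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradItem:
    sentence: str
    candidates: tuple
    answer: int  # index into candidates


# The two variants of the trophy schema; only one word differs.
items = [
    WinogradItem(
        "The trophy would not fit in the brown suitcase because it was too big.",
        ("the trophy", "the suitcase"),
        0,
    ),
    WinogradItem(
        "The trophy would not fit in the brown suitcase because it was too small.",
        ("the trophy", "the suitcase"),
        1,
    ),
]


def always_first(item):
    """Hypothetical baseline: ignore the sentence and pick candidate 0."""
    return 0


def accuracy(resolver, items):
    """Fraction of items for which the resolver picks the right candidate."""
    correct = sum(resolver(item) == item.answer for item in items)
    return correct / len(items)


print(accuracy(always_first, items))  # 0.5
```

Because changing one word flips the correct answer, any resolver that ignores the meaning of that word can do no better than chance on the pair.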

Although deep learning methods now solve this challenge with around 90% accuracy, the effectiveness of the benchmark has been criticised because of the small size of the dataset. A deep learning model can easily fit a dataset this small without learning to generalise. WinoGrande is a similar challenge, but with a larger set (44,000) of sentences. On this larger dataset, the accuracy of the deep learning methods that had been achieving 90% fell drastically.

Yejin Choi is an associate professor at the University of Washington and a research manager at the Allen Institute, where she researches techniques for combining structured knowledge with deep learning. Her team created COMET, which combines a neural language model with common sense knowledge in an attempt to solve the brittleness and coverage problems simultaneously. COMET can reason with existing information from the knowledge graph, and for unknown information it makes inferences from the neural language model. A surprising aspect of COMET is that it can work with imperfect English containing typos and grammar mistakes.

Despite the promising results, this approach does not make machines understand common sense. Language is a compressed form of expression that we use to communicate about the real world. The real world itself is bewilderingly complex and vast in its arrangements. Animals perceive the world through many senses, incorporate them into a single model of the world, and make inferences from it, all the while updating the model from sensory feedback.

Can a neural model create a system that can truly understand the world? We can hypothesise that it is possible by observing small animals and their brains. But what structure is required to achieve this feat is the key problem. With our current technology it is unlikely that this could be achieved using purely neural methods, because the computation needed to match even the brain of a small insect like a bee requires immense resources.

It would be interesting to see whether multimodal knowledge graphs, generated and updated by neural methods, are a viable way forward.

References

[1] Better Language Models and Their Implications - Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. https://openai.com/blog/better-language-models/

[2] Agent57: Outperforming the human Atari benchmark https://deepmind.com/blog/article/Agent57-Outperforming-the-human-Atari-benchmark

[3] General Game Playing With Schema Networks https://www.vicarious.com/posts/general-game-playing-with-schema-networks

[4] PROGRAMS WITH COMMON SENSE John McCarthy Computer Science Department, Stanford University 1959 http://www-formal.stanford.edu/jmc/mcc59.pdf

[5] Predictive behavior and causal learning in animals and humans - Many animals, including human beings, possess several cognitive abilities to adapt to the external environment. It is particularly important to predict possible future events based on past experiences and to select the appropriate behavior accordingly. Kosuke Sawa

[6] Causal Reasoning in Rats - Experiments show that rats, like humans, can discriminate between events that are coincident in time and those that are causally related to one another. Aaron P. Blaisdell, Kosuke Sawa, Kenneth J. Leising, Michael R. Waldmann. Science, 17 Feb 2006: 1020-1022 https://science.sciencemag.org/content/311/5763/1020

[7] Cyc is a long-term artificial intelligence project that aims to assemble a comprehensive ontology and knowledge base that spans the basic concepts and rules about how the world works. Hoping to capture common sense knowledge, Cyc focuses on implicit knowledge that other AI platforms may take for granted https://en.wikipedia.org/wiki/Cyc

[8] ConceptNet is a freely-available semantic network, designed to help computers understand the meanings of words that people use. http://www.conceptnet.io/

[9] Mosaic -commonsense knowledge graphs https://mosaic.allenai.org/projects/commonsense-knowledge-graphs https://mosaickg.apps.allenai.org/

[10] Visual Common Sense https://visualcommonsense.com/

[11] Can Winograd Schemas Replace Turing Test for Defining Human-Level AI

https://spectrum.ieee.org/automaton/artificial-intelligence/machine-learning/winograd-schemas-replace-turing-test-for-defining-humanlevel-artificial-intelligence

https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html

https://en.wikipedia.org/wiki/Winograd_Schema_Challenge