GPT4 & The Impact of Capturing Our Emotions
Language plays a central role in how we communicate, how we learn and teach, how we organize and take political action, and how we convey the emotions we experience in our lives. While Google Search profiles users based on the topics they search for, GPT4 can capture our emotions: pure text prompts are being enhanced with multimodal capabilities, including a camera-based visual question-answering mode built on Microsoft’s KOSMOS-1 model. Observing the user visually raises many questions about how this technology should be applied.
To some, GPT4 is approaching Artificial General Intelligence (AGI), considered the ‘Holy Grail’ of AI. To others, capturing and profiling our emotions will result in the loss of individuality and personality.
1. Some Theories about Emotions
Marvin Minsky, one of the founding fathers of AI, was once questioned about his view on machine emotions. His response was:
The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without any emotions.
Indeed, without emotions, we would not have survived as a species. Emotions such as fear evolved as survival mechanisms long before our intelligence reached its current level. However, emotions can also be invoked by a purely internal thought process: finding the solution to a complicated mathematical problem can trigger happiness even though nothing external caused it.
Emotions like joy, sadness, surprise, disappointment, fear and anger can now be simulated with computational methods such as GPT4, thereby capturing our emotions as we interact and communicate with an intelligent machine. There are different theories to explain what emotions are and how they operate. The following is a summary of the two most popular:
Within the evolutionary account, one can distinguish two claims about when emotions emerged. The first holds that emotions result from natural selection in early hominids. The second agrees that emotions are adaptations but suggests that the selection occurred much earlier.
Robert Plutchik, a well-known proponent of the evolutionary view, claims that eight basic emotions exist. Each one is an adaptation to external events, and all eight are found across living organisms. According to Plutchik, emotions resemble fundamental traits such as DNA:
They are so important that they arose once and have been conserved ever since. In the case of emotions – which he calls ‘basic adaptations needed by all organisms in the struggle for individual survival’ – Plutchik suggests that the selection occurred in the Cambrian era, 600 million years ago. In his view, the eight major factors that form the basis of our emotions are incorporation, rejection, destruction, protection, reproduction, reintegration, orientation and exploration.
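Plutchik’s eight adaptations can be laid out as a small data structure. This is a minimal sketch: the pairing of each adaptation with a primary emotion, and the four opposing pairs of his “wheel of emotions”, reflect a common reading of Plutchik’s psychoevolutionary theory and are included here purely for illustration.

```python
# Plutchik's eight "basic adaptations" (as listed above), each paired with
# the primary emotion commonly associated with it in his theory.
# NOTE: this adaptation-to-emotion mapping is an illustrative assumption.
PLUTCHIK = {
    "incorporation": "acceptance",
    "rejection": "disgust",
    "destruction": "anger",
    "protection": "fear",
    "reproduction": "joy",
    "reintegration": "sadness",
    "orientation": "surprise",
    "exploration": "anticipation",
}

# In Plutchik's "wheel", the eight emotions form four opposing pairs.
OPPOSITES = {
    "joy": "sadness", "sadness": "joy",
    "acceptance": "disgust", "disgust": "acceptance",
    "fear": "anger", "anger": "fear",
    "surprise": "anticipation", "anticipation": "surprise",
}

def opposite_of(adaptation: str) -> str:
    """Return the emotion opposing the one tied to a given adaptation."""
    return OPPOSITES[PLUTCHIK[adaptation]]
```

For example, the adaptation of protection maps to fear, whose opposite on the wheel is anger.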
The second family of theories, the social constructionist account, holds that emotions are acquired or learned by individuals through experience. Emotions typically occur in social settings and during interpersonal transactions rather than being one’s individual response to a particular stimulus. Emotions and their expressions are regulated by social norms, values and expectations. These norms and values influence the appropriate reactions to emotions and what events should make a person angry, happy or jealous.
2. From Emotion to Perception
Emotion and perception are closely related. The trigger may even be internal; for example, a thought or the memory of a past personal experience. The early part of the emotion process covers the activity between the perception and the triggering of the emotion. The later part describes the bodily response: changes in heart rate, blood pressure, facial expression and skin conductivity.
William James (1884) was the first to develop a somatic feedback theory, which was recently revived and expanded by the neuroscientist Antonio Damasio, a professor from the University of Southern California and the philosopher Jesse Prinz, a professor at the City University of New York. Somatic feedback theories suggest that the mind registers these bodily activities once the bodily response has been generated.
This mental state caused by the bodily response is the emotion. A consequence of this view is that there cannot be an emotion without a bodily response. Perception is the organization, identification and interpretation of sensory information to understand the information provided by the environment. Perception involves signals that go through the nervous system, resulting from physical or chemical stimulation of the sensory system. Vision involves light entering the eye’s retina, smell is mediated by odor molecules, and hearing involves the sensing of pressure waves. A related process transforms this low-level sensory input into higher-level information, for example extracting shapes for object recognition.
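The somatic feedback sequence described above – a perceived stimulus, then a bodily response, then a mental state registered from that response – can be sketched as a toy pipeline. The stimulus table and thresholds below are invented for illustration and carry no empirical weight:

```python
def bodily_response(stimulus: str) -> dict:
    """Toy 'early part': a perceived stimulus triggers bodily changes.
    The stimulus-to-response table is invented for illustration."""
    responses = {
        "threat":  {"heart_rate": 120, "skin_conductance": "high"},
        "neutral": {"heart_rate": 70,  "skin_conductance": "low"},
    }
    return responses.get(stimulus, responses["neutral"])

def register_emotion(body: dict) -> str:
    """Toy 'later part': the mind registers the bodily state as an emotion.
    On a somatic feedback view, no bodily response means no emotion."""
    if body["heart_rate"] > 100 and body["skin_conductance"] == "high":
        return "fear"
    return "calm"
```

On this view the emotion label is read off the body’s state, never computed directly from the stimulus – which is exactly the ordering somatic feedback theories insist on.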
Perception depends on complex nervous system functions, but subjectively seems mostly effortless because this activity occurs outside our conscious awareness.
Although the senses were traditionally viewed as passive receptors, the study of illusions and ambiguous images has demonstrated that the brain’s perceptual system actively and pre-consciously attempts to make sense of its input. It enables individuals to perceive the world around them as stable, even though the sensory information is typically incomplete and rapidly varying. Human brains are structured in a modular way: different areas of the brain process different kinds of sensory information. This data is recorded and mapped in the brain’s biological structure and used, for example, in decision-making.
3. GPT4: Aligning Perception with Large Language Models
The convergence of language, multimodal perception and world modeling is considered a prerequisite for achieving some form of advanced artificial intelligence. A paper published by Microsoft Research in March 2023 introduces a model dubbed KOSMOS-1: a Multimodal Large Language Model (MLLM) that can perceive general modalities such as vision, learn in context and follow instructions.
The capability of perceiving multimodal input is critical to Large Language Models (LLMs) for the following reasons:
- First, multimodal perception enables LLMs to acquire common-sense knowledge beyond text descriptions.
- Second, aligning perception with LLMs opens the door to new tasks, such as guiding robots’ behavior or analyzing the intelligence embedded in documents.
- Third, the capability of perception unifies various applications, since graphical user interfaces represent the most natural and unified way to interact with a system.
The KOSMOS-1 transformer model is trained on web-scale multimodal corpora, including text, high-quality images and arbitrarily interleaved image-and-text documents mined from the web. The research team evaluated various settings, including zero-shot, few-shot and multimodal chain-of-thought prompting, on a wide range of tasks without any gradient updates or finetuning. Experimental results show that KOSMOS-1 achieves impressive performance in language understanding and generation, OCR-free NLP (fed directly with document images), perception-language tasks including multimodal dialogue and visual question answering, and vision tasks such as image recognition.
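The zero-shot and few-shot settings mentioned above differ only in how the prompt is assembled, not in the model’s weights: no gradient updates take place in either case. A minimal sketch of the two prompt formats (the exact template strings are our own, not the paper’s):

```python
def zero_shot(question: str) -> str:
    """Zero-shot: the task is posed directly, with no worked examples."""
    return f"Question: {question} Answer:"

def few_shot(examples: list[tuple[str, str]], question: str) -> str:
    """Few-shot (in-context learning): a handful of (question, answer)
    demonstrations precede the real question; the model's parameters
    are never updated."""
    demos = " ".join(f"Question: {q} Answer: {a}" for q, a in examples)
    return f"{demos} Question: {question} Answer:"
```

Multimodal chain-of-thought prompting extends the same idea by first eliciting an intermediate textual rationale about the image before asking for the final answer.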
The backbone of KOSMOS-1 is a transformer-based causal language model. It can perceive general modalities, follow instructions, learn based on context and generate output. Apart from text, other modalities are embedded and fed into the language model, while the transformer decoder serves as a general-purpose interface to multimodal input.
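The idea of feeding embedded non-text modalities into a causal decoder can be illustrated with a toy interleaved sequence. This is a sketch of the general concept only: the token names, the `ImagePatch` stand-in and the sequence layout below are invented, not KOSMOS-1’s actual input format.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class ImagePatch:
    """Stand-in for an image embedded by a vision encoder; the embedded
    image occupies a position in the same sequence as the text tokens."""
    embedding_id: int

# A hypothetical interleaved input: ordinary text tokens plus an embedded
# image, flattened into one sequence that a causal transformer decoder
# would consume left to right.
sequence: list[Union[str, ImagePatch]] = [
    "<s>", "An", "image", "of", ImagePatch(0), ":", "a", "cat", "</s>",
]

def modality_layout(seq) -> list[str]:
    """Report which modality occupies each position of the sequence."""
    return ["image" if isinstance(tok, ImagePatch) else "text" for tok in seq]
```

Because every modality is reduced to embeddings in one shared sequence, the decoder itself needs no modality-specific branches – which is what makes it a general-purpose interface to multimodal input.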
As the range of available sensors is continuously expanding, it seems likely that transformer-based models such as GPT4, in combination with other neural architectures such as generative adversarial networks (GANs), signal a paradigm shift in the application of AI. Artificial General Intelligence (AGI) – with all its potential benefits and risks – has moved one step closer to reality.
4. More Transparency
According to researchers from Stanford University, MLLMs epitomize a broad paradigm shift towards so-called foundation models, defined as machine learning models that can be adapted to an impressively wide range of downstream tasks.
Billions of individuals will be touched by the impact of this revolutionary technology. With all the excitement and fear surrounding language models, we need to know what this technology can and cannot do and what risks it poses.
We must have a deeper scientific understanding and a more comprehensive account of its societal impact.
Transparency is the vital first step towards these goals. Transparency begets trust and standards. By taking a step towards transparency, the researchers aim to transform foundation models from immature emerging technology to a reliable infrastructure that embodies human values.