Artificial Intelligence (AI) is an interdisciplinary field of computer science that aims to create machines and systems capable of performing tasks that typically require human cognitive abilities, such as learning, reasoning, problem-solving, understanding natural language, and more. By extension, the term AI also refers to the systems produced by this field of research. ChatGPT is an AI. Netflix uses AI in its recommendation engine, for example. Here are some formal definitions (source):
- A branch of computer science devoted to developing data processing systems that perform functions normally associated with human intelligence, such as reasoning, learning, and self-improvement.
- The capability of a device to perform functions that are normally associated with human intelligence such as reasoning, learning, and self-improvement.
From: ANSI INCITS 172-2002 (R2007) Information Technology—American National Standard Dictionary
of Information Technology (ANSDIT) (Revision and Redesignation Of ANSI X3.172-1996).
- Capability of a system to acquire, process, and apply knowledge (i.e. facts, information, and skills acquired through experience or education) and skills.
- Technical system that uses artificial intelligence to solve problems (AI system).
From: ISO/IEC 3WD 22989 Information Technology—Artificial Intelligence—Artificial Intelligence Concepts and terminology.
The idea of AI appears in Greek mythology, early automata were operational centuries ago, and the field went "mainstream" in the 1960s. Since then, its seemingly exponential growth has largely been fueled by scaling laws, but also abruptly interrupted by two AI winters. This article explores that history, describes the essential concepts, and is constantly updated to reflect the latest developments in what has become a varied field of research with one dominant commercial branch.
What is Intelligence?
Defining Artificial Intelligence is difficult without first agreeing on a definition of the human intelligence AI is trying to mimic. That is a complex topic, beyond the scope of this page. But the elements of human intelligence that AI strives to copy include the following:
- Adaptation to the environment. When your environment changes (a road is blocked, for example), you automatically find another path to your destination. This requires abilities that preprogrammed machines don't always have.
- Some form of learning. That is, improving performance after being exposed to information. That information can be training data, user feedback, and so on.
- Perception and Communication skills. Interaction with the environment is essential to AI. Traditional systems, such as a car or a computer, require human input to perform their tasks. AI systems strive to become more autonomous (such as a self-driving car), and for this they need to be able to perceive their environment and communicate with it.
- Reasoning and problem solving. Computers are great at number crunching and at evaluating a great number of possible outcomes, as in a game of chess. But, for autonomous behaviour, AI systems must be able to decide which relevant problem to apply this power to, and to find solutions on their own. AI reasoning is also used in science, for example to prove new mathematical theorems.
What does not qualify as Artificial Intelligence (AI)?

(c) Pascal Bernardon
Consider the seesaw on the left. It is man-made, therefore artificial. It can also be given two data inputs (a human on either end) and decide which is heavier. In spite of this, it wouldn't typically be considered artificial intelligence.
More significantly, a pocket calculator can perform complex mathematical operations based on user inputs, with greater speed and precision than the human mind. But it would not be considered artificial intelligence either, because its operations are deterministic and preprogrammed. It cannot extrapolate, learn, adapt to new data or exhibit any form of autonomous decision-making.
What qualifies as AI?
No definition of Artificial Intelligence is universally accepted by all relevant institutions, and what is generally considered to be interesting AI changes over time. The first perceptron, while a major algorithmic landmark, would not generate much interest by itself today. Artificial Intelligence systems are now expected to exhibit some level of decision-making capability beyond simple rule-based computation.
But, at its core, AI is software. A robot isn't AI; AI is software, and a robot may or may not use AI to perform the tasks it was designed for. For instance, robots in a car factory are programmed to repeat the same tasks over and over again, while robots in logistics depots adapt to the landscape in front of them and to the object being carried. That adaptation capability is fundamental to what defines AI.
So, here are 6 fundamental properties that would be expected of any computer program to qualify as Artificial Intelligence. Note that these are key capabilities commonly associated with AI systems, but not every AI system possesses all of them, or exhibits them equally; some might even fall short on certain properties. Consider them as fundamental criteria against which to evaluate whether a system qualifies or not.
During the training phase, AI models are fed data and identify trends or patterns that can be used for prediction or decision-making. Up to a certain point, larger (and better) datasets and larger models improve the performance of AI systems.
Example: Fraud detection systems are trained on historical transaction data, including both legitimate and fraudulent transactions. They learn patterns in the features of payments (transaction amount, location, time of day, merchant information, payment method) that differentiate normal spending behaviors from anomalous or suspicious activities. Once deployed, the systems analyze new transactions and flag those that trigger a fraud alert based on the learned patterns.
This assumes that there are common aspects to fraudulent transactions. The systems learn the most frequent patterns in the initial sample and detect them in new transactions.
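To make this concrete, here is a minimal sketch of such a training phase, assuming scikit-learn is available. The transaction features (amount, hour of day, foreign flag), the synthetic data and the 0.5 alert threshold are all invented for illustration; a real fraud system would use far richer features and carefully tuned thresholds.

```python
# A minimal sketch (not a production fraud model): train a classifier on
# historical transactions labelled as legitimate (0) or fraudulent (1),
# then flag new transactions whose predicted fraud probability is high.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic historical data: columns = [amount, hour_of_day, is_foreign]
legit = np.column_stack([rng.normal(50, 20, 500), rng.integers(8, 22, 500), np.zeros(500)])
fraud = np.column_stack([rng.normal(900, 300, 25), rng.integers(0, 6, 25), np.ones(25)])
X = np.vstack([legit, fraud])
y = np.array([0] * 500 + [1] * 25)          # labels learned from past cases

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score new, unseen transactions and raise an alert above a chosen threshold.
new_transactions = np.array([[45.0, 14, 0], [1200.0, 3, 1]])
probs = model.predict_proba(new_transactions)[:, 1]
for tx, p in zip(new_transactions, probs):
    print(tx, "fraud probability:", round(float(p), 3), "-> ALERT" if p > 0.5 else "-> ok")
```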
Once the model is trained and deployed for real-world use, it can continue to adapt and improve based on its interaction with new data and user feedback.
This adaptation process is crucial for several reasons: data drift, novelty, feedback from users, reinforcement ...
Example: Netflix or Spotify recommend different movies / TV shows / songs to different users based on their prior history and preferences. But they also monitor feedback from users. If a user didn't like a suggested movie, Netflix will decide to alter the model it built around that user.
A model trained on meteorological data may no longer be accurate after a few years if the local climate changes. This is called data drift. The model can be retrained from scratch using more recent data, or adapted on the go from new weather data.
Fraudsters might learn new techniques that elude the current model in a fraud detection system. Continually feeding the system the details of new fraudulent transactions can help it adapt to this novelty.
Counter-example: ChatGPT learns from a large dataset of text from the internet and draws its responses from patterns it has learned during the training phase. However, ChatGPT does not actively adapt to individual user feedback during real-time interactions.
Adaptation to Input refers to the ability of an AI system to respond differently to varying inputs or queries based on the specific context or user preferences. It focuses on tailoring responses to individual queries in real-time, often during the deployment phase when interacting with users. This property allows AI systems to provide personalized and context-aware responses.
Example: ChatGPT never gives you the same answer twice. You can "regenerate" an answer and it will take into account your dissatisfaction with the first to create a different one, based on the same question. It also partly "remembers" past queries, which allows its answers to take past answers into account.
Some AI systems are able to interpret and understand sensory data from the environment, such as images, sounds, and videos. This capability enables AI systems to interact with the world in ways that mimic human sensory perception.
This comes in the form of Computer Vision (interpreting and understanding visual information from the world, to identify objects, people, scenes, and patterns), Image Classification (sorting images into predefined categories based on their visual features, for example for content moderation and the filtering of inappropriate imagery), Object Detection and Tracking, Speech Recognition and more ...
Natural Language Processing is what makes your phone understand the address you want to navigate to, for example. Or makes ChatGPT understand a long-winded question about tomorrow's essay.
NLP involves interpreting and understanding human language. AI systems with NLP capabilities can analyze text and speech to extract meaning, sentiment, and context. They can be used in chatbots, sentiment analysis, and language translation.
This is a core aspect of artificial intelligence (AI) that allows systems to analyze information, apply logic, and arrive at informed decisions.
AI systems apply rules of logic to derive new insights or facts from existing knowledge. They excel at tackling complex problems that involve numerous variables and potential solutions, as they are able to explore different paths, evaluate trade-offs, and generate solutions that may not be immediately obvious (see Waze's route creating algorithm, for instance).
They can be designed with domain-specific knowledge, allowing them to reason effectively within a particular field such as healthcare. They can recognize patterns in data and draw conclusions from them.
Reasoning in AI also extends to ethical considerations. Some AI systems are designed to make decisions that align with ethical guidelines and principles. For instance, autonomous vehicles may need to reason about potential ethical dilemmas, such as how to prioritize the safety of passengers and pedestrians in critical situations.
It's important to understand that the lines are somewhat blurry and that no accepted definition exists to fully encompass AI. Only the way in which a system is programmed and the typical set of properties we expect from it help decide whether a system is an AI or not.
In this regard, it is easy to understand that even a very complex calculator or computer isn't an AI, but a simple program on either can be. The calculator will always perform in exactly the same way, unless faulty. An AI such as ChatGPT might never give the same answer twice and is very reactive to prompting. This makes AI systems much more flexible and powerful, but also less easy to test and control.
It is also important to understand the distinction between traditional software, which can be specified and automatically tested, and AI, which is statistical and approximate by design. Traditional software can be used when the rules to implement in the program are well understood. But not all things can be described accurately. How would you teach a program to recognize a cat using formal specifications, for example? You just can't. In such cases, AI (and in particular Machine Learning and neural networks) comes into play, learning implicit rules from a large number of examples (plenty of photos of kitties, for example).
This mirrors the way humans learn, hence the current fascination with the process.
What's next?
Vast affordable computing power and huge amounts of "free" content published online have made training large neural networks possible and enabled the current generation of superstar generative AI apps (ChatGPT, Midjourney ...). But those are just a fragment of neural networks, which are just a fragment of machine learning, which is just a fragment of artificial intelligence. Albeit the flavour-of-the-moment fragment.
The rest of this page is devoted to presenting the multiple forms AI can take, its applications, how it works and its rich history, which reveals the background political and technological trends behind the evolution we now know.
For those interested in understanding how to use the current wave of AI, but also understand what will come next, the buttons below will take you to the remaining sections of the page.
Types of AI
Why this matters
1. The ANI AGI ASI view
A popular, but relatively unhelpful, typology focuses on the capabilities of AI systems. It slices the world into the following three types:
- Artificial Narrow Intelligence (ANI). Also known as weak AI, ANI refers to any AI system designed to perform a single task, beyond which it cannot operate without human intervention. It cannot independently learn a new type of task either. ChatGPT, Siri, Midjourney, face recognition and fraud detection are all ANI.
- Artificial General Intelligence (AGI). Also known as strong AI, AGI refers to systems with more advanced cognitive capabilities (such as understanding and creativity) which can perform multiple tasks autonomously and learn beyond their initial programming, based on observations of the world.
AGI is still largely science fiction. And, depending on whom you ask and how ambitious the definition, some developers believe it can never be created. Systems such as Sophia (the talking robot), logistics robots and autonomous cars are inching closer to AGI because they involve multiple "senses" (cameras, radar, lidar, speech recognition) and combine them to make a decision such as following a lane or answering a question. But they are still performing one single task and are largely unable to learn anything significant during their use.
Note that AGI will not take your job. But someone with a good understanding of ANI might well.
- Artificial Super Intelligence (ASI) is pure fantasy. Think Terminator, or any system that reasons faster and better than humans on any topic in any conditions. The theory is that once a system can learn autonomously, it can learn anything and everything. Ignoring the obvious challenge of the resources available to such a system, little research seems to be currently devoted to algorithms and data models able to perform this type of miracle.
It's important to understand that the current AI systems we see beating humans at video games, chess or Go, or chatting like humans, are optimized for one specific task. The commercial trend (as of mid-2023) is towards more of those systems, because they can displace costly human jobs or help enhance the abilities of a team.
Technologically, the recent tendency to specialise neural network topologies to make them better suited for a specific task (at the expense of others), and the mass production of chips dedicated exclusively to the training of those specialized networks, point towards a more-of-the-same near future. AGI is on nobody's radar, as far as we know.
This first classification is popular but unhelpful because all current systems fall into one same category. To get a better grasp of how AI systems work, it is more interesting to consider the branches of algorithms they use and the constraints they face in real life.

ITU Pictures from Geneva, Switzerland — https://www.flickr.com/photos/itupictures/27254369347/
2. Branches of AI techniques
This second type of classification focuses on the type of algorithms and data used by AI systems. This is far more useful, as it allows us to gain a better understanding of what each system is able to achieve or not achieve. Artificial Intelligence is a broad field that has evolved over time, employing multiple techniques to adapt to goals and context. This section only aims at providing a high-level, bird's eye view of the landscape, as well as an index to more deep-dive articles as they get published.
Consider the following branches as a list of some of the fields that have been explored in the AI scientific community over time. Machine Learning, the most popular one, has largely replaced those, and will be described separately below.
These are rule-based systems which mimic human expertise in a specific domain. The rules explicitly programmed into the system are based on expert human knowledge and stored in "If ... then ... else" form in a database.
When presented with new data or a specific scenario, the inference engine matches the data against the rules in the knowledge base to determine the appropriate answer.
Although highly valuable for automating decision-making in specific domains, expert systems are limited to the knowledge and rules provided by human experts, and will struggle with situations that fall outside of their scope.
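As an illustration, here is a minimal sketch of the idea in Python: a handful of hand-written "if ... then" rules and a tiny inference engine that matches facts against them. The medical-style rules and symptom names are invented for the example, not taken from any real system.

```python
# A minimal sketch of a rule-based expert system: rules are "if conditions
# then conclusion" pairs written by a human expert, and a tiny inference
# engine checks which rules apply to the facts at hand.
RULES = [
    ({"fever", "cough"}, "suspect flu"),
    ({"fever", "rash"}, "suspect measles"),
    ({"sneezing", "itchy eyes"}, "suspect hay fever"),
]

def infer(facts):
    """Return every conclusion whose conditions are all present in the facts."""
    return [conclusion for conditions, conclusion in RULES if conditions <= facts]

print(infer({"fever", "cough", "headache"}))   # ['suspect flu']
print(infer({"sneezing"}))                     # [] -> outside the rules' scope
```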
In order to perform logical and deductive reasoning on real-life problems, it is necessary to represent information using symbols, which can be abstract representations of objects, concepts, relationships, and rules.
Formal languages are used to that effect, with propositional logic, predicate logic, and first-order logic being typical examples. Inference and deductive reasoning are then used to draw conclusions from given knowledge.
This is often used in the context of building expert systems (see above) as well as planning and problem solving, or NLP, in which it has been used to model the structure of language.
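Here is a minimal sketch of symbolic inference: facts and implications are encoded as symbols, and a forward-chaining loop repeatedly applies modus ponens until nothing new can be derived. The symbols (rain, wet_ground, slippery) are arbitrary examples chosen for illustration.

```python
# A minimal sketch of symbolic reasoning with propositional rules:
# forward chaining derives new facts from known facts and implications.
facts = {"rain"}
rules = [({"rain"}, "wet_ground"), ({"wet_ground"}, "slippery")]

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        # Modus ponens: if all premises hold and the conclusion is new, add it.
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)   # {'rain', 'wet_ground', 'slippery'}
```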
Genetic algorithms are used to find approximate solutions to optimization and search problems.
They simulate the process of natural selection, where the fittest individuals are more likely to pass on their genetic traits to the next generation:
- A set of potential solutions to the problem being tackled (the design parameters of complex structures in engineering, or of a financial strategy in economics, for example) is initialized.
- The fitness of each (how well it solves the problem) is calculated.
- The best are retained.
- Sections of the "chromosomes" (ie the parameter values) of the best parents are swapped over to create a new generation, and a little randomness is introduced in the process.
- The selection process begins again, using the best parents and the various offsprings.
- And we go back to step 2. Over and over until the process converges towards a best solution.
This approach can yield very good solutions to difficult optimisation problems, but it is never guaranteed to find the optimal solution, and it rarely does.
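Here is a minimal sketch of those steps in Python, applied to a toy problem (maximising the number of 1s in a bit string) as a stand-in for real design parameters. The population size, mutation rate and number of generations are arbitrary choices.

```python
# A minimal sketch of a genetic algorithm following the steps above.
import random

GENES, POP, GENERATIONS = 20, 30, 40

def fitness(chrom):                     # step 2: how good is a solution?
    return sum(chrom)

# Step 1: initialize a random population of candidate solutions.
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]

for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP // 2]     # step 3: keep the fittest
    children = []
    while len(children) < POP - len(parents):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, GENES)
        child = a[:cut] + b[cut:]       # step 4: swap chromosome sections (crossover)
        if random.random() < 0.1:       # ... plus a little randomness (mutation)
            i = random.randrange(GENES)
            child[i] = 1 - child[i]
        children.append(child)
    population = parents + children     # step 5: next generation, then repeat

best = max(population, key=fitness)
print(fitness(best), "ones out of", GENES)   # usually close to, but not guaranteed, 20
```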
This involves creating algorithms that can generate formal proofs for mathematical statements using logical deduction and inference rules.
The process typically starts with encoding the theorem and its assumptions into a logical language that the computer can understand. The system then applies a series of logical rules, axioms, and inference techniques to manipulate these statements and derive new conclusions. The goal is to construct a formal proof that demonstrates the validity of the theorem based on established principles of logic.
Automated theorem proving has applications in various fields, including mathematics, computer science, and formal verification of software and hardware systems.
Machine Learning
Welcome to the current star of Artificial Intelligence.
Typical software is developed as follows: the goals and features are specified (i.e. formally described), the code is written, and automated or manual tests ensure that the code matches expectations, delivers the required features and does not contain serious bugs. This is only possible when explicit rules and specifications can be written for the developers to follow.
In many fields, such as face recognition (surveillance), animal recognition (camera autofocus), self-driving, and more, this is not possible. When a camera sees a field of pixels, there is no explicit way to describe what a face, a truck or a deer looks like. Enter Machine Learning, in which software is not written but trained statistically on data.

Image (c) Karina Vorozheeva.
Is this a cat or a butterfly? How do you describe this, pixel by pixel?
A machine sees only pixels. You can't tell it "this is an ear". There is no this. Only pixels. And it's impossible to formally describe the pixels that constitute a cat image. You could specify one. But what if the lighting changes? What if the cat turns sideways? What if the cat is fluffier than you have described? What if it has a missing paw or a butterfly on its nose? What if it's a baby kitten, not an adult cat? In all those cases, your specification will fail.
Rather than provide tens of thousands of specifications for individual cats, positions, lighting scenarios and so on, Machine Learning provides data (in our case a large number of cat images, but it could be texts, meteorological data, flight paths, historical stock market price series ... anything that can be statistically crunched). The Machine Learning algorithm then builds a (statistical) model that is used to recognise the cat, drive the car, land the plane, or focus the lens on the deer's eye ...
The model starts off being rubbish, and gets incrementally better as more and more data is fed to it by the algorithm. Hence the analogy to the idea of human learning.
Note 1. The model isn't traditional software. It's a mathematical object that can be used (as a kind of statistical oracle) inside software that gets developed the traditional way, as described at the beginning of this section. Typically that mathematical object is a neural network. For now, just imagine a matrix (a grid of rows and columns containing numbers, like an Excel file). The model and the software that uses it are distinct: the model will recognize a truck on your lane, the software will tell the car to slow down.
Note 2. The model is ... a statistical model. It can sometimes do what no traditional software can do (recognize cats with butterflies on their nose) but it can never guarantee to be right! It's often correct, but not always: it is in the nature of statistical models to make predictions, not to establish facts. This makes testing very different from traditional software quality assurance. It also often leads to criticism of those models, even when they significantly outperform the abilities of human beings.
Note 3 (closely related to Note 2). The quality of the model depends - more than anything else - on the quality of the data it receives. All the gender or racial biases decried in some recent AI systems stem not from the algorithm used to train the model, but from poor data. Again, the analogy with human learning is strong. Feed a child incorrect facts or values, and it will grow to be a dysfunctional adult.
Scaling laws
Machine Learning has gained traction because of the simultaneous:
- need for advanced data analysis and other, less easy to specify, functionalities in business
- abundance of available data
- realization of Moore's Law (an exponential growth of computational power available to us).
4 types of Machine Learning are worth exploring in some detail to gain a deeper intuition of how this subfield of AI operates.
Supervised learning
A data analysis team provides cleaned up data, with each item (image, word, sound, number ...) receiving a label. Pictures of cats receive the label 'Cat', for instance. A stock market data series can be labelled 'Head and Shoulders reversal'. A human face can be labelled as 'Asian', 'Female', 'Smile'. So the model is able to know what data corresponds to what category.
The labelled data is typically split into 2 parts. Maybe 80% will be used to train the model, and the last 20% will be used to evaluate the model's performance once trained. Typically, more labelled data means better model performance.
Once your model has been trained, and provided the real-world data it is being used on has similar statistical characteristics to the training data, the evaluation gives you a good estimate of how well the model will do. If the real-world data has different characteristics, it will be necessary to retrain the model using a sample of that new data. For example, if you train your model exclusively with photos of hairless Sphynx cats, it may have trouble identifying fluffy Maine Coons.
The two traditional uses for supervised learning are regression and classification.
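Here is a minimal sketch of that workflow with scikit-learn, using the small Iris dataset bundled with the library purely as a stand-in for labelled business data; the 80/20 split and the choice of logistic regression are arbitrary.

```python
# A minimal sketch of supervised learning: labelled data is split roughly
# 80/20 into training and held-out test sets, a classifier is trained on
# the first part and evaluated on the second.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)            # features and their labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)    # 80% train / 20% evaluate

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Performance on data the model has never seen estimates real-world accuracy,
# provided real-world data resembles the training data.
print("held-out accuracy:", model.score(X_test, y_test))
```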
Unsupervised learning
Sometimes it isn't possible or realistic to obtain a training set of labelled data. In that case, the algorithm applies various methods to the existing data to find patterns in it. For example, it can group customers based on their purchasing behavior or preferences, allowing businesses to tailor their marketing strategies for different segments. A similar approach can be used for video encoding. By identifying clusters of similar video frames, even out of sequence, the algorithm can optimise compression for each type of cluster.
Unsupervised learning is often used for data analysis itself, rather than after data analysis has been performed (as in supervised learning). Through data preprocessing, feature extraction, and the identification of hidden structures in data, it can be very useful in helping a company make sense of large quantities of accumulated data.
Its limitations come from the absence of supervision. Since the predictions of the algorithm cannot be validated or invalidated by comparing them to labelled test data, the performance of the model is more erratic. On the plus side, it can be used in real time and is far less costly, since it doesn't require time-consuming cleaning and labelling by armies of humans.
Common techniques of unsupervised learning include clustering, dimensionality reduction, density estimation and association rules.
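Here is a minimal sketch of the customer-segmentation example using k-means clustering; the two features (orders per month, average basket value) and the three synthetic customer groups are invented for illustration.

```python
# A minimal sketch of unsupervised learning: no labels are provided, and
# k-means groups customers by purchasing behaviour on its own.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic customers drawn from three unlabelled behaviour groups.
customers = np.vstack([
    rng.normal([2, 20],  [1, 5],  size=(50, 2)),   # occasional, small baskets
    rng.normal([8, 35],  [2, 8],  size=(50, 2)),   # regular, medium baskets
    rng.normal([15, 90], [3, 15], size=(50, 2)),   # frequent, large baskets
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print("cluster sizes:", np.bincount(kmeans.labels_))
print("cluster centres (orders/month, basket value):")
print(kmeans.cluster_centers_.round(1))
```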
Reinforcement learning
This technique is inspired by human learning in that it rewards or punishes the system based on its choices, in order to help it learn the best way to behave in an environment, known as a policy.
Self-flying model helicopters have been trained in this way and outmanoeuvre human pilots. By analysing what causes imbalance or crashes, the helicopter (which initially flies more or less randomly) learned to fine-tune its flight inputs and became increasingly better. AlphaGo, the model that beat Go champions, started off knowing only the rules of the game and some basic moves learned from amateur games. It then played thousands of games against itself to tune the predictive ability of its algorithm, and became arguably the best player in history.
The system, called an agent in this context, learns to make decisions in a specific environment (a chess board, air and hard ground for the helicopter, large corpora of text for NLP ...). In any given state (situation) it decides on an action and receives feedback in the form of a reward or penalty. This helps it build the optimal policy for the environment, i.e. the map of best actions in any possible situation in this environment.
This approach makes it possible to adapt to conditions where training data cannot be obtained, in dynamic or unstable conditions, for example. One downside of the approach is that the system's only goal is to optimise the policy for the greatest reward. When it is difficult for human beings to predict the impact of an action or to accurately define its reward, this can lead to unethical and biased decision-making. Self-driving cars will sometimes send information about a situation and decision to the manufacturer for the AI team to monitor the decision and judge whether or not it was the optimal one. This helps tune the reward system in an attempt to avoid this issue.
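Here is a minimal sketch of the idea, using tabular Q-learning on a toy one-dimensional corridor: the agent is rewarded for reaching the goal and penalised slightly for each step, and gradually learns the policy "always move right". The environment, rewards and hyperparameters are invented for illustration.

```python
# A minimal sketch of reinforcement learning: tabular Q-learning on a tiny
# corridor of 5 positions, where the goal is the rightmost position.
import random

N_STATES, ACTIONS = 5, [-1, +1]             # positions 0..4, move left or right
GOAL = N_STATES - 1
alpha, gamma, epsilon = 0.5, 0.9, 0.1       # learning rate, discount, exploration

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action_index]

for _ in range(500):                        # episodes
    state = 0
    while state != GOAL:
        # Mostly pick the best known action, sometimes explore at random.
        a = random.randrange(2) if random.random() < epsilon else Q[state].index(max(Q[state]))
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else -0.01   # feedback from the environment
        # Update the estimate of the action's long-term value.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

policy = ["left" if q[0] > q[1] else "right" for q in Q[:-1]]
print(policy)   # expected: ['right', 'right', 'right', 'right']
```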
Deep learning
Deep learning is not a distinct type of Machine Learning; only the three above are. Deep Learning is a subset of supervised learning, mentioned separately only because it has become famous and deserves a little more explanation.
As mentioned above, neural networks are often used as the model in Machine Learning. Those networks can (in the general case) be thought of as mathematical matrices, i.e. rows and columns of numbers. To generalize, we can say that a matrix with more rows can learn more complex features in the input data (the number of variables you want to think about), and that more columns allow the system to learn more abstract concepts (i.e. learning to recognize a smile vs estimating the price of a house in your area). Each column in the matrix corresponds to a layer in the neural network.
Now, in many - most - cases, one hidden layer (besides the obligatory input and output layers) is enough. That's because the neurons in that single layer can generally perform all the calculations needed to tackle most problems. Adding layers to the network, in order to tackle more abstract problems, is called deepening the network.
Deep learning merely refers to machine learning that uses neural networks with multiple hidden layers (plus the input and output layers). No standard number of layers defines what is deep learning and what isn't, but deep learning essentially means learning abstract concepts. Note that problems requiring very few layers can typically be solved without using neural networks, so Deep Learning has largely become synonymous with neural networks, but they are really two different things.
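To make the idea of layers concrete, here is a minimal sketch of a small "deep" network expressed as plain matrix multiplications with ReLU activations. The layer sizes are arbitrary and the weights are random, so the output is meaningless until the network is trained (for example with backpropagation).

```python
# A minimal sketch of a "deep" network: an input layer, two hidden layers
# with ReLU activations, and an output layer, as matrix multiplications.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Layer sizes: 4 inputs -> 8 -> 8 -> 1 output. Each layer is a weight matrix
# plus a bias vector; stacking several hidden layers is what makes it "deep".
shapes = [(4, 8), (8, 8), (8, 1)]
weights = [rng.normal(0, 0.5, s) for s in shapes]
biases = [np.zeros(s[1]) for s in shapes]

def forward(x):
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:       # ReLU on hidden layers only
            h = relu(h)
    return h

x = rng.normal(size=(1, 4))            # one example with 4 input features
print(forward(x))                      # untrained output: essentially random
```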
Conclusion
Machine learning is a subset of artificial intelligence that focuses on enabling computers to learn from data and improve their performance over time. It's uniquely adept at tasks where explicit programming is difficult or unfeasible.
Instead of being programmed explicitly, machine learning algorithms learn rules and relationships from large datasets to make predictions, classifications, or decisions. This creates a model of the concept/image/voice/... being learned. The training process involves iteratively adjusting parameters in the model to minimize the difference between predictions made by the model and actual data.
Machine learning employs various techniques, including supervised learning (using labeled data), unsupervised learning (finding patterns in unlabeled data), and reinforcement learning (learning by trial and error). It has proven highly effective in tasks like image and speech recognition, recommendation systems, fraud detection, and medical diagnoses.
Neural networks have proven particularly suited to machine learning (ML) tasks that involve large datasets. And the use of multi-layered neural networks in ML has been called Deep Learning.
In summary
When someone talks about AI today, they most likely refer to Deep Learning, to the exclusion of all the other branches mentioned above (and the numerous ones not touched on in this page). It is the technique most suited to crunching vast amounts of data and the one with the most numerous business and personal applications (voice assistants, automated translators, strategy optimisation, automated customer support, image generation, text generation, image recognition, speech recognition ...).
But neural networks and Machine Learning face their own sets of challenges - mostly related to data (bias, fairness, labelling ...) - and limitations (ML cannot replace other techniques for every conceivable problem).
Besides, most real-life AI systems combine multiple techniques into one seamless piece of software. For example, self-driving cars recognize voices, other cars, road signs, traffic lanes, pedestrians, animals, ... They also read radars and lidars. They also make ethical decisions in case of unavoidable crashes, predict trajectories, and more.
So, it is important to understand that the primary commercial and public focus is on deep learning, but that science is also exploring the many other fields that could one day lead to AGI (whereas ML will always remain ANI, albeit high performance ANI). With this in mind, you can choose your primary focus based on the type of application that matters most to you.
A final note on Generative AI. This is the subset of Deep Learning dedicated to creating content (images, texts, music ...) based on a prompt (a set of instructions typically formulated in human language).
How AI works
Why this matters
A quick AI workflow
- Collect and Preprocess data: This data can be text, images, audio, or any other type of relevant information. Raw data is often noisy and unstructured; before feeding it to AI algorithms, it needs to be cleaned, organized, and transformed into a consistent, suitable format.
- Select an algorithm: An algorithm is a set of instructions that determines how data is handled. The study of algorithms is old and rich, and a wide variety of algorithms are available to choose from depending on the goal of the AI project. Some can optimize a solution, others can find the shortest path between two points on a map, others can handle the feeding of feedback to a neural network, or lower the number of independent variables in a tough problem ...
- Train the model: Strictly speaking, this applies only to machine learning, but you could argue that even Expert Systems get trained when they are fed the rules defined by human experts. This phase aims to bring the model from a point of quasi-randomness to one where it deals with a given task (image recognition, for example) with high statistical performance.
- Test and Validate: After training, feed the AI model unseen data, set aside from the training set, to assess its performance and accuracy. If the model performs well on this data, it can be considered ready for deployment.
- Deploy: Insert the model inside computer software or inside an app, to be used in the outside world.
- Update: The characteristics of data may change over time. Some cat breeds may fall out of fashion and disappear, while new ones appear and can trick the model. Or new driving conditions, not encountered during training, may cause the self-driving car to misbehave. AI systems can be further improved through a feedback loop: any new data or user interactions can be used to continuously update and refine the model, making it more accurate and efficient over time. Or the model can be retrained with a new, up-to-date data set or even a new structure. A minimal end-to-end sketch of this workflow follows the list.
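Here is that workflow condensed into a minimal scikit-learn sketch; the bundled breast-cancer dataset, the logistic-regression pipeline and the simplistic retraining step stand in for a real project's data, algorithm choice and update loop.

```python
# A minimal sketch of the workflow above, assuming a supervised-learning
# project: preprocess, pick an algorithm, train, validate on held-out data,
# "deploy" the fitted model, and later retrain it on fresh data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

# 1. Collect and preprocess (here: a bundled dataset plus feature scaling).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 2-3. Select an algorithm and train the model.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 4. Test and validate on data kept aside from training.
print("validation accuracy:", round(model.score(X_test, y_test), 3))

# 5. Deploy: the fitted model object is embedded in an application and queried.
print("prediction for a new case:", model.predict(X_test[:1]))

# 6. Update: when new labelled data arrives (or data drifts), retrain.
X_new, y_new = X_test, y_test                 # stand-in for freshly collected data
model.fit(np.vstack([X_train, X_new]), np.concatenate([y_train, y_new]))
```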
Example 1: Sophia the robot (speculative)
This robot is pictured above. It has cameras in the eyes, microphones in the ears, a speaker to synthesize a human voice (presumably in the mouth), and motors to change facial expressions and tilt the head. One Deep Learning model probably helps it understand human speech. Another probably helps it understand what the cameras are seeing. An early version of generative AI probably suggests responses, and some sort of Expert System might help determine the corresponding emotion/state of mind. Speech generation will read out the reply and the motors will translate the state of mind into a set of positions for the various parts of the face that can be controlled.
Example 2: The self-driving electric vehicle (speculative)
Cameras observe the road in all directions. Lidar readings monitor the distances to surrounding vehicles / pedestrians / objects and their relative motion. One Deep Learning model probably identifies the types of objects present in the camera feeds. Some manufacturers are trying to get rid of lidar and rely only on those cameras to assess position and relative motion. One reinforcement learning model, probably trained in a controlled environment and fine-tuned in test cars, decides how to navigate based on the input from the other systems and models. Software transforms those decisions into current for the drive motors, brakes, steering angle ... A second reinforcement learning loop involves the car sending a situation and how it reacted back to headquarters for analysis. A team can then decide whether to upgrade the navigation model and how, based on this feedback.
In both cases, it is interesting to distinguish between:
- The mechanical parts
- The traditional software (how to turn the wheel or tilt the head based on information from a model)
- The Machine Learning models.
All three live at different levels and rhythms. It takes months to alter a factory line, so having to wait for such a change in order to fix a fault in the image recognition model would be a liability for the manufacturer and society. Ball joint recalls may happen on a yearly scale at the dealers or at the factory. But important model changes need to be made much more flexibly, over the air, whenever the situation warrants it. Failure to enable such over-the-air updates caused recent entrants in the EV world severe pain and losses. An electric vehicle is dependent on software. Important software (traditionally coded and maintained) can be updated over the air on the basis of agile methodology iterations (maybe every two weeks). ML models can theoretically change up to several times a day. All three components of the car live separately and at their own rhythm.
History of AI
Why this matters
The history of AI takes us back further than most people realise. It's interesting in its own right, but also useful for understanding the driving forces that got us to where we are, terms such as AI winters, and the important figures, milestone events and hardware along the way. You might find the recurring themes (war and commerce, mainly) familiar, as well as the hype cycles and the disappointments that followed. You will also gain a deeper intuitive understanding of the field through this timeline than by reading fragmented and opinionated articles.
The modern age of AI
Coming soon
The booms and busts of Artificial Intelligence
1987-1994: AI Winter N°2

The Symbolics 3640 LISP Machine. Photo (c) Michael L. Umbricht and Carl R. Friend
As predicted in 1984, a second period of disinterest in AI and reduced funding occurred in 1987. Its causes were a mix of the hype cycle turning downwards and market dynamics:
- Expert Systems (the epitome of top-down AI) showed serious limitations. Being optimized for specific knowledge and contexts, they became useless outside of those parameters. The rise of neural networks shifted attention elsewhere. In 1987, Alactrious Inc launched Alacrity, an Expert System using over 3000 rules, which was one of the last of its kind.
- Expensive, complex computers built specifically to run expert systems were made redundant by cheaper, more powerful generic workstations (Sun, IBM, Apple) that could also run LISP and expert systems. A whole industry collapsed in 1987.
- Japan's Fifth Generation program came to an end, to be replaced by a sixth generation dedicated to neural nets. Results didn't come close to initial hopes, and everything had to be rebuilt in a different direction, which would take years.
- DARPA, which had started to fund AI again in 1983, to follow Japan's lead, decided once again that AI did not deliver on its promises and directed funding to alternative fields of research, which did.
1986: New successes in AI

Geoffrey Hinton. Photo (c) Ramsey Cardy / Collision
1986: VaMoRs, Ernst Dickmanns' robotic vehicle based on a Mercedes van, becomes the first autonomous vehicle able to drive itself. By the next year, it would drive at up to 60 mph on empty roads. And by 1994, it was tested (in Mercedes 500 form) on public French roads at speeds of up to 130 km/h.
1986: Geoffrey Hinton and his colleagues publish an article about Backpropagation. This algorithm is used to adjust the weights of the connections between individual neurons in neural networks, so as to minimize the difference between the network's prediction and the expected answer.
Before Backpropagation, the potential of neural networks was understood, but training them was a messy and complex business. Backpropagation introduced a systematic approach that made training much more efficient and easy. Until then, top-down AI approaches such as expert systems had been more prominent, but this sparked renewed interest in bottom-up approaches such as neural networks, which learn not from rules but from examples.
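As an illustration of the mechanism, here is a minimal sketch of backpropagation written by hand for a tiny two-layer network learning XOR; the layer sizes, learning rate and iteration count are arbitrary choices, and a real system would rely on a library to compute the gradients.

```python
# A minimal sketch of backpropagation: the error between the prediction and
# the expected answer is propagated backwards through the layers to compute
# weight adjustments, here for a tiny network learning XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # Forward pass: compute the network's current prediction.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the prediction error back through the layers.
    d_out = (out - Y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates of the connection weights.
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2).ravel())   # should approach [0, 1, 1, 0]
```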
1986: UCI Professor Rina Dechter coins the term "Deep Learning".
1984: Minsky and Schank warn against a second AI winter
Having experienced the full brunt of the first, and understood its hype-cycle causes, Marvin Minsky and Roger Schank spoke up to their community at the 1984 annual meeting of the American Association for Artificial Intelligence. It was their warning about the over-enthusiasm of the moment and the inevitable domino effect of pains to come that coined the term AI Winter. And their forecast became reality towards the end of the decade.
1982: Japan's Fifth Generation Computer Systems (FGCS)
The four previous generations had been defined as follows:
- For computers: electromechanical machines, tubes, transistors and integrated circuits (of increasing size and density).
- For programming languages: machine languages, assembly language, structured languages, and object-oriented languages.
Japan's 10-year FGCS program aimed at producing massively parallel computers (able to deal with many computations at the same time, much like the graphics cards in today's gaming computers) and programmed in high-level symbolic languages, as a platform for logical programming and AI.
The program lasted until 1994, after which it gave way to a sixth generation centered around neural nets. However, it was never commercially successful because cheaper alternatives were being developed elsewhere.
1979 - 1980: A New Hope

Kunihiko Fukushima. (Source)
Not all research stopped during the first AI winter, and some good news began to thaw the ice towards the end of the decade.
1979: The "American Association for Artificial Intelligence" is founded, signaling a structuration of the academic community. It was a nonprofit scientific society devoted to "advancing the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines", renamed to the current "Association for the Advancement of Artificial Intelligence" (AAAI) in 2007.
1979: Kunihiko Fukushima (father of ReLU) creates a form of multilayer neural network called the Convolutional Neural Network (CNN), still in use today in image recognition. The subsequent resurgence of interest in AI in Japan, particularly the Fifth Generation government program, got other governments back in the race, helping end AI Winter n°1.
1980: DEC releases R1 - known internally as XCON, for eXpert CONfigurer - the first ever commercial expert system, used to guide potential buyers of DEC's VAX computer solutions. This commercial dimension also contributed to ending the first AI winter.
1973 (to 1980): It all comes tumbling down. The first AI winter

Sir James Lighthill
The enthusiastic scientists and media of the time forgot a crucial rule in business (in this case, funding): underpromise and overdeliver. Officials in the various governments and agencies funding AI research began to doubt that advances matched expectations, pointing to problems such as machine translation that never became operational, the limitations of perceptrons, and speed issues. DARPA started cutting the funding of projects that had no direct application, feeling that the no-strings-attached funding of theoretical research was getting them nowhere.
The UK government, via the Science Research Council, commissioned the British mathematician and innovator James Lighthill to assess the field's progress.
Lighthill's report was scathing. Having compiled the academic research in AI and robotics, he concluded that most of it had ignored complexity theory ("combinatorial explosion") and was unfit to address real-life situations. Consequently, funding was stopped for all but two British universities.
The domino effect that caused other funding agencies to follow suit was compared to the chain reaction that occurs in nuclear explosions, causing nuclear winters in the atmosphere. So the field entered its first major pause after nearly two decades of optimism: the first AI winter. At this time, public interest also waned.
1972: WAseda roBOT 1

WABOT 1, in person. Photo (c) Waseda University
WAseda roBOT 1, a.k.a. WABOT 1, had legs. It was the first robot with a slightly anthropomorphic appearance, which was deemed important if robots were to enter society. It was designed to function in a human environment and interact with human beings.
It was built by Waseda University in Japan around 1972, and could communicate in Japanese, walk, measure the environment around it to avoid obstacles, and could carry objects with artificial hands.
While Marvin Minsky (see quote below) enthusiastically - and somewhat optimistically - promised robots with human intelligence within the 1970s, WABOT 1 was estimated to possess the IQ of a 1 year-old child.
“ ... in three to eight years we will have a machine with the general intelligence of an average human being.”
Marvin Minsky, talking to LIFE Magazine, in 1970
1969: Kunihiko Fukushima introduces ReLU
ReLU (Rectified Linear Unit) is a mathematical function introduced by Kunihiko Fukushima in the calculations that happen inside neural networks. In that context, it is called an activation function, and we will take a closer look in the dedicated chapter.
That function is what determines whether a neuron in the network "fires" or not, and its formulation is both simple (hence fast) and helps the network learn more efficiently than other activation functions. ReLU was not without its mathematical problems, but results were so interesting that variants were introduced as solutions to those issues, and ReLU is still used to this day.
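For reference, the function itself is tiny. Here is a minimal sketch of ReLU applied to a few sample values; the inputs are arbitrary.

```python
# A minimal sketch of the ReLU activation function: output equals the input
# when positive, zero otherwise. Its derivative is simply 0 or 1, which keeps
# training computations cheap.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))          # approximately [0. 0. 0. 1.5 3.]
```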
1968: Alexey Ivakhnenko fathers Deep Learning

Alexey Ivakhnenko in Kyiv. Photo (c) Perelom
Alexey Ivakhnenko was a Ukrainian computer scientist who specialized in pattern recognition, forecasting in complex systems and self-organisation. He developed a technique called inductive learning which was a direct competitor to the more mainstream stochastic approaches.
In 1968, he published an important article in the journal "Avtomatika" about Group Method of Data Handling (GMDH), a method which competed with the traditional deductive processes of science. Instead of working (top-down) from a theory to a model of it, GMDH proposed to work (bottom-up) from the observation of data to a model which might explain the data.
This work not only proposed to change the way general science could be carried out, but also offered a new direction to Artificial Intelligence, away from Expert Systems and towards Machine Learning as we know it today.
He is considered the father of Deep Learning.
1966: Joseph Weizenbaum and the First chatterbot
Siri and ChatGPT share a common ancestor: ELIZA, created by Joseph Weizenbaum at MIT to explore the field of human-machine communication. The program accepted "bolt-on" scripts that contained its linguistic capabilities. One of them was called Doctor and famously created humorous conversations between a patient and a therapist. ELIZA was the first program able to take (and fail) the Turing test. In spite of its numerous shortcomings - it had absolutely zero understanding - and its obviously programmed nature (drawing heavily on LISP structures and Expert Systems), it was attributed human-like sentiments by observers, a trait shared today by ChatGPT and the current generation of chatbots.
The term chatterbot was later changed to the chatbot we are now familiar with.
1965: Moore's Law

Gordon Moore. Photo (c) Intelfreepress
In 1965, Intel co-founder Gordon Moore makes a bold prediction: "The number of transistors incorporated in a chip will approximately double every 12 months". He later (1975) revised this estimation to a doubling every 24 months. This may not sound like much but, over 20 years, this equates to 10 doublings, i.e. over 1000 x more transistors in a chip.
This exponential growth not only provided a fantastic business model for Intel, it also created one of the scaling laws that led to the rise of Deep Learning and today's generation of Generative AIs such as GPT and Midjourney.
1965: First Expert System by Edward Feigenbaum and Joshua Lederberg
Expert systems are designed to solve complex problems by emulating the sort of reasoning a human expert would use: "If this happens, then I do that". The first of those systems was introduced by the Stanford Heuristic Programming Project led by Feigenbaum.
Those systems are important for two reasons. First, they were the first true success of AI, leading to hope and more funding. Secondly, they embody top-down AI, the sort of Artificial Intelligence that begins with human knowledge and tries to program it explicitly into computer databases. This is the exact opposite of the bottom-up, connectionist approach of Machine Learning (neural networks), which starts with an empty network and feeds it data so that it can learn rules by itself.
1963: DARPA funding
In June 1963, MIT received a $2.2M grant from the Defense Advanced Research Projects Agency (created to develop emerging technologies for the US military) to fund Project MAC, which included the AI team of Marvin Minsky and John McCarthy.
1961: Unimate does the (dangerous) job

Unimate at work. Photo (c) Meisam
Invented by George Devol and built by his company Unimation, Unimate was the first industrial robot, which automated machinery and handled jobs deemed too dangerous for humans, such as welding in toxic environments.
It was used on a GM assembly line in New Jersey, and was destined to work in nuclear plants.
1959: Arthur Samuel coins the term "Machine Learning"
While working at IBM, Samuel published the results of his checkers experimentation in the article "Some Studies in Machine Learning Using the Game of Checkers". Critically, he defines Machine Learning as a “field of study that gives computers the ability to learn without being explicitly programmed."
1958: LISP
In 1958, John McCarthy creates the programming language LISP (List Processing) building on Alonzo Church's Lambda calculus, and offering data structures and other features very useful to the AI research of the time.
1957-1958: Frank Rosenblatt's implementation of the Perceptron

Human and artificial neurons
Remember McCulloch and Pitts further down the page, from 1943? Their article "A Logical Calculus of Ideas Immanent in Nervous Activity" theorised the artificial neuron and provided a mathematical foundation for connectionism (the neural networks approach to AI).
In 1957, the American psychologist Frank Rosenblatt created a program to emulate it on the IBM 704 computer, then built a hardware implementation in 1958, designed for image recognition with a 400-photosite array (20 x 20 pixels, yoohoo).
In Machine Learning terms, a perceptron is a supervised learning binary classifier. It takes a number of inputs (akin to the dendrites in human neurons) and performs a calculation that answers Yes or No.
Media reception in 1958 was extremely enthusiastic, suggesting machines would soon "walk, talk, see, write, reproduce itself and be conscious of its existence", no less. But experimentation soon revealed the limitations of the single perceptron, leading to the understanding of the need for more elaborate neural networks.
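Here is a minimal sketch of that idea in modern terms: a perceptron learning the logical AND function with the classic weight-update rule. The learning rate and number of passes are arbitrary, and the example is far simpler than Rosenblatt's 20 x 20 image task.

```python
# A minimal sketch of a perceptron as a supervised binary classifier: a
# weighted sum of the inputs followed by a yes/no threshold, trained with
# the classic perceptron update rule.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])                # labels for logical AND

w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(20):                       # a few passes over the data
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0 # weighted sum, then threshold
        error = target - pred
        w += lr * error * xi              # nudge the weights when wrong
        b += lr * error

print([1 if xi @ w + b > 0 else 0 for xi in X])   # expected: [0, 0, 0, 1]
```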
"An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. … For the present purpose the artificial intelligence problem is taken to be that of making a machine behave in ways that would be called intelligent if a human were so behaving."
Proposal for the 1956 Dartmouth Workshop
1956: Logic Theorist and The Dartmouth Workshop

The star attendees at the 1956 Dartmouth Workshop: Trenchard More, John McCarthy, Marvin Minsky, Oliver Selfridge, and Ray Solomonoff. Photo (c) Joseph Mehling
1956 is a very important year for the growth of Artificial Intelligence.
First, Allen Newell, Herbert A. Simon, and Cliff Shaw write "Logic Theorist", the very first program aimed at automated reasoning. The program was able to prove 38 of the 52 mathematical theorems in a section of Principia Mathematica, even finding more elegant proofs for some of them. It is regarded as the first true Artificial Intelligence program. Among other important concepts, it used search trees and heuristics.
And, over the summer of 1956, researcher John McCarthy organised an 8-week workshop at Dartmouth College, which is now considered the founding academic event of the field. Attendees included most of the famous names of the time in Computer Science, and the term Artificial Intelligence was coined in the proposal for this event.
Artificial Intelligence was officially born and, with it, a period of unbridled optimism that would last almost two decades.
Laying the foundations for Artificial Intelligence
1952: Arthur Samuel, beaten at his own game

Arthur Samuel and his computer playing checkers
Gamers of the world, unite and bow to your master 😉 Employed at IBM, Arthur Samuel had an obsession with developing a program that would exhibit actual intelligence. He programmed the IBM 701 computer to play draughts (checkers), a game that is simple to program but still benefits from strategy and thought.
The significance of this program stems from the learning strategy Samuel implemented. Every game played with a human being contributed to improved performance (similar to today's Reinforcement Learning strategies). Over time, the program got better and eventually beat Samuel! For this, Samuel is considered the true father of Machine Learning.
1950: Science and Science Fiction

Book cover by Edd Cartier
Alan Turing worked on cognitive science and computer science. His research helped launch connectionism (neural networks) and "artificial life". One of his major contributions came in the form of the Turing test, in which an observer of a conversation between a machine and a human being has to determine which is the machine. If humans are unable to detect the machine, it is said to pass the test.
Simultaneously, Artificial Intelligence was powering the imagination of science-fiction authors such as Isaac Asimov. His book I, Robot introduced the now famous 3 laws of robotics intended to protect mankind from intelligent robots. The topic is still very fresh over 70 years later.
Interestingly, some research may have been influenced by the imagination of science fiction authors, though this is not widely documented, and may only be a pleasant idea 🙂
1949: Manchester Mark I: internal storage of instructions

It was designed in 1948 by Frederic C. Williams and Tom Kilburn at Manchester University (UK), and ran its first complex program in 1949. Its purpose was to elevate computer literacy in the university, but also to serve as a prototype for the commercial development of the Ferranti Mark I, the first commercial computer in history, released in 1951.
Its main differentiator with respect to the previous generation was the presence of memory registers which could store the instructions of an algorithm (rather than have them on punched cards or hard-wired via dedicated cabling, for instance). Several similar designs followed closely at Cambridge University (UK) and in the US army.
1943: The first computational theory of neural activity in the brain

In 1943, American neurophysiologist and cybernetician Warren Sturgis McCulloch, and American logician Walter Pitts, published a seminal paper that has received little attention from the AI community but was the first to use logic and computer science approaches to mental (neural) activity.
In that respect, it laid the groundwork for the first artificial neuron, the Perceptron, which would be announced in 1957. And, more importantly, it attempted to prove that a network of a finite number of such (formal, not biological) neurons could implement a Turing machine.
As such, "A Logical Calculus of Ideas Immanent in Nervous Activity” was probably the first scientific article on Artificial Intelligence.
1941 (to 1946): Fast, programmable computers
From the Pascaline to Alan Turing's message-deciphering machines of WWII (the "bombes"), all calculators had been (1) electromechanical and (2) designed to perform exactly one task, with human input. The next generation of computing machines would change that.
In 1941, Konrad Zuse designed the Z3 electromechanical computer, which was both programmable (so able to perform different tasks) and digital (its relays implemented Boolean logic). In spite of a limited instruction set, it was proved to be Turing-complete, i.e. able to implement a Turing Machine and run any algorithm. Zuse's requests for funding to replace the electromechanical relays with electronic switches were turned down by the German government during WWII.
In 1944, IBM delivered the Automatic Sequence Controlled Calculator (ASCC), a.k.a. Harvard Mark I, designed by Howard Aiken, who took some inspiration from Charles Babbage's Analytical Engine (which had never actually been built). It was still based on electromechanical technology but was fully programmable and required no human intervention after programming. One of the first users was John von Neumann, during the Manhattan Project.
And, in 1946, ENIAC (Electronic Numerical Integrator and Computer) became the first machine to use electronic tubes instead of electromechanical relays. It was designed to calculate ballistic trajectories for the US army, a task it could complete in 3 seconds instead of over 2 days using Babbage's machine and 2 hours with Harvard Mark 1. But the first user was again John Von Neuman whom it helped design the hydrogen bomb. It was fully programmable, but needed recabling for every new program, and tubes heated up and were notorious for their lack of reliability.
In spite of the increases in speed and autonomy this trio of machines brought to the world of computing, they still suffered from three issues severely hindering their use for Artificial Intelligence. They were still too slow, too expensive and couldn't store the programming instructions internally, in memory. This would soon change again.
1936 (to 1950): Alan Turing's contributions to war and Computer Science

Photo (c) National Portrait Gallery
Alan Turing's contributions were both theoretical and very down to earth. After Gödel published his Incompleteness Theorem in 1931, showing that the then in-vogue dream of a complete and consistent formalisation of all mathematics was impossible, Alan Turing (independently of Alonzo Church, who would later supervise his PhD) proved that some problems are not even "decidable", ie no algorithm can settle them for every possible input. In 1936, he described a simple abstract machine (the Turing Machine) capable of implementing any algorithm a modern computer can.
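To give a feel for how simple such a machine is, here is a minimal Python sketch (my own illustration, not Turing's 1936 formalism verbatim): a tape, a read/write head, and a table of transition rules, here programmed to do nothing more than flip every bit on the tape.

```python
# A minimal sketch of a Turing machine: a tape, a read/write head, and a table
# of rules. This particular table flips every bit on the tape and halts at the
# first blank.

def run_turing_machine(tape, transitions, state="start", blank="_"):
    tape = list(tape)
    head = 0
    while state != "halt":
        symbol = tape[head] if head < len(tape) else blank
        state, write, move = transitions[(state, symbol)]
        if head < len(tape):
            tape[head] = write
        else:
            tape.append(write)
        head += 1 if move == "R" else -1
    return "".join(tape)

# Transition table: (state, symbol read) -> (next state, symbol to write, move)
flip_bits = {
    ("start", "0"): ("start", "1", "R"),
    ("start", "1"): ("start", "0", "R"),
    ("start", "_"): ("halt",  "_", "R"),
}

print(run_turing_machine("10110", flip_bits))   # -> 01001_
```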
His crucial work at Bletchley Park to break the German Enigma code is now the stuff of legend. The interesting part, in the context of this page, is the electromechanical machines he designed and helped build to speed up the number crunching required for this task. After the war, he went on to produce one of the first designs for an electronic stored-program general-purpose digital computer, the ACE (Automatic Computing Engine).
In 1950, he designed the famous Imitation Game, a test of a machine's ability to behave in a way that is indistinguishable from a human being. Recent deep learning models such as GPT are now challenging this test.
Early Pioneers
1887: Herman Hollerith ties it all together

Photograph by Charles Milton Bell
Just as mechanical devices had been invented because manual calculations were too slow, Hollerith introduced electricity to punched card calculators to speed up calculations again.
As a statistician, his need was to tabulate large amounts of data for the United States Census Bureau.
His improvement over existing systems was the use of electric connections to increment counters, rather than manual handling. Based on the Boolean concepts of True and False, he used holes in punched cards to indicate specific characteristics of a person in a census (hole: married, no hole: unmarried, for example). Each card used the exact same layout to describe the various pieces of information about individual people.
Using these punched cards in his electric calculator made the calculation of population statistics much faster than would have been possible previously.
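As a rough illustration of the tabulating principle (a Python sketch with made-up census fields, not Hollerith's actual card layout): wherever a hole is present, a circuit closes and the counter for that field advances by one.

```python
# Illustrative sketch only: each card is a fixed layout of hole / no-hole positions,
# read as True / False, and the tabulator increments one counter per position.
from collections import Counter

FIELDS = ["married", "male", "foreign_born"]   # hypothetical example fields

cards = [
    {"married": True,  "male": False, "foreign_born": False},
    {"married": False, "male": True,  "foreign_born": True},
    {"married": True,  "male": True,  "foreign_born": False},
]

counters = Counter()
for card in cards:                 # each card passing through the machine...
    for field in FIELDS:
        if card[field]:            # ...closes a circuit where a hole is present
            counters[field] += 1   # and advances that field's counter

print(counters)   # e.g. Counter({'married': 2, 'male': 2, 'foreign_born': 1})
```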
In 1896, the entrepreneurial Hollerith founded the Tabulating Machine Co., which came under the leadership of Thomas J. Watson in 1914 and was renamed International Business Machines Corporation in 1924: IBM was born!
1847: George Boole and modern logic

Image (c) unknown
If Jacquard, below, partly paved the way for the industrial revolution, English polymath George Boole played as significant a role in laying the foundations for the digital revolution and Computer Science (of which AI is a branch).
His major contribution to mathematics is Boolean algebra, in which a variable is either true or false, and operations combine those values in clearly defined ways. His work was instrumental in separating logic from philosophy (the two fields were strongly connected at the time) and helped define symbolic logic, in an attempt to formalise the brain's reasoning.
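In modern programming terms, Boolean algebra is simply the True/False logic built into every programming language. A short Python sketch (my illustration, not Boole's own notation) can even verify one of his algebraic identities exhaustively, precisely because each variable has only two possible values:

```python
# A small sketch of Boolean algebra: variables take only the values True or False,
# and the operations and / or / not combine them in fully defined ways.
from itertools import product

for a, b, c in product([False, True], repeat=3):
    # One of the identities of Boolean algebra: AND distributes over OR.
    assert (a and (b or c)) == ((a and b) or (a and c))

print("The distributive law holds for all 8 combinations.")
```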
He was also entirely self-taught, something equally possible today in the field of Artificial Intelligence.
1843: Babbage and Lovelace create the first algorithms

Young Ada Byron, future Countess of Lovelace
English mathematician and inventor Charles Babbage described the first principles of what modern computers would become, hoping to create more accurate nautical, astronomical and mathematical tables than humans of the time could achieve. The design essentially combined mechanical calculators with punched cards. Babbage built a model of his computer, but was never able to complete it owing to high costs and falling support.
Young Ada Lovelace was fascinated by the early designs, and she saw in the punched cards a way to program the machine rather than simply use it manually. She is credited with the first ever algorithm (evidence suggests Babbage wrote some too): the set of instructions to be punched into the cards to perform complex mathematical calculations.
Just as in AI today, she envisioned uses of computers that weren't limited to maths, but extended to the creation of music and poetic mathematics. Read about her extraordinary life and contribution to Babbage's machine.
1801: The Jacquard loom and Bouchon / Falcon / Vaucanson punched cards

Image (c) unknown
By using a sequence of punched cards (invented decades earlier by Bouchon, Falcon and Vaucanson), Joseph Marie Jacquard was able to bolt a machine onto a traditional loom and automate the weaving of fabric patterns more complex than could typically be produced by hand.
Jacquard became the father of automation. He received praise from Napoleon and his machine was of great interest to the brilliant mind of Ada Lovelace (read her bio and impact on AI).
The machine was an important step on the road to the industrial revolution. But the canuts, the silk workers of Lyon, were worried about losing their jobs.
1694: Gottfried Wilhelm Leibniz's Stepped Reckoner

Image (c) Kolossos
Go cylindrical and multiply. In 1673, Leibniz invented a cylinder lined with 9 teeth of varying lengths that would couple with a counting wheel placed at an adjustable height along the cylinder. At a medium height, the wheel might encounter 4 of the teeth for every turn of the cylinder and advance by 4 steps, simulating the addition of 4 to a running total.
In 1694, the Stepped Reckoner machine used those cylinders to automate all four arithmetic operations, paving the way for the mechanical calculators of the next 3 centuries.
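As a loose sketch of the principle (in Python, ignoring the real gearing and carry mechanism): each digit of the operand determines how many teeth its counting wheel meets per turn of the drum, so one crank turn adds the whole operand, and repeating the turn reduces multiplication to repeated addition.

```python
# Illustrative sketch, not a mechanical model of the actual Stepped Reckoner.

def turn_crank(total, operand):
    """One full turn of the stepped drums: each digit's counting wheel advances
    by that digit's value, so the whole operand is added to the running total."""
    for position, digit in enumerate(reversed(str(operand))):
        total += int(digit) * 10 ** position
    return total

def multiply(operand, times):
    """Multiplication as repeated addition: one crank turn per addition."""
    total = 0
    for _ in range(times):
        total = turn_crank(total, operand)
    return total

print(multiply(127, 4))   # 508, i.e. 127 added four times
```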
1642: The Pascaline

(c) David.Monniaux on Wikipedia
Much like the seesaw at the very top of this page, this would no longer be considered Artificial Intelligence today. But in 1642, the Pascaline gave Blaise Pascal a groundbreaking way to calculate the taxes of the French region of Haute-Normandie on behalf of Cardinal de Richelieu.
Via spoked metallic wheels, Pascal's calculator automated two arithmetic operations (addition and subtraction) and enabled others. Pascal received a Royal Privilege allowing him to sell copies of the machine, but high costs cut its commercial career short. It nevertheless laid the groundwork for the designs to come.
In a century far far away (700 BC): Pandora and Talos

Generated using Midjourney
In Greek mythology, the god Hephaestus was said to have built a bronze automaton, Talos, to protect Crete at the request of Zeus. It was designed to perform tasks without human intervention and appears to be the first record of our fascination with artificial beings. Among the feats described in the Argonautica, it hurled giant boulders at the Argo to keep it away from Crete.
Since our photographic records of the time are scarce, I asked Midjourney for a representation. This image was one of the four the system gave me in return for a simple prompt. Yes, that's a camera, on the right. After all, a 30 m tall bronze robot performing war feats worthy of legend isn't going to do all that without posting on the Gram, right? 😉
Pandora, usually depicted today as a woman, was also described as an artificial being by Hesiod around 700 BC, along with other creations such as mechanical servants made of gold (Source).