This post gives a hopefully concise and entertaining perspective on the history of AI, with a slight focus on the neural approach. The author majored in a related field and has kept track of the state of the art ever since.
Why review this topic on the Juremy blog, you might ask, since Juremy doesn’t base its services on AI? Indeed, Juremy’s value is being an independent and reliable research tool for original data on EU legislation in multiple languages. Still, as most of our users work in the professional translation field, we can’t be ignorant of the recent trend of AI-based tools popping up around every corner.
Therefore, we thought it would be useful to review the state of machine translation and how it affects translators. As a first step towards that, let’s see how AI has developed up to now.
Early AI
When computers started to get slightly more powerful than pocket calculators in the 1960s, researchers were overly optimistic that soon we would have “AI” - a magical computer program that has a database of facts about the world, and a logic with which it can answer any question about the world (or at least a sufficiently wide range of them).
Expert systems and their friends
These initial programs were very simplistic - think of kilometres of if-this-then-that statements. People quickly realized this wouldn’t scale, so they started to organize the data the programs operated on into “knowledge bases”. These databases recorded facts such as (sky, has-color, blue). They also recorded rules such as:

    if (X, has-color, C) and (Y, wants-to-draw, X) and (Y, has-no, C-colored-ink)
    then (Y, should-buy, C-colored-ink)
The main initial use cases were expert systems, which would answer questions related to some domain (“What is the likelihood of blue ink drying within 2 hours?”), and planning, which attempted to devise a plan to reach a goal from a given initial state (“Jan has no blue ink but would like to have some. How to get it?”; but also chess and other board games). Maybe translation too, but it fell short even more than the relatively “easy” cases did.
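To make this a bit more tangible, here is a miniature, entirely hypothetical sketch in Python of how such triples and the ink rule above could be applied mechanically (real expert systems of the era were far more elaborate and typically written in languages like LISP or Prolog):

```python
# A hypothetical, miniature "knowledge base": facts are (subject, relation, object)
# triples, and a single hand-written rule mirrors the ink example above.

facts = {
    ("sky", "has-color", "blue"),
    ("jan", "wants-to-draw", "sky"),
    ("jan", "has-no", "blue-colored-ink"),
}

def ink_rule(facts):
    """if (X, has-color, C) and (Y, wants-to-draw, X) and (Y, has-no, C-colored-ink)
       then (Y, should-buy, C-colored-ink)"""
    derived = set()
    for (x, rel1, c) in facts:
        if rel1 != "has-color":
            continue
        for (y, rel2, x2) in facts:
            if rel2 == "wants-to-draw" and x2 == x and (y, "has-no", f"{c}-colored-ink") in facts:
                derived.add((y, "should-buy", f"{c}-colored-ink"))
    return derived

# Forward chaining: keep applying the rule until no new facts appear.
new_facts = ink_rule(facts)
while not new_facts <= facts:
    facts |= new_facts
    new_facts = ink_rule(facts)

print(("jan", "should-buy", "blue-colored-ink") in facts)  # True
```

A real system would have thousands of facts and rules, and would also have to decide which rules are worth trying at all - which is where things started to get difficult.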
![Example from a 1990 rule-based AI PhD thesis. [https://commons.wikimedia.org/wiki/File:BackwardChaining_David_C_England_1990_p21.gif]](https://www.juremy.com/blog/brief-history-of-ai/rule_based_ai.5ca43511ecb5288856fcc7c1dcc66c473f8c0a1afdfa69677a9ef8159beea57e.gif)
Periods of flops and gains
Soon, computer people found out that it is quite hard to make statements about the world and to record universal rules, let alone make a computer program work with them efficiently. There’s ambiguity of meaning, there are probabilities of things happening or not, and there are generally just too many possibilities to examine (combinatorial explosion - in the end, everything can be used as a hammer).
So (and this became a repeating pattern) after the initial promise there was a big flop, also called an AI winter. Yet, computing-wise, not reaching the overly optimistic goals of AI was not all in vain. The research provided very valuable tools that over time became detached from AI and are part of our computing toolchain to this day.
Distillation of AI tools into generic computing tools
These side effects of AI attempts include various statistical models (decision trees among them), data organization methods and query engines (ontologies, RDF, XML… SQL…), whole families of programming languages, and all kinds of algorithms (generic program patterns that can be adapted and re-used in certain situations to solve a specific kind of problem).
Route-finding algorithms, for example: people of the time would have considered map software that can find an efficient route between two points to be AI. Image processing (for example, edge detection) and constraint solvers are other notable examples, among probably endless others.
These periods of optimism (usually sparked by finding a new approach) and disillusionment repeated a few times, with the downturns bringing cuts to research budgets and freezing AI development for quite some years. It is after some of these periods that neural things started to appear. Which brings us to…
The neural things..
There’s a certain category of problems where we have example problems (inputs) and also know their expected answers (outputs). Computer people love these kinds of problems, since it is easy to test whether the program they wrote gives the right output for a given input.
Those same computer people eventually got bored and frustrated with inventing and writing programs for these problems, especially since those programs didn’t even work. So they started to look for alternatives.
Supervised learning was the name given to programs that, by just looking at the inputs and their expected outputs, could learn how to answer those kinds of inputs (mostly correctly, in the ideal scenario). It is called “supervised” since there is a program part that supervises the answer given by the answering part, compares it with the expected answer, and then somehow adjusts the answering part away from wrong answers (“makes it learn”).
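To make the “supervise, compare, adjust” loop concrete, here is a minimal sketch in Python. It is our own hypothetical toy example (a single adjustable number learning to multiply by 3), not any historical system:

```python
# Toy data: inputs and their expected outputs (here, output = 3 * input).
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

weight = 0.0          # the "answering part": answer = weight * x
learning_rate = 0.01  # how strongly each mistake adjusts the weight

for epoch in range(200):
    for x, expected in examples:
        answer = weight * x                   # the answering part answers
        error = answer - expected             # the supervising part compares
        weight -= learning_rate * error * x   # ...and adjusts away from wrong answers

print(f"learned weight: {weight:.2f}")  # should be close to 3.0
```

Real answering parts have vastly more adjustable numbers than this one, but the learning loop looks essentially the same.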
Inspired by biology, one of the attempts at a generic “answering part” was many neuron-like things connected together. In true computer-people fashion, computer neurons bear only a very distant resemblance to the actual brain neurons of living organisms - well, computer people also draw trees (the computational data structure kind) with their root at the top, so no surprise here.
..flop
Without boring you with the technical details: the neuron-like thing (called a “perceptron” back then) was so simple that it got laughed at and stomped into the ground by reviewers, not to resurface for a good while. Okay, if the perceptron was so simplistic, how could it be of any use later?
![A perceptron. [https://commons.wikimedia.org/wiki/File:Perceptron_moj.png]](https://www.juremy.com/blog/brief-history-of-ai/perceptron.26f361c687b41ff9876822a9780e9e6939970a87dbd8048ad76890ec362c6e5d.png)
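To show just how simple it was: a perceptron is little more than a weighted sum of its inputs pushed through a threshold. A minimal, hypothetical sketch in Python, with weights hand-picked so that it computes a logical AND:

```python
# A hypothetical, minimal perceptron: a weighted sum of inputs pushed through
# a simple threshold. The weights below are hand-picked so that it computes
# a logical AND of its two inputs.

def perceptron(inputs, weights, bias):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0  # the "step" activation

weights = [1.0, 1.0]
bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], weights, bias))
# Prints 1 only for (1, 1) - a logical AND.
```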
Turns out a few of those neurons are pretty incapable, but connecting many, many of them together unlocks greater computing power (as in, calculating useful things). There is even a theoretical proof that, given a formulated problem, sufficiently many neurons together can compute the solution to any occurrence of that problem. Good theory, isn’t it?
The problem is (ignoring the whole “formulating the problem” part) that the theory doesn’t give practical guidance on how many is actually enough, or in what pattern these things should be connected together to work well. And so we arrive at…
(Artificial) Neural Networks, or (A)NNs
The researchers got busy for a long while attempting to figure out the ideal pattern and number of neurons to connect in order to solve various problems (a figuring-out that mostly goes on to this day).
The name neural network here hints at the generic pattern of connecting the neurons. Neurons in a neural network can roughly be imagined as the stitches in a knit hat or pullover, sitting side by side and also following each other layer by layer… only they are weird multi-dimensional pullovers, knit for each use case separately. And of course your average knit pullover doesn’t take data as input at your waist, only to output answers around your neck.
![A multi-layer neural network. [https://commons.wikimedia.org/wiki/File:Neural_network_picture.png]](https://www.juremy.com/blog/brief-history-of-ai/multilayer-nn.ee5f2c2ffb0f41bbb5633e21326e647a74ca8ccd8753ff510e0ecd0fb88afea6.png)
The first rather simple but useful and somewhat successful pullovers were the Multi-Layer Perceptrons (MLPs). They could not be knit large (that is, deep, with many layers) though, as they suffered from a big problem: the more layers there were, the harder (to the point of impossible) it was for the supervising part of the program to adjust the behavior of the neurons.
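As a rough, hypothetical illustration of what such a layered pullover looks like in code, here is a tiny MLP forward pass in Python (the “answering” direction only; the hard part, adjusting the weights backwards through many layers, is exactly what broke down). The layer sizes and weights are arbitrary placeholders:

```python
import math

# A hypothetical, tiny multi-layer perceptron: two inputs, one hidden layer of
# two neurons, one output neuron. The weights are placeholders; in a real MLP
# the supervising part would adjust them during training.

def neuron(inputs, weights, bias):
    # weighted sum followed by a smooth "activation" instead of a hard threshold
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid

def mlp(inputs):
    hidden = [
        neuron(inputs, [0.5, -0.3], 0.1),   # hidden neuron 1
        neuron(inputs, [-0.8, 0.6], -0.2),  # hidden neuron 2
    ]
    return neuron(hidden, [1.2, -0.7], 0.05)  # output neuron reads the hidden layer

print(mlp([1.0, 0.0]))  # some number between 0 and 1
```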
(To make sure this article doesn’t end up being a comprehensive piece, we completely ignore alternatively knit pullovers, such as Recurrent Neural Networks (RNNs).)
So the neural things, as might be expected from history…
Flop again, and rebound with Deep Learning
Or at least stall. While MLPs had their specific uses in some areas during the early 2000s (image and speech recognition), not much further development was expected from them.
This stall was broken by making Deep Neural Networks practical. A deep network is one with many layers of neurons, giving it more processing capacity. The problems of training deep networks were slowly tackled and improved on from the 1990s, and those efforts (along with the development of faster hardware) reached a practical breakthrough by the late 2000s.
![A deep neural network. [https://commons.wikimedia.org/wiki/File:Example_of_a_deep_neural_network.png]](https://www.juremy.com/blog/brief-history-of-ai/deep-nn.99aca25ae364d33c5542e861e4e4a93ea123069232fcb9feff969dddd89d0b10.png)
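To illustrate the “many layers” part, here is a hypothetical sketch in Python that stacks the same kind of layer several times; the layer sizes and random weights are arbitrary placeholders, and training (the actual hard part) is omitted:

```python
import math, random

# A hypothetical sketch of what "deep" means: the same neuron idea as before,
# but stacked into many layers.

def layer(inputs, weights, biases):
    # each output neuron is a weighted sum of all inputs, squashed by a sigmoid
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(ws, inputs)) + b)))
        for ws, b in zip(weights, biases)
    ]

random.seed(0)
layer_sizes = [4, 8, 8, 8, 8, 2]  # 4 inputs, four hidden layers, 2 outputs

# random placeholder weights; training would normally set these
network = []
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    network.append((weights, biases))

activations = [0.5, -1.0, 2.0, 0.0]  # some arbitrary input
for weights, biases in network:
    activations = layer(activations, weights, biases)

print(activations)  # two numbers between 0 and 1
```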
Computer vision, one of the earliest driving use cases of AI (useful, for example, for recognizing the contents of an image), along with other historical challenges – such as playing Go – is now on track to being solved.
What about another challenging area – translation? We will review the state of machine translation in our next article. Stay tuned!