Fighting the hype is as crucial as understanding the basics.
Hype is the worst enemy of transformation. And the word intelligence in A.I. does not exactly help. Recall it was a mere tag given by... someone... decades ago, when the algorithms and the technology were clearly nothing close to intelligence. The guy leveraged marketing in a very sci-fi way. And that put unnecessary pressure on advanced statistics via commercial expectations.
Let's elaborate on that topic, using chatGPT's massive media coverage as an excuse to analyse it.
A.I. is not I. because it is bounded by our own I.
January 20th, 2023
There has been renewed hype around General A.I. due to chatGPT. It makes sense, as it is very impressive from a user's point of view (see the Turing Test). It is a disruptive tool - no doubt. But... is it really disruptive from an academic point of view, to the extent that we should even consider the discussion around the I of the machines?
Why is academic work so rarely disruptive? Well, in our view it is relatively easy to explain: try to publish a paper that is disruptive versus a paper that proposes an improvement in the third iteration of a trendy approach. You will soon go down the second route, as it is overall best for you as an individual academic. But that behaviour is precisely a rabbit hole for society as a whole (similar to what happens with Digitalization at your company). Anybody's fault? Not quite - it is how the game has evolved. And players are just players - not superheroes. It took Marta Diez-Fernández and myself years to build and fund our own Centre of Excellence (CoE) precisely to enjoy the fun (and frustration) behind disruption.
In fact, we strongly believe CoEs will sit at the core of companies (rather than on the side), which is probably the best news for unlocking the incentives to disrupt (nothing new under the sun in a number of research-driven industries). But let's leave that discussion here (along with the one around whether Applied Science is Science or Science Applied is Applied Science, which I love) and use the hype of chatGPT as yet another example that explains why today's AI is still advanced statistics (with a lot of code) rather than intelligence.
How does chatGPT, which is in turn built upon GPT-3, work? Well, out of the myriad of details, I would highlight the following five:
1. Research: it is the result of a series of papers in which a ton of smart scientists have each added value on top of the previous work. That is, the evolution of chatGPT's algorithm was smooth over time and across people, not an all-of-a-sudden discovery coming from an isolated company. Kudos to all those scientists from here, btw!
2. Data: the main barrier to entry for creating this type of model is the data. Where does most of the data that trained the algorithm come from? From a non-profit organisation called Common Crawl (this one robustly non-profit) that has been saving data from all around the internet, and with the right structure to be exploited afterwards.
3. Tokens: that's the right structure to exploit the data I mentioned above. And here lies the true pattern recognition behind the algorithm. The machine is not only fed with sentences but also with vectors of information about the different components of each sentence (you can loosely imagine that it distinguishes words from verbs from maths symbols from...). Yet not their meanings.
4. Neural network: wait for it... wait for it: "for those of you who know regression and are hence used to dummy variables that, ultimately, break a model into two calibration vectors by splitting the data based on supervised features, you can assume neural nets are advanced forms of the same thing where the machine decides in an unsupervised manner how to combine and define the dummies across the sample". Well... no one really knows how to interpret neural nets. Anyhow, they are prone to overfitting, which plays a massive role here: when you have pretty much the whole population of a distribution with a steady structure (e.g. our language), you can safely accept neural nets. This is one of the reasons why they don't work that well in, for instance, financial markets. The model chosen is autoregressive, meaning that it takes into account a sequence of tokens (a structure of the info of a sentence, as explained above), which is the key behind the Q&A structure - that is, very basically, it looks up the structure of the question and answers with what typically followed immediately afterwards.
5. Budget: oooops, how many brilliant academics can spend hundreds of millions calibrating a machine upon the previous 4 points? Only those with access to funding. Very large funding. Beware, spending huge money the right way is not exactly easy either. It requires a lot of data engineering, so kudos to that. But yes, a lot of barriers to entry have been gently knocked down in the shape of published papers and open-sourced predecessors such as GPT-2, while GPT-3 is offered only through a paid API and chatGPT has not been open sourced. And that is not bad either, until we take into account that it has been trained upon Common Crawl data, which does not guarantee any royalty protection for content taken from all over the internet.
Plus some extras around reinforcement learning and supervised learning that are very significant, yet I wouldn't say they add much to the intuition here.
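To make points 3 and 4 a bit more tangible, here is a minimal sketch in Python. It is emphatically not how chatGPT is built: the corpus is invented, the tokenizer just splits on spaces (real systems use subword tokenizers), and a simple table of counts stands in for the neural network. All names (`corpus`, `tokenize`, `follow_counts`, `generate`) are made up for this illustration. It only shows the autoregressive idea: look at the tokens so far and answer with what typically followed.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = [
    "how are you ? i am fine .",
    "how are you ? i am fine .",
    "how are you ? i am tired .",
    "who are you ? i am a model .",
]


def tokenize(text):
    """Split on whitespace (an assumption; real models use subword tokens)."""
    return text.split()


# Map each distinct token to an integer id: the machine sees numbers, not meanings.
vocab = {}
for line in corpus:
    for tok in tokenize(line):
        vocab.setdefault(tok, len(vocab))

# Count which token immediately followed which one in the corpus.
follow_counts = defaultdict(Counter)
for line in corpus:
    toks = tokenize(line)
    for prev, nxt in zip(toks, toks[1:]):
        follow_counts[prev][nxt] += 1


def generate(prompt, max_new_tokens=10):
    """Extend the prompt one token at a time with the most frequent follower."""
    toks = tokenize(prompt)
    for _ in range(max_new_tokens):
        candidates = follow_counts.get(toks[-1])
        if not candidates:
            break  # token never seen with a follower: nothing to project
        nxt = candidates.most_common(1)[0][0]
        toks.append(nxt)
        if nxt == ".":
            break  # treat the full stop as the end of the answer
    return " ".join(toks)


answer = generate("how are you ?")
print(answer)
```

The "answer" comes purely from frequencies in past data. Ask about a token the corpus never contained and the sketch has nothing to say, which is the intuition behind the rest of this post.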
So, now you should have some intuition about what to expect from the I in AI given chatGPT. It is a great incremental academic innovation and probably a disruptive tool. It makes errors because it is not looking at concepts or ideas but at overfitted pattern detection upon those tokens (which can be improved). It looks smart because the data is smart - it is us! And it won't update fast because the data first has to be crawled from all over the internet and calibration is expensive.
Interestingly enough, Turing's challenge was a very good provocation to get us here after all these years, but his test can be hacked with mere advanced statistics and a lot of code. And that is awesome, don't get me wrong. I love it and it is complex. But it is not intelligence, my friends.
So, just ask yourself now: is the forecast of a linear regression intelligent? Clearly, it is more advanced than an average or the last observed value or... And it is a great scientific advance. But it is not intelligence. Guess what: neural networks, natural language processing, etc. are pretty much the same: projections upon past data. And no one is yet letting the machine tag concepts by itself, etc. We are not mixing supervised learning with unsupervised learning (nor simulation) as much as we should to get even closer to the word intelligence.
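For those who like to see it, here is a small sketch of that comparison in plain Python. The series is invented, and the three function names are mine, chosen only for this illustration. All three forecasts are projections upon the same past data; the regression simply uses a bit more structure (a trend) than the other two.

```python
# Toy series invented for illustration: a steady upward trend.
series = [2.0, 4.0, 6.0, 8.0, 10.0]


def mean_forecast(y):
    """Forecast the next value as the historical average."""
    return sum(y) / len(y)


def last_forecast(y):
    """Forecast the next value as the last observed one."""
    return y[-1]


def linear_forecast(y):
    """Fit y = intercept + slope * t by least squares and project one step ahead."""
    n = len(y)
    t = range(n)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    covariance = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    variance = sum((ti - t_mean) ** 2 for ti in t)
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


print("average:   ", mean_forecast(series))
print("last value:", last_forecast(series))
print("regression:", linear_forecast(series))
```

On this toy trend the regression extrapolates the pattern while the other two lag behind it. More advanced, yes. But none of the three knows what the numbers mean.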
Why? Amongst other things, because we are missing the "Machine in Machine Learning" (the M in ML). We have models, but who said intelligence leverages one massive model? It leverages a myriad of them. That's probably the brain - the orchestrator of all those models. And for that we will have to go deeper into the way we think we think, an area where science meets philosophy and which we should see grow in the coming decades. And here goes the hint: we, ourselves, are just starting with that M in ML in our centre of excellence, SciTheWorld.
Thanks again for reading!