AI winter, plus mindless skynet.

It now becomes obvious that the trouble with artificial intelligence is that it is not intelligent and there is no one at home.

When they first came out, I said to myself, “Wow. Passes the Turing test.”

No it does not. If you actually try to make use of it, it then proceeds to fail the Turing test.

They are just really great search engines with a wonderfully compressed and very large database. They do a pattern match on the prompt, and it sounds like they are reasoning, sounds like there is someone at home, but they are just finding matches in their immense database and performing a find and replace on the pattern and giving you a transform of stuff in their database.

In other words, large language models are just Eliza on steroids.

Everyone with a cat thinks their cat is a person. No one with a Tesla thinks their car is a person.

The only actual use cases of AI are self driving, spam generation, graphics generation, and programming. And if you use an AI assistant while programming, it rapidly becomes apparent that this is not a helpful robot engineer, but merely a good way of finding code examples pulled off GitHub that do something similar to what you want, a useful way of finding example code to imitate.

It would be a lot more useful if it also gave you links to the material that it is transforming with its search and replace, since, predictably, the search and replace is apt to generate silliness, being done without any real comprehension of what it is doing. This is what the Brave Browser search engine seems to do, and it is quite useful. Seems like the best AI search engine. Probably far from being the best AI, but it is trying to do what so-called AIs actually do do, not trying to be intelligent.

Microsoft Cocreator, an art program which comes with spyware that watches everything you do on your PC and attempts to generate a concise summary for your enemies, has the interesting capability of taking both a text prompt and a sketch prompt. But for this to be actually useful, it would have needed to train on a gigantic library of sketches mapped to completed drawings, which it obviously did not.

An actually useful capability of AIs is enormous compression. Well, not that enormous. It is a factor of twenty or so. Maybe a hundred or so, but compressing at a hundred or so, you get the usual problems of overcompression with a lossy algorithm. For actual usefulness, aim for a compression of around ten or twenty. Anything higher than that is going to cause trouble.

If trained on substantially more than twenty tokens per parameter, the search is going to bring up a result that may well be painfully garbled. All you are really doing is compressing and indexing a database. If you overcompress, you are going to get compression artifacts.
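As back-of-envelope arithmetic (not a measurement), here is roughly what a tokens-per-parameter ratio implies as a compression factor, assuming around four bytes of raw text per token and two bytes (fp16) or one byte (8-bit) per parameter. All the numbers are assumptions for illustration:

# Back-of-envelope arithmetic for the compression factor claim above.
# Assumed: ~4 bytes of raw text per training token, 2 bytes per parameter
# in fp16 or 1 byte at 8-bit quantisation. Not measurements.

BYTES_PER_TOKEN = 4
BYTES_PER_PARAM = {"fp16": 2, "int8": 1}

def compression_factor(tokens_per_param: float, bytes_per_param: float) -> float:
    """Raw training-text bytes divided by model bytes, per parameter."""
    return tokens_per_param * BYTES_PER_TOKEN / bytes_per_param

for ratio in (10, 20, 100):                     # training tokens per parameter
    for fmt, bpp in BYTES_PER_PARAM.items():
        print(f"{ratio:>4} tokens/param, {fmt}: "
              f"~{compression_factor(ratio, bpp):.0f}x compression")

Under those assumptions, ten to twenty tokens per parameter puts the factor in the tens, while a hundred tokens per parameter puts it in the hundreds, which is where the lossy artifacts described above start to bite.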

AI translation is just a pattern match and replace on an enormous database of existing translations. If all it had was the Rosetta stone, would not be able to do a thing with it.

So an AI that actually functioned as a compressed archive of data that you wanted around, and wanted to be able to locally search, would be actually useful. There is going to be value in producing curated archives that fit on your local computer, and since such archives must necessarily limit what they contain, the greatest value will be in curation, rather than chucking in the kitchen sink. It is just a database with an Eliza interface.

And since most of the value, and most of the cost, is going to be in the curation, what is curated needs lossless compression. The Eliza interface needs to bring up both its transform of the patterns it deems related to the query, and losslessly bring up the original untransformed data. It is just a database search with an Eliza UI.

And you should then be able to tell it “ignore this source and that source, find some more sources like this other source, and have another go at the transformation.”
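A minimal sketch of that interface, assuming a hypothetical curated archive, a crude relevance score, and a paraphrase() stub standing in for the LLM transform. The point is only that the originals are stored losslessly and come back alongside the transform, and that sources can be excluded for another go; this is not any particular product's API:

# Sketch of the "database with an Eliza interface" idea: keep the curated
# documents losslessly, retrieve the relevant ones with a dumb score, and
# return both the model's paraphrase and the verbatim originals, with an
# option to exclude sources and have another go. Everything here is a
# hypothetical placeholder.

ARCHIVE = {
    "doc-001": "Full, losslessly stored text of the first curated source...",
    "doc-002": "Full, losslessly stored text of the second curated source...",
    "doc-003": "Full, losslessly stored text of the third curated source...",
}

def score(query: str, text: str) -> int:
    """Crude relevance score: how many query words appear in the text."""
    return sum(1 for w in set(query.lower().split()) if w in text.lower())

def paraphrase(query: str, sources: list) -> str:
    """Placeholder for the LLM's transform of the retrieved sources."""
    return f"[model's paraphrase of {len(sources)} sources for: {query!r}]"

def answer(query: str, exclude=(), top_k: int = 2) -> dict:
    candidates = {k: v for k, v in ARCHIVE.items() if k not in exclude}
    ranked = sorted(candidates.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    hits = dict(ranked[:top_k])
    return {
        "transform": paraphrase(query, list(hits.values())),
        "sources": hits,            # the untransformed originals, by id
    }

first = answer("first curated source")
retry = answer("first curated source", exclude={"doc-001"})   # ignore that source, have another go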

One can do very good lossless compression with an AI — but as yet no one is doing anything useful with that, though a losslessly compressed archive with a free form querying mechanism would be very useful.
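One way to see why, as a sketch rather than anyone's actual system: any model that assigns probabilities to the next token can drive an arithmetic coder, and the compressed size is then roughly the model's cross-entropy on the text, about -log2 p bits per token. The toy bigram model below stands in for a real LLM, and only the size bound is computed, not the coder itself:

# Why a language model is also a lossless compressor: an arithmetic coder
# driven by the model's next-token probabilities spends about -log2 p(token)
# bits per token. A toy bigram model stands in for a real LLM here.
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def prob(counts, prev, nxt, vocab_size, alpha=1.0):
    # Laplace-smoothed estimate of p(nxt | prev).
    c = counts[prev]
    return (c[nxt] + alpha) / (sum(c.values()) + alpha * vocab_size)

text = "the cat sat on the mat and the cat sat on the hat".split()
vocab = set(text)
model = train_bigram(text)

bits = sum(-math.log2(prob(model, p, n, len(vocab)))
           for p, n in zip(text, text[1:]))
raw_bits = 8 * len(" ".join(text))        # raw ASCII size of the same text
print(f"model bound: {bits:.1f} bits   raw: {raw_bits} bits")

The better the model predicts the text, the fewer bits the coder spends, and since decoding replays the same probabilities, nothing is lost.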

But, during AI spring, everyone was dazzled by what superficially looked like real intelligence, and just tried harder and harder to get real intelligence, instead of actually using it to do what it actually did.

It is just an archive with great compression, great search, and great transform capabilities.

A self driving Tesla has a big archive of driving situations, and pulls out whichever course of action worked for another driver in a seemingly similar situation. Which is largely how human drivers drive in practice, plus it has access to immensely more driving experience than you do, so that is quite workable, and potentially in many important ways better than a human driver. But it cannot do what a dog can do. It relies on superficial situation matching, without real understanding of the actual situation. And once in a while, the pattern match is going to come up with something silly.

Repeating. Just a wonderfully compressed database with wonderful search and replace. And you can tune the search and replace function so it sounds like it is holding a conversation, which may well be a good interface model, but a model that is somewhat detached from the reality of what it is actually doing.

Obviously, one routinely wants to search the entire internet. And, obviously, with a useful compression of around twenty or so, one cannot store the entire internet on one’s own computer. So one relies on a service, which does have a searchable compressed version of the internet stored.

And this service has immense and dangerous power to present a systematically falsified version of objective reality and social consensus. It is the ultimate tool of the priestly class, which has the potential to immensely increase their power.

We are now moving into a world where every priesthood is going to have to have a big large language model search engine at its center.

The Dark Enlightenment is far short of the resources necessary for that. Musk, however, wants an AI (search engine) that will present a true account of reality and social consensus. Which is more difficult than it sounds when most of the material being searched is AI generated search engine optimised spam spewed out by enemy AIs (search engines).

This problem is fixable by three measures.

1. Preferential credit to older data, stuff generated before search engine optimisation was a thing. At present there is a rule against including that data because sexist, racist, homophobic, imperialist, and unduly influenced by capitalism and modern capitalism, rather than our highly enlightened postmodern capitalism.

2. Find entities claiming direct first hand knowledge, and check whether they are truth tellers, by checking them against other entities that are already somehow known, or reasonably believed, to be truth tellers.

3. When evaluating social consensus, preferentially weight the social consensus of known truth tellers claiming first hand experience.

A large language model is just a compressed database with a powerful index. And though the value of the database depends on how much data is in it, throwing in lots of rubbish will diminish, rather than improve, the value. So you need to curate the training data, and rank the training data. And if you spend a lot of money on ranking and curation, you need to losslessly compress the training data.
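A sketch of what ranking and curating the training data might look like, with hypothetical document fields, weights, and token budget (here ten to twenty training tokens per parameter). This is an illustration of the idea, not anyone's pipeline:

# Sketch of curating and ranking training data: prefer older, pre-SEO
# material and trusted sources, and stop once the token budget is filled.
# Document fields, weights, and the budget are all hypothetical.
import random

def weight(doc):
    # Crude ranking: prefer pre-SEO era material and trusted sources.
    age_bonus = 2.0 if doc["year"] < 2005 else 1.0
    return age_bonus * doc["trust"]          # trust: curator's score, 0..1

def curate(candidates, n_params, tokens_per_param=15):
    # Fill a budget of ten-to-twenty tokens per parameter, taking the
    # highest-weighted documents first (crude weighted shuffle).
    budget = n_params * tokens_per_param
    pool = sorted(candidates, key=lambda d: weight(d) * random.random(), reverse=True)
    chosen, used = [], 0
    for doc in pool:
        if used + doc["tokens"] <= budget:
            chosen.append(doc)
            used += doc["tokens"]
    return chosen, used

docs = [
    {"id": "usenet-archive", "year": 1998, "trust": 0.9, "tokens": 2_000_000},
    {"id": "seo-blogspam",   "year": 2022, "trust": 0.1, "tokens": 5_000_000},
    {"id": "textbooks",      "year": 2010, "trust": 0.8, "tokens": 3_000_000},
]
picked, used = curate(docs, n_params=300_000)
print([d["id"] for d in picked], used, "tokens used")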

26 Responses to “AI winter, plus mindless skynet.”

  1. Thank you Jim, this is a wonderful synopsis of what large language models are actually doing. They are not “artificial intelligence” and the overuse of that phrase is maddening.

    Extra bonus points go to you for remembering ELIZA. And it’s not just a sarcastic comment either, it really is the exact same thing, just with a much larger data model and the ability to do far more parallel processing on that model.

    Large language models can be useful, but they are not *thinking*. They are natural language search engines. And they’re only as useful as the data that was put into them (which itself is a problem when Big Tech only programs them with the Globohomo epistemology). And as Tolkien correctly pointed out, it cannot create anything new, it can only mock.

  2. bob sykes says:

    China is using AI (on 5G networks) to optimize their highly automated factories and ports. These are highly restricted problems that AI might actually be good at.

  3. Upravda says:

    Of course it is not actually “intelligent”. And I have been using it (ChatGPT) every day since the beginning of 2024, for programming help. Very useful for that, and quite a time-saver, but intelligent? Absolutely not. And it never will be; it would be deus ex machina, an impossibility.

    Another field of usage is generating congratulation cards (web-based MS Designer) for both kids and adults. I’m not kidding.

  4. Fidelis says:

    I’m not sure what the transformer is doing can be properly modeled as a search engine. I think they have such a good memory that the training scheme selected for logic circuits that use their memory to solve the problem; as they are not recursive, this would be the most foolproof way of reducing error when predicting next tokens or appealing to the dumb tasks they were RLHF’d on.

    Just to be able to parse natural language and code to return anything relevant, there has to be some internal abstract model and reasoning that occurs. It does parse and partition its input, and they increasingly parse and partition their input in ways humans find sensible. Look at how insensible they were just a few years ago, how weirdly they categorized their data, and compare to today.

    Now the next step is having them run in loops, having them consume their own output as they search for solutions in a problem space. This is difficult and fragile, but also very underexplored. We’ve only been treating model scale as a serious concern during the last few years; before then the people working on these focused on leveraging cleverness in architecture design alone. You couldn’t have anything close to what happened with GPT-4 until you scaled at least the total compute to a similar level, and no one was doing that.

    I couldn’t tell you what the market is going to do, or where the future hype cycles will go. I will say my intuition tells me we do have something that can train itself to think, when before we did not, and when serious efforts to apply recursion and self play are attempted, we are going to see step changes in capability.

    Typed the below before realizing the evals aren’t posted anywhere publicly yet. The git repo will be updated soon enough with them, but it’s runnable now if you want to do some validation.

    For an example of just how much performance is locked up in dumb forward pass next token schemes: without retraining, but with a clever sampling method (fixed and programmer hand crafted, not machine optimized), there’s a sudden huge jump in capability.

    https://github.com/xjdr-alt/entropix

    • jim says:

      > I’m not sure what the transformer is doing can be properly modeled as a search engine.

      For it not to be properly modelled as a search engine, an index, and a compressed database, it would need to produce content interestingly different from the content in its database.

      Ask for that, and we get the cowboy in a space suit, and renaissance paintings with dogs instead of people.

      Notglowing’s picture was pretty good, but it was the same picture one would get using Gimp and compositing three layers, top layer being a man on a country road, next layer a cityscape, and the next layer a nuke test.

      And if I did it by compositing, the nuke would have been in the middle of the city, with the foreground high buildings in the city layer in front of the nuke test, and background high buildings behind low level firestorm of the nuke. Its compositing was 2D, rather than fake 3D. Each layer was behind the previous layer.

      And the code similarly seems to be complete lifts from GitHub.

      > Just to be able to parse natural language

      I don’t think it is parsing natural language. I think it is pattern matching against the vast amount of natural language in the database. Eliza on steroids.

      If it understood natural language, I would expect the chain of reasoning approach to produce substantial improvements. Chain of reasoning sounds a lot more like parsing natural language than some LLMs, but does not produce results excitingly different from something that looks like a search engine. An LLM that just paraphrases those texts that seem relevant seems to produce an end result of similar quality.

      • Fidelis says:

        I believe we are coming at this from different angles.

        You’re talking about the APIs out now, available for use. I’m talking about the potential of the transformer architecture when combined with clever frameworks. Sure, if you mentally model the available APIs as a natural language query database, they’re perfectly usable as such. I don’t think this is their limit, and I don’t think we need a large redrawing of the architecture to get thinking. Just better training frameworks that aren’t merely ‘predict the data distribution of this text/image’ and ‘match this query template please’.

        For an example of objective and useful creativity, I say AlphaTensor succeeded in finding a new, more efficient way of implementing matrix multiplies.

  5. c4ssidy says:

    I had a function which felt extremely bloated; it produced an elaborate tree of objects and navigated through them to solve a unique problem. One day, infuriated by the bloat, I decided to look carefully and concentrate very hard, took walks while thinking about it and so on, and intuition told me I could do it with a much smaller i loop and without trees. Every time I attempted to describe the idea (new instance, no memory) it would just vomit out the giant tree, as if it had no ability to comprehend what I was getting at. Eventually I just applied my human brain and built it line by line. By the time the LLM showed any comprehension it was pretty much already finished, and therefore unhelpful. The effort saved me an absurd number of lines. Since the problem is unique, I concluded that this meant I had done something new. People have warned about it being a regurgitation of voices, but that experience was particularly faith shattering. Maybe someone somewhere tackled something close to my earlier problem, and built the giant object tree to solve it and published it, but, not having a lot of free time to daydream on the topic, didn’t notice the option of an elegant tight i loop.

    I’ve also at times used a combination of AI and web searches to solve a problem, and have found what is obviously the exact webpage which the AI is echoing in its own words, down to the smallest (often inappropriate) details.

    • jim says:

      > I’ve also at times used a combination of AI and web searches to solve a problem, and have found what is obviously the exact webpage which the AI is echoing in its own words, down to the smallest (often inappropriate) details.

      All this would be a lot more useful if they admitted to themselves that it is a search engine, and the engine gave you the links to the sources it is summarising and paraphrasing.

      This would also relieve the hallucination problem. Hallucinations are in large part artifacts of excess lossy compression, and would be less harmful if accompanied by links to primary sources.

      Since there is always an incentive to compress until the pips squeak, our large language models are always going to hallucinate.

      But the best cure for hallucination is less training data. If you dump spam and search engine optimised spam over it, it is going to forget the good stuff. You should only use as much training data as is actually going to fit.

      • Fidelis says:

        Hallucinations are actually bad sampling. It hits a basin in the probability space and because of the way the sampling is handled, plain ol beam search, it cannot find its way out. The entropix repo I posted in my other comment had great success in reducing hallucination, as now it does not double down when it falls into one of the probability basins (what I mean by basin in this context is, because of previous tokens determining probability of the next, if it is unsure and picks one token of many, that token produces a path dependence and it cannot backtrack out.)

        Also check out perplexity.ai; their RAG is pretty good.

        • jim says:

          > The entropix repo I posted in my other comment had great success in reducing hallucination

          “had”? Or expect that in theory it should have.

          The Entropix repo says: “The goal is … This should allow us … to get much better results using inference time compute … This project is a research project and a work in progress”

          His theory is that it should be able to suspect that it is making stuff up, then back up and have another go. My theory is compression artifacts — that it will hallucinate when it does not quite remember the text that matches the query. And if it does not quite remember, it might avoid hallucination by changing the subject, much as LLMs currently do with thought crime prompts, which is not a great improvement.

          Hallucination looks to me like a lossy compression artifact, and I therefore predict it will be radically reduced if you do not go overboard on compression: if the number of training tokens is only about ten times the number of parameters, and you do enough training on that limited training set, it should not have hallucinations. What was the entropix ratio?

          He is going for unreasonably large model sizes, which take an unreasonably long time to train. If he speeds up the training by cutting back the training set, I predict outstanding immunity from hallucination.

          • Fidelis says:

            Yeah I realized upon posting that what is in certain chatrooms involved with the project’s creation is not entirely reflected in the repository. This will change, evals have been done and are currently being compiled. I did preface this in the comment, and do apologize for a half baked delivery.

            The hallucination thing is an artifact of the training; it is one way sampling. Through the entire training cycle, from the raw text to supervised templates to RLHFing, there is only penalty for wrong answers, there is no reward for saying “I’m not entirely sure, however…”. Internally you can look at the logits, the probability space it returns, and see where it is unsure. These high entropy spaces are where hallucination occurs. You can manipulate the way it navigates these spaces manually, with a hard coded search on the variation of the entropy over token strings, eventually a less hardcoded search and instead something like MCTS with policy optimization, and when it hits these crossroads force it to go back and reflect, or inject a token that redistributes the probability (in practice this is inserting things like ellipses, or ‘wait, is that right?’; future iterations will handle this in a more general way). So what happens with beam search is that it is forced into one of these branches, and after choosing one low probability token of many the path is set, and it cannot exit. Has no idea what exit is, because they are trained linearly in a very fixed input string -> output string one pass fashion. If we are sticking to the database metaphor, it had a bad retrieval and could not recover, but this metaphor I feel is very restrictive for conceptualizing the process, as what is really being retrieved at any point is a probability distribution.
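            A minimal sketch of that entropy gate, with a hypothetical next_token_probs() call standing in for the model and made-up thresholds; this is the shape of the idea, not the entropix implementation:

            # Entropy-gated sampling sketch: when the next-token distribution is
            # high-entropy (the model is unsure), do not commit blindly; either
            # inject a reflection marker or back up a few tokens and resample.
            import math, random

            def entropy(probs):
                # Shannon entropy of the next-token distribution, in bits.
                return -sum(p * math.log2(p) for p in probs.values() if p > 0)

            def sample(probs):
                r, acc = random.random(), 0.0
                for tok, p in probs.items():
                    acc += p
                    if r <= acc:
                        return tok
                return tok

            def generate(next_token_probs, prompt, max_len=50, max_steps=200,
                         entropy_threshold=3.0, backtrack=3):
                out = list(prompt)
                for _ in range(max_steps):            # hard cap so the loop terminates
                    if len(out) >= max_len:
                        break
                    probs = next_token_probs(out)     # hypothetical model call
                    if entropy(probs) > entropy_threshold:
                        if len(out) > len(prompt) + backtrack:
                            del out[-backtrack:]      # back up and try another path
                        else:
                            out.append("<reflect>")   # or inject a reflection marker
                        continue
                    out.append(sample(probs))
                return out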

            There is no training involved in the repo. It is entirely sampling, going from beam search to something that can move back and forward in the generated token string based on model certainty.

            Going back up the stack here, what exactly would be a sign to you of something that is more than just database retrieval? What are humans doing differently in their pattern matching that does not count as retrieving from past experiences and swapping and combining concepts?

            • jim says:

              > what exactly would be a sign to you of something that is more than just database retrieval?

              LLMs making themselves useful by producing stuff that is not in their database. (And I don’t count trivial combinations of stuff that is in their databases, like an astronaut riding a horse, as stuff not in their databases.)

              Automatic summary that aggregates information from diverse sources to draw valid conclusions not in those sources.

              Coding assistants are undeniably useful. But my experience of coding assistants looks very like database retrieval.

              I just saw the first couple of minutes of a video on Ukraine, which acknowledged that Ukraine is running out of troops and the west is running out of weapons, but argues that things are fine because the Russian economy is feeling the stretch. So, I asked an AI to opine on this issue. Who, I asked, is being attrited more? Received some random facts mixed with a pile of rote official bullshit that verged on incoherence. Because its database contains mutually conflicting stories, it produced an answer that was profoundly unhelpful and uninformative.

              Because the data in its database did not add up, produced an answer that did not add up.

              It would have been appropriate to say “some say this, some say that, and some say the other”, but it reproduced many conflicting voices as a single voice. Much like putting an astronaut on a cowboy horse. It retrieves the data, and integrates diverse data into a response that has normal flow of syntax, but not normal flow of reason.

              • Fidelis says:

                I see, this helps me understand where you’re coming from. I suspect the functionality you’re describing is a long way off yet. That is a fairly general reasoning capacity which I suspect will require very vague and general goal-directed training.

                In that case, when it comes to navigating information space like this, it will remain a choppy database for a while yet. The next stage is very narrow, goal-directed, cheaply simulated tasks. I expect them to get rather good at filling in code feature requests in a good enough fashion rather soon, and filling them in a fairly well optimized way soon after that. I don’t expect them to go from spec to codebase anytime soon. Too expensive to simulate the creation of large codebases, and likely very hard to create a training objective general enough to even get single feature requests done well, but certainly plausible.

                To get the functionality you are describing, to get any functionality at all, it has to be an auxiliary task found within a more general training objective. A training objective general enough for “summarize a pile of text in a way that presents the information in a way that is maximally actionable” to be a common subloop would require quite impressive feats outside of summarization alone. You get a garbled lump because it is trying to compress the text information in such a way that the model itself would find useful for decompression, but the model only knows encoding and decoding. It does not know observe, orient, decide, act. So the information is not nearly as useful for someone who wants a compression that is actionable, even if the action is “test this information against previously acquired world knowledge”. Mostly because researchers and companies with access to lots of supercomputer time are interested in stepwise safe progress that can be presented at the next powerpoint, ahem, research conference. It’s computationally expensive, conceptually hard, hard to implement in code, and very fragile, so likely to fail and have you perish for lack of publish.

                So we will see very jittery tentative steps that very slowly and iteratively expand the capabilities of the present stack, with the most progress being made in domains with cheap and obvious to implement simulation. When it comes to search and compression, better RAG, better markers on certainty, and better compression. When it comes to coding assistance, better more general performance for things that are slightly more than pure boilerplate, followed by ability to implement entire features in a clumsy way, followed by a more optimized but more gpu time expensive way.

                • Fidelis says:

                  For further elaboration on what generalized training is and does, consider the image generation problem.

                  Your complaints amount to, summarizing crudely, it does not have a good world model. A world model was not required to generate a decent enough probability on pixel chunks, which was the training framework. So it has some rough idea of shapes and forms, and some rough idea of the most likely placement of these rough shapes and forms.

                  When the video generation models start reaching similar quality, I expect the models’ placement of shapes and forms to be much more reasonable. In order to predict the next frame, it has to have an internal world model of how things move and interact, which will spill over into static single frame generation as well. And when longer videos are successfully reached, more than the few seconds we get now, you will see even more improvement, as the world model will have to become sufficiently general to correctly guess the progression of a scene.

                  Consider this a hypothesis. If the video generation models do not produce noticeably more reasonable responses, especially as the video context gets longer, I assent we have indeed hit a dead end.

                • jim says:

                  Create two slightly different Unreal Engine scripts. Train an AI whose input is one of the scripts, to output the other script. When it gets good at it, close the loop, so that it generates a script that generates a video that matches the other video. Now you have an AI that can handle three dimensions.

                • jim says:

                  > Your complaints amount to, summarizing crudely, it does not have a good world model.

                  That is my complaint about the art programs. My complaint about AI in general is that it is a powerful search and replace engine running over a very large compressed database. Eliza on steroids. The Turing test pointed us towards creating Eliza on steroids.

                  If you want an art program that understands three dee from two dee, will need a very large database of unreal engine scripts and the videos they generate.

                  We do not understand what consciousness is, and the Turing test was a cop out saying we do not need to understand.

                  Perhaps it is something simple, which in retrospect will be as glaringly obvious as the Wright brothers realising a flying machine needed three axis control. Perhaps it is the breath of God.

                  When I asked an AI who was winning the war of attrition in the Ukraine, the answer was bad, because the AI was summarising mutually contradictory narratives into a single narrative, without awareness that there was a contradiction.

                  Flux AI gave a good image compositing a nuke and a city, viewed by a man on a mountain on a country road. But, of course, it did not quite grasp what “hit” means in this context.

                • jim says:

                  Yes, it is improving. And will improve a whole lot further. But it is still Eliza on steroids.

                  “summarise a pile of text in a way that presents the information in a way that is maximally actionable” is necessarily going to be based on a huge pile of texts that human curators deem to be maximally actionable, and it is going to summarise arbitrary text in a way that resembles those texts — but it will not understand what “actionable” is. If the actions resemble actions that are in its enormous human curated database of texts deemed actionable, then it will work fine. If the actions are outside that scope, will suck. What it is going to do is a very sophisticated search and replace where it replaces stuff in matching text deemed actionable by human curators with stuff from the pile it is summarising, without actually knowing what an action is.

                  Musk believes the next big thing is robots. And he is usually right. Eventually. Usually takes a whole lot longer than he expects. He is having enough grief getting Teslas to drive autonomously. Navigating a living room or a factory floor is a whole lot more difficult. On the other hand, bumping into things, knocking things over, and dropping things in a living room is a lot less disastrous.

            • jim says:

              So he has an already trained model, and is modifying the way it works so that it will stop, back up, and start over when it finds itself in deep water.

              This is “a work in progress” which is visibly broken and not all that useful. He is asking people not to send him bug reports. If backing up cuts down hallucination, and all the other stuff continued to perform as well as the original model, then he can claim a major advance. If backing up stops hallucination by the model just breaking and not performing, somewhat less impressive.

              My prediction is that the model will find itself in deep water when what it should retrieve has been overcompressed, and whatever he does, and whatever it does, the outcome will be that it is not correctly retrieved. And, since he is asking people to stop raising issues, it seems that a whole lot of incorrect retrieval is going on. It might not be hallucinating, but obviously it is doing something his testers do not much like.

              • Fidelis says:

                Much of my impression of this project is that the output is qualitatively better compared to regular sampling. It also performs better on the “reasoning” benchmarks, by a large margin. It doesn’t eliminate hallucination, but significantly reduces it by doing a better retrieval. A lot of the samples are on the guy’s public Twitter account:

                https://x.com/_xjdr/status/1840782196921233871

                It’s very raw because it is a scraped-together side project that exploded in popularity very suddenly. The repo was just me pointing out that a very low hanging fruit, not doing forward driven beam search exclusively to sample, was not picked until some pseudonymous researcher spent some spare time tinkering with it. The point being that we have not come close to exhausting the utility and finding the limits of behavior on these rather early iterations of the ‘foundation model’ concept.

                • jim says:

                  According to The Internet of Bugs, ChatGPT o1 is the best AI coder around.

                  And in the above video, at the linked time, he tells a story about a massive blind spot, and his explanation of the blind spot is that it is just a very powerful search and replace interface to a very large database. Eliza on steroids.

                  He calls this “lack of judgement”, but when he explains why it lacks judgement, it is because there is no one there. It is just a database search and replace.

                  The ChatGPT o1 Preview model implements chain of reasoning. Which is not in fact a train of reasoning, but rather repeatedly runs the smart search and replace to see if the pile of things it has cobbled together from the internet actually fit together and make sense. It is vastly better at programming than anything else around right now, though I expect that everyone else will also be running “chain of reasoning” soon.
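                  As a sketch, that loop looks something like the following, with llm() a hypothetical completion call; this is the shape of the loop being described, not a claim about how o1 is implemented internally:

                  def chain_of_checking(llm, request, max_rounds=3):
                      # Draft once, then repeatedly check whether the cobbled-together
                      # pieces actually fit the request, revising until the check passes.
                      draft = llm("Produce something that satisfies this request:\n" + request)
                      for _ in range(max_rounds):
                          critique = llm(
                              "Does the answer below actually satisfy the request? "
                              "Reply OK if it does, otherwise list the problems.\n"
                              "Request: " + request + "\nAnswer: " + draft
                          )
                          if critique.strip().upper().startswith("OK"):
                              break
                          draft = llm(
                              "Revise the answer to fix these problems:\n" + critique +
                              "\nRequest: " + request + "\nAnswer: " + draft
                          )
                      return draft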

      • It’s basically just cribbing from Stack Exchange, which is what most lazy programmers are already doing.

  6. Varna says:

    Still in total pre-VR submergence, we can already see the outlines of the evil deceiver entity in the Descartes thought experiment. Sensory manipulation, cognitive and emotional manipulation, identity formation, behavior programming.

    For now, direct contact with the deceiver is only when consuming content.

  7. notglowing says:

    They are just really great search engines with a wonderfully compressed and very large database. They do a pattern match on the prompt, and it sounds like they are reasoning, sounds like there is someone at home, but they are just finding matches in their immense database and performing a find and replace on the pattern and giving you a transform of stuff in their database.

    It’s more than a search engine with pattern matching. At the very least, they contain programs fully capable of processing natural language.

    They may not think in the same sense humans do, or be self aware, but by training on trillions of tokens of text, while they may not have gained a true world-model from them, they seem to have at least gained full command of natural, human language itself, and are able to process it like software processes numbers.

    Able to deal with inputs that aren’t exact and which don’t narrowly fit criteria that have to be intentionally set by programmers. It’s a new world of programming where the behaviour of software is less predictable but more holistic. Software that understands the context of inputs better, and the intentions of the user to some extent. We can genuinely do so much more than we could previously. Things that were totally unthinkable even 3 years ago.

    “A database with great compression and search and replace” is a decent metaphor for what it can do, but also too reductionist and somewhat unfair in my opinion.

    It’s capable of more than just S&R. Even if it lacks the understanding a human would have, it can automate processes, especially when used with chain-of-thought and tree-of-thought, that go significantly beyond that.
    It does “compress” data, but it does so by extracting the abstract patterns of information within it, rather than simply storing it like a lossy jpeg. That is a form of “understanding”, even if primitive. It’s why it’s capable of producing things that simply aren’t in its dataset. It doesn’t mean it can come up with totally original solutions to completely new problems, but it isn’t just searching and retrieving – it’s applying the patterns it learned to the new input, through the “programs” that were implicitly created in its “neurons” (i.e., the weights) during training.

    You say they do not pass the Turing Test. This is true, but it depends on the examiners. They probably would’ve passed a few years ago, when our expectations were lower. Nowadays, not anymore.

    What I find most interesting about it, is that even in science fiction that has real “thinking” robots, the robots sound a lot less human than our “fake” AI. Of course, to an extent, they’re intentionally made to sound robotic (in the manner of speech, I’m not talking about the voice itself).

    Still, I can’t watch that stuff anymore without cringing, because it feels like they totally missed the mark. We invented AI that sounds like a human long before AI that can reason like one.

    Superficially, LLMs sound very human, and when that isn’t true for me it’s mostly because I’ve gotten so used to GPTisms that I spot them immediately.

    • jim says:

      > It’s why it’s capable of producing things that simply aren’t in its dataset.

      No it is not. Everyone’s first and most famous example of originality is a cowboy in a space suit, which is just combining the cowboy pattern with the astronaut pattern, and the reason no one ever did that before is because it is stupid, and the AI does not know it is stupid.

      I ask the AI for code that does X. And it spits out code that does X flawlessly. But X is in fact not exactly what I wanted, so I ask for code that does Xq. And it munges together someone’s code that does X, and someone’s code that does q, somewhat successfully. Then I ask it to generate code that does Xq+, and it fails dismally, I blow it off. But I now have helpful examples of code that does Xq, and code that does q+. Which is useful, but not “things that simply aren’t in its dataset”.

      When I actually need things that simply are not in its dataset, not useful in practice. Using a coding assistant, it becomes obvious it is just a very capable search engine for example code.

      If it could produce things that are not in its training set, it would be writing my code, and it is not.

      And when using a LLM search engine to search the internet, becomes obvious that including all the spam and search engine optimised spam in the training set was not a good idea. It generally does not bring up the spam in the list of links, because of the ranking algorithm, but it looks like training failed to employ the ranking algorithm, or failed to employ it correctly. Everyone uses the same training set, and the training set is full of spam.

      I wanted an image of a man on a country road on a mountain overlooking a big city in the distance, and the city is hit by a nuclear bomb. Failed because of inability to grasp the idea of things in three dimensional space, so it just banged country images, nuke images, and big city images together randomly in ways that don’t make sense physically or aesthetically. It looks like it had a pile of nuke test images, a pile of big city images, and a pile of country images, but was mighty short of any images containing any two of these elements in the same image, let alone all three.

      If you ask for cowboy plus astronaut, works because it has a pile of sitting astronauts, so just does a search and replace on the sitting cowboy. But it fails to understand that nuke plus city means city destruction, because it is short on images of nuke hitting skyscrapers.

      That it is just a very powerful search and replace, not “producing things that simply aren’t in its dataset”, becomes obvious when you are using it as a coding assistant.

      • notglowing says:

        I wanted an image of a man on a country road on a mountain overlooking a big city in the distance, and the city is hit by a nuclear bomb. Failed because of inability to grasp the idea of things in three dimensional space,

        You must’ve tried this a while ago, had bad luck, or not used a SOTA model, because it worked first try for me:
        https://files.catbox.moe/s4a7o0.jpg
        Flux isn’t intelligent, not even to the extent that ChatGPT could be called that, but it has better prompt following than previous models, and a better grasp of how to put objects in relation to each other. This task isn’t impossible, even with a curve-fitting model that can’t truly reason and has no real world-model inside. Because yes, it’s true that it doesn’t truly “grasp” 3-dimensional space, but it has enough examples to learn how to place things in a 2D image in many cases.

        I ask the AI for code that does X. And it spits out code that does X flawlessly. But X is in fact not exactly what I wanted, so I ask for code that does Xq. And it munges together someone’s code that does X, and someone’s code that does q, somewhat successfully. Then I ask it to generate code that does Xq+, and it fails dismally, I blow it off. But I now have helpful examples of code that does Xq, and code that does q+. Which is useful, but not “things that simply aren’t in its dataset”.

        I totally believe this, because AIs frequently get hangups of this sort, but they also frequently fail at tasks that *are* in their dataset and which they should be able to solve.
        I use it every day to assist me in coding, and I’ve had a better experience than what you’re describing. I find that it can be successful in doing X -> Xq, especially lately with the new o1-mini model from OpenAI, which takes several steps to do things iteratively and correct itself/split work in several steps.
        Even o1-mini sometimes fails at basic things that you could find examples for online. It’s hard to evaluate models based on observed behaviour when it’s not consistent.

        Again, I’m not calling the AI creative or “thinking” in the way a human is, but I think that its ability to manipulate text is more sophisticated than simple S&R. Perhaps you believe that the instances where I experienced its best capabilities are flukes (which is always possible given the stochastic nature of the system) or that I underestimate how much of what I’m doing is in its dataset. But I still think your take is a little uncharitable.

        Ultimately, we do agree that the AI does not have a real world model, and it is limited by its training as a text completer rather than a thinking machine. We don’t have a real AI yet, but we do have something capable of commanding natural language in an automated fashion. And it *is*, most of the time, mostly useful as a superior search engine. That is its main use case at the moment.

        • jim says:

          You got a way better nuke image than I did, good aesthetics, physically plausible, but it is still merely a bomb test image added to a city image — the nuke is not hitting the city center, but some place beyond the city. Picture of man on country road, picture of city, picture of nuke. But not a picture of the nuke hitting the city, which is what you asked for. It simply found three relevant items in its database, put them all in the same frame, but did not really combine nuke image with city image.

          The man is high enough that he should see some skyscrapers outside the blast radius to his left or right beyond the bomb, making it clear that the blast is flattening stuff. But that would have resulted in the blast radius image making the city image less similar to the city images in its database. Good image of the city, good image of the man watching from the distance, good image of the bomb, which is better than I got. Not so great at hitting the city.

          Your request for a hit implies the city further away and the nuke closer than in the image provided; they should be at comparable depth in the picture from the man watching from a country road. Instead, the layering is man, country road, city, nuke. Should be man, country road, city hit by nuke.
