Because they don’t really search or index quality content (it’s very expensive and hard to do) and their search implementation really sucks, they don’t deliver any real improvement.

The process is like this:

  1. Take the user query and turn it into 1-3 search queries. For this step they use very cheap, fast, but weak models, so the generated queries are sometimes bad; unlike an experienced searcher, these models don’t really know how to use a search engine’s features, such as filtering, ranking, focusing…
  2. Combine the search results (which contain AI-generated slop summary pages, YouTube videos, maybe forums, maybe Wikipedia…).
  3. Use RAG with an LLM to find answers. The LLM will always try to find an answer quickly: instead of reasoning its way through a long article, it will grab the slop page that states a direct answer.
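
To make the three steps concrete, here is a minimal Python sketch of that kind of pipeline. It is not any real product's implementation: `FAKE_INDEX`, `expand_query`, `search`, `combine` and `build_prompt` are all hypothetical names, the "cheap model" is replaced by fixed string templates, and the search backend is a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    text: str


# Hypothetical stand-in for a search backend: query -> result pages.
FAKE_INDEX = {
    "seahorse emoji": [
        Page("https://example.org/a", "Short summary page with a direct answer."),
        Page("https://example.org/b", "Long article that needs careful reading."),
    ],
    "seahorse emoji explained": [
        Page("https://example.org/a", "Short summary page with a direct answer."),
        Page("https://example.org/c", "Forum thread discussing the topic."),
    ],
}


def expand_query(user_query: str, max_queries: int = 3) -> list[str]:
    """Step 1: derive up to max_queries search queries.

    Assumption: fixed templates stand in for the cheap model.
    """
    variants = [user_query, f"{user_query} explained", f"what is {user_query}"]
    return variants[:max_queries]


def search(query: str) -> list[Page]:
    """Look the query up in the fake index; unknown queries return nothing."""
    return FAKE_INDEX.get(query, [])


def combine(result_lists) -> list[Page]:
    """Step 2: merge result lists, dropping duplicate URLs, first seen wins."""
    seen = set()
    merged = []
    for results in result_lists:
        for page in results:
            if page.url not in seen:
                seen.add(page.url)
                merged.append(page)
    return merged


def build_prompt(user_query: str, pages: list[Page]) -> str:
    """Step 3: put every page into one prompt for the answering model."""
    sources = "\n".join(
        f"[{i}] {page.url}\n{page.text}" for i, page in enumerate(pages, 1)
    )
    return f"Answer using only these sources.\n{sources}\nQuestion: {user_query}"


queries = expand_query("seahorse emoji")
pages = combine(search(q) for q in queries)
prompt = build_prompt("seahorse emoji", pages)
```

Note that nothing in this sketch judges source quality: the short summary page and the long article reach the model with equal weight, which is exactly where the problems start.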

As you can see, there are many, many problems in this implementation:

  • The biggest problem is citation: they cite confidently, but the citations are wrong.
  • They use low-quality data, like auto-generated YouTube subtitles, improperly extracted tables and elements, content-farm sites, copycat sites, corporate blogs…
  • Their search results are low quality.
  • For the most important part (breaking down the user request), they use cheap, weak models.
  • They handle all data in the same context instead of using parallel requests (which would be very expensive).
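
As a rough illustration of the last point, this is what "parallel requests" could look like: one model call per source, run concurrently, with the notes merged afterwards. It is a sketch under assumptions, not a real API. `ask_model` is a hypothetical stand-in for an LLM call (here it just returns the first sentence of the source), and `extract_per_source` is an invented helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call.

    Assumption: it simply returns the first sentence of the source text.
    """
    source_text = prompt.split("SOURCE:\n", 1)[1]
    return source_text.split(".")[0].strip() + "."


def extract_per_source(question: str, sources: list[str], max_workers: int = 4) -> list[str]:
    """Send one request per source instead of one request with everything."""
    prompts = [f"QUESTION: {question}\nSOURCE:\n{text}" for text in sources]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so notes[i] belongs to sources[i].
        return list(pool.map(ask_model, prompts))


sources = [
    "A seahorse emoji was never approved. The rest of the article goes on.",
    "Many people remember one anyway. More discussion follows.",
]
notes = extract_per_source("Is there a seahorse emoji?", sources)
```

Each source gets read on its own, so a short slop page can't crowd out a long article, but the number of model calls grows with the number of sources, which is the cost mentioned above.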

It’s still strange to me: we always say “they have all the data, all the money, all the hardware…” but they still can’t create a better AI search than random FOSS developers.

  • moakley@lemmy.world
    1 month ago

    That’s impossible, because most of my searches are literally as fast as me typing the query, and then I get the answer.

    That’s why I’m asking what you guys are searching for, because this has been a dramatic improvement for me.

      • Zerush@lemmy.ml
        1 month ago

        Andi said:

        The seahorse emoji phenomenon reveals a curious case of mass misremembrance - there has never been a seahorse emoji, yet both humans and AI language models firmly believe it exists[1][2].

        When asked about the seahorse emoji, large language models respond with complete confidence that it exists, then spiral into confusion when trying to display it, often outputting random fish or horse emojis instead[2]. This behavior stems from the models building an internal “seahorse + emoji” concept that crashes against reality when no matching token exists in their vocabulary[1].

        The technical explanation involves the models’ logit lens - as they process the request through their layers, they construct a conceptual blend of “seahorse” and “emoji” that seems perfectly valid until the final output stage, where they’re forced to select the closest available match[1].

        Many humans share this false memory, with Reddit threads and social media posts filled with people convinced they’ve seen a seahorse emoji before[1]. While a seahorse emoji was proposed to the Unicode Consortium in 2018, it was rejected and has never actually existed[2].

        I’d add that it doesn’t exist in the official Unicode emoji set, but it does in unofficial packs, e.g. here


        1. Why do LLMs freak out over the seahorse emoji? - Theia Vogel

        2. Emojipedia - Is There a Seahorse Emoji?

    • pkjqpg1h@lemmy.zip (OP)
      1 month ago

      Maybe for simple queries, but if your task is like one of these, currently no AI can beat a human (or me):

      • “finding most popular communities in Lemmy”

      • “5 latest llm models”

      • “trump’s last 5 lies”

      • “any file finding”

      • “image finding”

      • “any tool or website suggestion”

      • “finding source of something”

      • “finding github issues with related something”

      • “finding all news about something”

      • “finding a broken webpage”

      • “finding original content”

      • “finding illegal content :D”

      Even when they “do” it, the result is just good enough, and that’s not enough for me.

      • moakley@lemmy.world
        1 month ago

        maybe for simple queries

        Yeah. I’m referring to simple queries. That’s the vast majority of my queries.

        • its_kim_love@lemmy.blahaj.zone
          1 month ago

          Were we supposed to read your mind for that one? You literally said someone else’s experience was impossible before you walked it back to “of course I just meant simple queries.”

          • moakley@lemmy.world
            1 month ago

            Maybe reread the conversation, because you seem to be assuming a tone on my part that isn’t there.

    • its_kim_love@lemmy.blahaj.zone
      1 month ago

      I find that anything older than 10 years that isn’t media or pop culture just doesn’t appear in my searches. I can’t find a way to exclude terms at all. I can’t find a reliable way to add terms without wildly changing the results, when what I want is to dig into the results I already have to find what I’m actually looking for.