source: https://exa.ai/blog/superknowledge

original title: We need superknowledge before superintelligence

Notes


Origin

The most important technical problem

Ilya Sutskever thinks “building safe superintelligence is the most important technical problem of our time”.

I disagree. I think there’s a more pressing technical problem, one that needs to be solved first – superknowledge.

The world is far shorter on knowledge than intelligence right now. We’ll soon have near-AGI intelligences (GPT-5) relying on knowledge systems built for humans in the late 1900s (Google).

This is an absurd situation, even a dangerous one.

We need to build superknowledge before superintelligence. Let’s explore why.

Intelligence is bottlenecked by knowledge

Intelligence is different from knowledge.

Intelligence is reasoning over an input. Knowledge is retrieving from a data repository.

All the recent advanced AI models have high intelligence, but surprisingly limited knowledge.

For example, GPT-4 can nail any high school physics problem, but if you ask it to retrieve a list of physics PhDs in NYC – a comparatively simple request – you get this:

[Image: ChatGPT search for physics PhDs in NYC]

GPT-4 does have some knowledge of the world, but it isn’t anywhere close to knowing everything: every PhD webpage, every news article, blog post, YouTube video, tweet, Reddit post, meme, etc.

That’s why LLMs are often combined with a search engine. The LLM brings the intelligence, and the search engine brings the knowledge. At least, in theory. Unfortunately, today’s search engines can’t handle simple knowledge requests either:

[Image: Google search for physics PhDs in NYC]

Knowledge systems like Google haven’t improved much over the past decade (arguably, they’ve gotten worse). In contrast, intelligence systems improve every month.

That means intelligence is increasingly bottlenecked by knowledge.

Luckily, we now have technology like transformers, which enable radically new knowledge systems. That’s what our team at Exa is working on. I believe we’re only a few years away from building superknowledge.

What is superknowledge?

Superintelligence is a system that can handle extremely complex reasoning requests.

Superknowledge is a system that can handle extremely complex retrieval requests.

We’ll have achieved superknowledge when an API exists that can handle any knowledge request over available information, no matter how complex.

Superknowledge would handle requests like:

  • “All physics PhDs in NYC” (should return all 457 physics PhDs in NYC and any associated metadata)

  • “The org chart of every AI startup in the Bay Area started in the past 3 years sorted by employee count, where the founders have some experience training LLMs in PyTorch” (should return a perfect comprehensive list)

  • “All the apartments in SF that have a window facing a courtyard and air conditioning and don’t have any reviews about smell complaints, sorted by price” (if you happen to be superknowledgeable about this one plz DM me)

In short, superknowledge gives everyone comprehensive knowledge of anything as quickly as they want.
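
To make the apartment request above concrete, here is a minimal sketch of how such a query breaks down into filters and a sort key once the underlying facts are structured. The `Apartment` record, its fields, the keyword check, and the sample listings are all hypothetical, invented for illustration. The hard part of superknowledge is extracting records like these from the open web, which this sketch assumes has already happened.

```python
from dataclasses import dataclass, field


@dataclass
class Apartment:
    # Hypothetical record; a real system would have to extract these
    # fields from listings and reviews scattered across the web.
    address: str
    price: int
    courtyard_window: bool
    air_conditioning: bool
    reviews: list[str] = field(default_factory=list)


def mentions_smell(review: str) -> bool:
    # Naive keyword check standing in for real review understanding.
    return any(word in review.lower() for word in ("smell", "odor", "stink"))


def find_apartments(listings: list[Apartment]) -> list[Apartment]:
    """Courtyard-facing window, air conditioning, no smell complaints, cheapest first."""
    matches = [
        apt for apt in listings
        if apt.courtyard_window
        and apt.air_conditioning
        and not any(mentions_smell(r) for r in apt.reviews)
    ]
    return sorted(matches, key=lambda apt: apt.price)


# Invented sample data.
listings = [
    Apartment("12 Example St", 3200, True, True, ["Quiet and bright."]),
    Apartment("34 Sample Ave", 2800, True, True, ["Hallway smells like trash."]),
    Apartment("56 Demo Blvd", 2900, True, True),
    Apartment("78 Mock Ln", 2500, False, True),
]

for apt in find_apartments(listings):
    print(apt.price, apt.address)
```

The filter itself is trivial. What makes the request "extremely complex" is knowing, for every apartment in SF, whether each of those fields is true.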

I believe we urgently need this comprehensive knowledge, both to progress society and to safeguard it.

Superknowledge unblocks progress

If you want to accelerate human progress, superknowledge is perhaps the most overlooked way to do it.

Progress is a constant cycle of learning what’s out there and trying something new. Superknowledge eliminates any bottlenecks to the first step so that all energy can be focused on the second.

  • Doctors would get a deep analysis of all previous studies involving similar symptoms before making their diagnosis

  • AI researchers would instantly gather every experimental result related to any new idea they think up

  • Software engineers would find every C++ project containing the code snippets they need

  • Journalists would see, in real time, every fact that supports or contradicts what their interviewee states

  • Investors would never miss a climate tech opportunity that fits their portfolio

  • Artists would find every modern painter in Denver who they should meet when they travel there

  • Supply chain managers would identify the best possible supplier for every stage of the rocket assembly pipeline

Superknowledge would make us all superproductive and superinformed.

In our personal lives, much of our time is wasted searching for apartments, events, clothing, interesting articles, solutions to personal problems, etc. Superknowledge gathers all the relevant information for you in 2 seconds, not 2 days.

Sometimes we even waste not days, but months or years of our lives because we didn’t learn something existed until later – the perfect job opportunity, the right medical treatment. With superknowledge, you’d have a smart alert system so that you’re fully in the know about any topic. No more “I wish I knew that earlier”, for anything.

Progress will accelerate most from combining superknowledge with an intelligence like GPT-5. GPT-5 can handle the planning and processing while superknowledge handles the retrieval.

Let’s say you want help finishing a research paper. GPT-5 + superknowledge would take each paragraph in your paper and find all the similar ideas from across the web (papers, blog posts, tweets, videos, etc). Then it would find the counterarguments to each of those ideas. Then the counterarguments to the counterarguments, and so on. It would feel as if a week-long academic conference had analyzed your paper, but in 2 seconds.
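
The "counterarguments to the counterarguments, and so on" loop can be sketched as a bounded recursion. This is only an illustration under stated assumptions: `find_counterarguments` is a hypothetical retrieval function passed in by the caller, and the toy index below is invented data standing in for a real retrieval backend.

```python
from typing import Callable

# Hypothetical retrieval function: given a claim, return counterarguments to it.
Retriever = Callable[[str], list[str]]


def expand_debate(claim: str, find_counterarguments: Retriever, max_depth: int) -> dict:
    """Build a tree of counterarguments, then counterarguments to those, up to max_depth."""
    if max_depth == 0:
        return {"claim": claim, "counters": []}
    return {
        "claim": claim,
        "counters": [
            expand_debate(counter, find_counterarguments, max_depth - 1)
            for counter in find_counterarguments(claim)
        ],
    }


def count_nodes(tree: dict) -> int:
    # Total number of claims in the tree, including the root.
    return 1 + sum(count_nodes(child) for child in tree["counters"])


# Toy stand-in for a retrieval backend; the entries are invented for illustration.
TOY_INDEX = {
    "Search engines have stagnated": ["Ranking quality is hard to measure"],
    "Ranking quality is hard to measure": ["User studies can measure it"],
}


def toy_retriever(claim: str) -> list[str]:
    return TOY_INDEX.get(claim, [])


tree = expand_debate("Search engines have stagnated", toy_retriever, max_depth=2)
print(count_nodes(tree))
```

The depth bound is what keeps "and so on" finite. The reasoning side decides how deep to go and what to do with the tree, while every call to the retriever is a knowledge request.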

On the other hand, GPT-5 + Google would get stuck because Google can’t handle queries like finding similar ideas or counterarguments.

It’s difficult for us to fathom how quickly progress will accelerate when every intelligence – whether human or AI – is unblocked by all the knowledge that’s out there.

Superknowledge prepares us for superintelligence

Superknowledge doesn’t just accelerate us toward an advanced future, it also accelerates us toward a safer one.

When people list the biggest threats to humanity, they don’t usually put the state of our knowledge as the top threat, but it actually is.

That’s because our knowledge underlies everything in our society – what problems we care about, how we act toward others, which politicians we choose, etc. Every societal malfunction is downstream from bad knowledge.

Unfortunately, our current knowledge ecosystem is a mess. Knowledge is scattered across billions of webpages with no tool powerful enough to organize it all. That makes it extremely hard to become truly well-informed on any issue – you never know what knowledge you’re missing.

When people aren’t well-informed, they make the wrong decisions, elect the wrong leaders, and cause inefficiencies throughout society. This is causing real problems, from inane housing laws to actual war.

The rise of agentic AI systems multiplies this problem dramatically. If AIs are stuck with the same knowledge tools as humans, then we’ll just have thousands more intelligences operating over the same incomplete knowledge. These AIs will interact with billions of people daily and perform actions on their behalf. They will be highly intelligent but misinformed, a dangerous combination.

Our society deserves something better. Building superknowledge is the solution.

Superknowledge advances safety because it lets people or AIs quickly become well-informed on any topic – from the technologies related to carbon removal to the laws that should govern AI itself. I’d much rather take advice from an AI that analyzed the 10,000 relevant arguments on the web than from one that read only the first 10 links of a Google search.

We’re now entering the most volatile decade in human history. It’s essential that humans and AIs can rely on a mature knowledge ecosystem that guides us through the chaos.

We’d just better build it before superintelligence arrives.

We’re building superknowledge

It’s no accident that the Bible begins with a story about the tree of knowledge. For 5,000 years, humans have dreamed of knowing everything. We’re going to achieve that dream in about 3 years, and I think it’ll be powered by Exa. This is a historic mission, biblical even.

I’ve personally dreamt of knowing everything for two decades, since I was a little kid lying prone on my 4-foot tall outer-space book wondering what it all means. We’re finally almost there.

It’s interesting that no one else is working on this. While there are dozens of labs working on superintelligence, as far as I’m aware there’s only one organization in the world working on superknowledge – Exa.

That’s partly because building superknowledge requires an organization with the right incentives. Organizations with ad-based revenue models will not build it. Exa, in contrast, has a usage-based revenue model. We’re highly incentivized to give users full control to retrieve whatever knowledge they need. Turns out users would pay a lot for superknowledge.

Another reason no one’s building superknowledge is that it’s hard. We need to design novel ML architectures in a novel research field while building a novel search business. That’s not including all the massive infrastructure required for crawling, storing, processing, and serving petabytes of web data.

Yet superknowledge seems more attainable than superintelligence. It requires fewer magical breakthroughs. We have a pretty clear roadmap to get there.

The clock is ticking. To safely navigate the next decade, we need to build superknowledge before SSI, OpenAI, or some other organization builds superintelligence. For the Exa team, this is the most important technical problem of our time.
