From f18044a7a68c8fac73fd66ec7df83664b93be031 Mon Sep 17 00:00:00 2001
From: osmarks
Date: Fri, 7 Mar 2025 14:43:36 +0000
Subject: [PATCH] =?UTF-8?q?Edit=20=E2=80=98osmarks.net=5Fweb=5Fsearch=5Fpl?=
 =?UTF-8?q?an=5F(secret)=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 osmarks.net_web_search_plan_(secret).myco | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/osmarks.net_web_search_plan_(secret).myco b/osmarks.net_web_search_plan_(secret).myco
index 3ff7ec9..9d77870 100644
--- a/osmarks.net_web_search_plan_(secret).myco
+++ b/osmarks.net_web_search_plan_(secret).myco
@@ -18,7 +18,7 @@ The job of a search engine is to retrieve useful information for users. This is
 = Indexing
 * Google/Bing/etc are plausibly primarily keyword-based. This is not ideal for most (?) queries, which care about something being "the same sort of thing". Neural reranking since at least 2019.
-* Exa uses (mostly?) "Neural PageRank" i.e. contrastive link text/link target modelling. Rationale: link text roughly describes the kind of thing the link points to.
+* Exa uses (mostly?) "Neural PageRank" i.e. contrastive link text/link target modelling. Rationale: link text (or text around link, or whole link-source document? probably mostly former) roughly describes the kind of thing the link points to.
 * {Could also do contrastive link co-occurrence modelling. Rationale: things referenced in the same document are likely semantically related.
 * This generalizes nicely to images too (Neural PageRank is like CLIP w/ captions). Could probably natively train in same embedding space.
 * We benefit from contrastive advances like SigLIP, [[https://arxiv.org/abs/2005.10242]].
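
The edited line above concerns contrastive link text/link target modelling. As a rough illustration of what such an objective could look like, the following is a minimal PyTorch sketch: a link-text encoder and a target-document encoder trained with a SigLIP-style pairwise sigmoid loss over in-batch pairs. The encoders, dimensions and hyperparameters are placeholder assumptions for illustration, not Exa's or osmarks.net's actual implementation.

# Sketch of contrastive link-text / link-target training with a SigLIP-style
# sigmoid loss (https://arxiv.org/abs/2303.15343). Everything here is an
# illustrative assumption, not the real system.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TextEncoder(nn.Module):
    """Toy bag-of-embeddings encoder standing in for a real transformer."""

    def __init__(self, vocab_size: int = 30_000, dim: int = 256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> (batch, dim), L2-normalised
        return F.normalize(self.proj(self.embed(token_ids)), dim=-1)


def siglip_loss(anchor: torch.Tensor, target: torch.Tensor,
                temperature: float = 10.0, bias: float = -10.0) -> torch.Tensor:
    """Pairwise sigmoid loss: in-batch diagonal pairs are positives,
    all off-diagonal pairs are negatives."""
    logits = anchor @ target.T * temperature + bias                 # (B, B)
    labels = 2 * torch.eye(len(anchor), device=anchor.device) - 1   # +1 diag, -1 off-diag
    return -F.logsigmoid(labels * logits).mean()


if __name__ == "__main__":
    link_text_encoder = TextEncoder()    # encodes the anchor/link text
    target_doc_encoder = TextEncoder()   # encodes the linked-to document
    batch = 8
    link_text_tokens = torch.randint(0, 30_000, (batch, 16))
    target_doc_tokens = torch.randint(0, 30_000, (batch, 128))
    loss = siglip_loss(link_text_encoder(link_text_tokens),
                       target_doc_encoder(target_doc_tokens))
    loss.backward()
    print(f"loss = {loss.item():.4f}")

The sigmoid form scores each (link text, document) pair independently rather than normalising over the whole batch, which is the property that lets SigLIP-style training scale to large batches; a softmax InfoNCE loss over the same embeddings would be the more conventional alternative.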