diff --git a/osmarks.net_web_search_plan_(secret).myco b/osmarks.net_web_search_plan_(secret).myco
index 3ff7ec9..9d77870 100644
--- a/osmarks.net_web_search_plan_(secret).myco
+++ b/osmarks.net_web_search_plan_(secret).myco
@@ -18,7 +18,7 @@ The job of a search engine is to retrieve useful information for users. This is
 = Indexing
 * Google/Bing/etc are plausibly primarily keyword-based. This is not ideal for most (?) queries, which care about something being "the same sort of thing". Neural reranking since at least 2019.
-* Exa uses (mostly?) "Neural PageRank" i.e. contrastive link text/link target modelling. Rationale: link text roughly describes the kind of thing the link points to.
+* Exa uses (mostly?) "Neural PageRank" i.e. contrastive link text/link target modelling. Rationale: link text (or text around link, or whole link-source document? probably mostly former) roughly describes the kind of thing the link points to.
 * {Could also do contrastive link co-occurrence modelling. Rationale: things referenced in the same document are likely semantically related.
 * This generalizes nicely to images too (Neural PageRank is like CLIP w/ captions). Could probably natively train in same embedding space.
 * We benefit from contrastive advances like SigLIP, [[https://arxiv.org/abs/2005.10242]].