Edit ‘osmarks.net_web_search_plan_(secret)’

osmarks 2025-03-07 14:43:36 +00:00 committed by wikimind
parent d6f63d557d
commit f18044a7a6

@@ -18,7 +18,7 @@ The job of a search engine is to retrieve useful information for users. This is
= Indexing
* Google/Bing/etc are plausibly primarily keyword-based. This is not ideal for most (?) queries, which care about something being "the same sort of thing". Neural reranking since at least 2019.
-* Exa uses (mostly?) "Neural PageRank" i.e. contrastive link text/link target modelling. Rationale: link text roughly describes the kind of thing the link points to.
+* Exa uses (mostly?) "Neural PageRank" i.e. contrastive link text/link target modelling. Rationale: link text (or text around link, or whole link-source document? probably mostly former) roughly describes the kind of thing the link points to.
* {Could also do contrastive link co-occurrence modelling. Rationale: things referenced in the same document are likely semantically related.
* This generalizes nicely to images too (Neural PageRank is like CLIP w/ captions). Could probably natively train in same embedding space.
* We benefit from contrastive advances like SigLIP, [[https://arxiv.org/abs/2005.10242]].
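A minimal sketch of what "contrastive link text/link target modelling" could look like as a training objective, assuming that row `i` of a batch pairs the embedding of a link's text with the embedding of the page it points to, and that every other page in the batch serves as a negative. Nothing here is confirmed to be Exa's method: the function names, the temperature, and the SigLIP-style scale and bias (`scale=10.0`, `bias=-10.0`, taken from that paper's initialisation) are illustrative assumptions, and the toy vectors stand in for real encoder outputs.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(link_text_embs, target_embs):
    """Cosine similarity between every link-text and every link-target embedding."""
    a = [normalize(v) for v in link_text_embs]
    t = [normalize(v) for v in target_embs]
    return [[sum(x * y for x, y in zip(ai, tj)) for tj in t] for ai in a]


def info_nce_loss(link_text_embs, target_embs, temperature=0.1):
    """Softmax contrastive loss: each link text must pick out its own target
    from all targets in the batch (hypothetical helper, one direction only)."""
    sims = similarity_matrix(link_text_embs, target_embs)
    total = 0.0
    for i, row in enumerate(sims):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(sims)


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def sigmoid_pairwise_loss(link_text_embs, target_embs, scale=10.0, bias=-10.0):
    """SigLIP-style loss: every (text, target) pair is an independent binary
    classification, positive on the diagonal and negative elsewhere."""
    sims = similarity_matrix(link_text_embs, target_embs)
    total = 0.0
    for i, row in enumerate(sims):
        for j, s in enumerate(row):
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * (scale * s + bias))
    return total / len(sims)


# Toy batch: two links whose text embeddings line up with their targets,
# and the same targets swapped so that each link points at the wrong page.
link_texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_targets = [[1.0, 0.0], [0.0, 1.0]]
swapped_targets = [[0.0, 1.0], [1.0, 0.0]]

aligned_nce = info_nce_loss(link_texts, aligned_targets)
swapped_nce = info_nce_loss(link_texts, swapped_targets)
aligned_sig = sigmoid_pairwise_loss(link_texts, aligned_targets)
swapped_sig = sigmoid_pairwise_loss(link_texts, swapped_targets)
```

Both losses come out lower for the aligned batch than for the swapped one, which is the property training would exploit. The co-occurrence variant mentioned above would presumably fit the same shape, with the two inputs being embeddings of two things referenced in the same document rather than link text and link target.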