Edit ‘osmarks.net_web_search_plan_(secret)’
parent f18044a7a6
commit 0726eb9ea7
@@ -12,7 +12,9 @@ The job of a search engine is to retrieve useful information for users. This is
 
 * Anna's Archive is ~0.5PB. This contains a substantial fraction of books and papers. These are plausibly higher-quality than the general internet.
 * We do need general internet data for breadth of knowledge etc. This runs to PB (Common Crawl etc). Apparently billions of pages per month.
-* There is lots of alpha in weird corners of Twitter and also Discord. It would be useful to scrape these, though people would complain.
+* {There is lots of alpha in weird corners of Twitter and also Discord. It would be useful to scrape these, though people would complain.
+* Also IRC, but that's logged worse.
+}
 * Images, PDFs, etc contain useful knowledge which hasn't been integrated properly into most things. We need* these.
 
 = Indexing