The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall (www.searchenginejournal.com)
Posted by Davriellelouna@lemmy.world to Technology@lemmy.world · English · 2 days ago (edited) · 235 comments
Electricd@lemmybefree.net · 1 day ago (edited):
They do have a point, though. It would be great to let per-prompt searches go through, but not mass scraping.
I believe a lot of websites don't want either, though.
threeganzi@sh.itjust.works · 1 day ago:
Does it not need to be scraped to be indexed, assuming it's semi-typical RAG stuff?
Electricd@lemmybefree.net · 23 hours ago:
I assume their script does some search-engine stuff, like querying Google or Bing, and then "scrapes" the links it visits.
Some Selenium stuff.