Reddit locks down its public data in new content policy, says use now requires a contract

Abaixo de Cão@lemm.ee · 3 months ago

Reddit locks down its public data in new content policy, says use now requires a contract

rockSlayer@lemmy.world · 3 months ago

Too bad the website is still openly accessible and still capable of being scraped

IphtashuFitz@lemmy.world · 3 months ago

We use Akamai where I work for security, CDN, etc. Their services make it largely trivial to identify traffic from bots. They can classify requests in real time as coming from known bots like Googlebot to programming frameworks like python & java to bots that impersonate Googlebot, to virtually any other automated traffic from unknown bots.

If Reddit was smart they’d leverage something like that to allow Google, Bing, etc. to crawl their data and block all others, or poison others with bogus data. But we’re talking about Reddit here…

SzethFriendOfNimi@lemmy.world · 3 months ago

Probably what they’re targeting. Such as sites like safereddit, etc.

rockSlayer@lemmy.world · 3 months ago

Well that’s part of the thing. Web scraping doesn’t get covered by policies. Like, they could ban your ip or any accounts you have, but web scraping itself will always be acceptable. It’s why projects like NewPipe and Invidious don’t care about YouTube cease and desist letters.

AeroLemming@lemm.ee · 3 months ago

Is it any different for an “API”? I don’t think there’s a very big difference between an HTTP endpoint that returns HTML and an HTTP endpoint that returns JSON.

folkrav@lemmy.ca · 3 months ago

Parsing absolutely comes with a lot more overhead. Especially since many websites integrate a lot of JS interactivity nowadays, you oftentimes don’t get the full contents you’re looking for straight out of the HTML you’re getting out of your HTTP request, depending on the site.

werefreeatlast@lemmy.world · 3 months ago

Oops look like this community hasn’t been reviewed. Login if you still want to see the content.

AeroLemming@lemm.ee · 3 months ago

Doesn’t that only happen on the mobile version? Either way, it’s stupid and annoying. Google should start de-ranking sites that add barriers to content, but I know they never will.

werefreeatlast@lemmy.world · 3 months ago

I tried that on my desktop. So long as you are not actually logged in you cannot see the communities that are too small for a review or too adult after a review.

AeroLemming@lemm.ee · 3 months ago

Ugh, what a fucking shitshow. I know it won’t happen quickly or easily, but I’m hoping to see more people on federated platforms in the next decade or two. It’s the only way for us to take the internet back from these greedy bastards.

AtariDump@lemmy.world · 3 months ago

Obligatory

SatansMaggotyCumFart@lemmy.world · 3 months ago

They have a ton of useful and valuable comments.

They also have some of mine but that’s more of a liability.

ArugulaZ@kbin.social · 3 months ago

Sure, why not. People gave you all the information on Reddit for free, you might as well sell it to the highest bidder without compensating them. I call it the “Veasey maneuver.”

SmokumJoe@lemmy.world · 3 months ago

For awhile, it was a really special place,

for feeding procrastination.

Reddit locks down its public data in new content policy, says use now requires a contract

Reddit locks down its public data in new content policy, says use now requires a contract

Reddit locks down its public data in new content policy, says use now requires a contract | TechCrunch