Because of Lemmy's default robots.txt and meta tags, search engines index even non-local communities. As a result, unrelated or unwanted content ends up associated with your instance.
As of today, lemmy-ui does not allow hiding non-local (or any) communities from Google and other search engines. If you, like me, do not want your instance to be associated with other content, you can add a custom robots.txt and response headers to avoid indexing.
In nginx, simply add this:
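# Disallow all search engines
location / {
    ...
    # Ask crawlers not to index anything served from this location
    add_header X-Robots-Tag noindex;
}
location = /robots.txt {
    # Set the MIME type with default_type; add_header would send a second Content-Type header
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}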
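If you want to sanity-check the robots.txt body before deploying, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_crawlable, the "Googlebot" user agent string, and the sample paths are my own picks for illustration, not anything from Lemmy or the playbook.

```python
from urllib.robotparser import RobotFileParser

# Same body that the `location = /robots.txt` block returns.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def is_crawlable(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical paths, chosen only for illustration.
for path in ("/", "/post/1234", "/c/example"):
    print(path, is_crawlable(ROBOTS_TXT, "Googlebot", path))
```

Every path should print False with this body. This only checks the robots.txt rules themselves; it says nothing about whether the X-Robots-Tag header is actually being sent.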
Here’s a commit in my fork of the lemmy-ansible playbook. And here’s a corresponding issue I opened in lemmy-ui.
I hope this helps someone :-)
There is no way to exclude individual communities. Post URLs are generic, like /post/1234, so nginx or any other proxy cannot tell which community a post belongs to. I would love for my own communities to be searchable, but not at the price of tainting my project’s reputation.
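To make the problem concrete, here is a rough sketch of what a proxy has to work with. It assumes community pages live under /c/<name> (my assumption about the URL layout, it is not stated above), and community_from_path is a made-up helper, not part of Lemmy or nginx.

```python
import re
from typing import Optional

# Assumed URL shape: community pages under /c/<name>.
COMMUNITY_PATH = re.compile(r"^/c/([^/]+)")


def community_from_path(path: str) -> Optional[str]:
    """Return the community named in path, or None if the path names none."""
    match = COMMUNITY_PATH.match(path)
    return match.group(1) if match else None


print(community_from_path("/c/example"))   # the community name is in the path
print(community_from_path("/post/1234"))   # nothing to match on
```

A post path gives back None, so a path-based rule in the proxy has nothing to decide on. Only the application knows which community post 1234 belongs to.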