Cyber Defense Advisors

RSS Feed Scraper Websites and How They Affect Blog Authors

I sent a note to my customers yesterday saying that I’m going to try to temporarily put my blog behind a paywall to fend off RSS scrapers. These are sites that blatantly copy all your content instead of displaying a portion of the article and then redirecting to the blog to read the full article.

This behavior hurts blog authors, especially on a site like Medium, where you may be getting paid for web traffic. Instead of the traffic coming to Medium, your posts are read elsewhere and your Medium stats are low.

The problem is that I then noticed in my Medium stats that RSS scrapers are still getting the full blog post, even with the paywall. Perhaps they are waiting for me to remove the paywall and then scraping the blog.

Note: If you are reading this content anywhere other than my Medium Cloud Security blog, please contact me on LinkedIn or Twitter and let me know.

https://medium.com/cloud-security

My name is unique and it should be easy to find on either platform:

Teri Radichel

https://linkedin.com/in/teriradichel

https://twitter.com/teriradichel

You can also search for my company, 2nd Sight Lab, LLC, to which I assign the copyright at the bottom of each post.

Then authors have to spend their time searching around for duplicated content and reporting it to Google:

Is Someone Sabotaging your Blog?

The other thing is, I think some of these scrapers are simply rearranging the content, and they are definitely removing links. In some cases the scrapers may be trying to prevent your blogs from getting traction in search engine rankings, as I wrote about in the above post.

Here’s an article I just saw yesterday on the topic that provides a list of RSS Scrapers:

https://www.techbusinessnews.com.au/rss-feed-scraper-websites-and-how-to-stop-them/

They also offer some solutions to help you fend off RSS scrapers. However, if you host your content on Medium, you can’t do this, since Medium controls the web hosting for your content.

How could Medium fix this problem?

First of all, Medium stats need to be more granular, like Google Analytics, showing you more details about the IP addresses that visited your site and whether they came via RSS or the web.

Medium could provide a lot more information to make the site more valuable to authors, like showing which countries frequent your blog and even which corporate IP ranges. You can identify those using something like MaxMind or possibly CloudFlare, as the article above mentions, or maybe even source them straight from the IP registries: ARIN, RIPE, APNIC, LACNIC, AFRINIC. I’ve written about those before.

Allow authors to block IP ranges they don’t want frequenting their blogs.

Now, the IP addresses that visit your site alone might not help you, because you have to link that IP to the site that’s hosting your content. The RSS feed could be pulled by one IP and, most likely, published to a site with a different IP. But if certain IPs are known for performing these actions, then you could at least identify them.

Then, Medium needs to allow you to block certain IP addresses from visiting your blog.
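As a rough illustration of what range-based blocking could look like on any platform that sees the client IP, here is a minimal Python sketch using the standard `ipaddress` module. The blocklist and the `is_blocked` helper are hypothetical, the ranges shown are reserved documentation ranges rather than real scrapers, and nothing here reflects how Medium actually works.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges,
# used here only as placeholders for ranges an author might choose.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    # Only compare addresses and networks of the same IP version.
    return any(
        addr.version == net.version and addr in net
        for net in BLOCKED_RANGES
    )
```

A request handler could call `is_blocked(request_ip)` before serving the page or the feed and return an error for matches.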

Next, show user agents. Same thing. Some user agents are malicious or at least annoying scrapers. Allow authors to block specific user-agents.
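A user-agent block list could be sketched the same way. The substrings below are made-up names, not real scraper identifiers, and `is_blocked_agent` is a hypothetical helper. Keep in mind that a client can send any user-agent string it likes, so this kind of check only stops scrapers that identify themselves honestly.

```python
from typing import Optional

# Hypothetical substrings an author might want to block (made-up names).
BLOCKED_AGENT_SUBSTRINGS = ("examplescraperbot", "feedcopier")

def is_blocked_agent(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a blocked substring."""
    ua = (user_agent or "").lower()  # a missing header becomes an empty string
    return any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS)
```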

If that’s too complicated, for a short-term fix, Medium could allow authors to block RSS altogether. How many people still use RSS for legitimate reasons? I don’t really know the answer to that question. But for my purposes, I would like to simply block RSS on my blog. I don’t see any option for doing that.

The other thing is, instead of sending the entire blog in RSS, which one person who was blatantly copying my blog said was the problem because other sources don’t do that, Medium could deliver a portion of the blog and a link. That would drive people who read the posts via RSS to visit the blog.
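To make the "portion plus link" idea concrete, here is a small Python sketch of how a feed generator might build the text for each feed item. The `feed_excerpt` function, the 50-word default, and the "Read the full post" wording are all assumptions for illustration, not anything Medium provides.

```python
def feed_excerpt(body: str, url: str, max_words: int = 50) -> str:
    """Return the first max_words words of a post plus a link to the full post."""
    words = body.split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        excerpt += " ..."  # signal that the text was cut off
    return f"{excerpt}\n\nRead the full post: {url}"
```

A scraper that republishes this feed item would then carry only the teaser and a link back to the original.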

The other thing Medium should do is provide referrers in more detail: a complete list. That would also help authors see when other types of advertising and marketing campaigns are successful via parameters in the URL. But at a minimum, authors could see who is visiting their site from a referrer vs. someone who just comes straight to the site once per day with no referrer to scrape the content. So there needs to be a no-referrer category that shows you which IPs those are.
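Here is a hedged Python sketch of the kind of referrer report described above, using only the standard library. The hit records (dicts with `ip`, `referrer`, and `url` keys) are a hypothetical log format, and `utm_campaign` is assumed as the campaign parameter because it is a common convention. Medium does not expose data in this form.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

def classify_hit(referrer, landing_url):
    """Return (referrer host or "(none)", campaign name or None) for one hit."""
    host = urlsplit(referrer).hostname if referrer else None
    query = parse_qs(urlsplit(landing_url).query)
    campaign = query.get("utm_campaign", [None])[0]
    return (host or "(none)", campaign)

def no_referrer_ips(hits):
    """Count hits per IP that arrived without any referrer, most frequent first."""
    counts = Counter(h["ip"] for h in hits if not h.get("referrer"))
    return counts.most_common()
```

An IP that shows up near the top of `no_referrer_ips` day after day would be a candidate for a closer look, though a missing referrer alone does not prove scraping.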

Medium is a great, simple blogging platform, but it is almost too simple. Time will tell if I keep my content here. For the moment, I’m hoping for a simple toggle to turn off RSS. Pretty please.

Teri Radichel | © 2nd Sight Lab 2023

RSS Feed Scraper Websites and How They Affect Blog Authors was originally published in Cloud Security on Medium, where people are continuing the conversation by highlighting and responding to this story.