What I Learned from Scraping 15k Data Science Articles on Medium

And why having 5 Claps in your Article is okay

Khuyen Tran
7 min read · Jul 17, 2020
Photo by Thomas Charters on Unsplash

Motivation

Have you ever wondered what makes an article receive a high number of claps? As a data science writer, I also wonder:

  • What is the average number of claps? Some articles I came across have 100 or even 1000 claps. Is that a typical number of claps for a Data Science article?
  • Which titles are most used by data science articles?
  • What is the ideal reading time for a good article?
  • Will publishing on weekdays bring more claps than publishing on weekends?

To answer these questions, I scraped all data science articles on Medium published within the last year.

Tools

To scrape Medium, I used the excellent repository from Harrison Jansma, with slight changes to the packages to fix errors in the requirements. I chose 6 tags related to data science:

  • Data science
  • Machine learning
  • AI
  • Python
  • Data visualization
  • Big data

The articles were published between July 2019 and July 2020. Scraping all of these tags took 4 to 5 hours, but it gave me good data ready for cleaning and analysis. I merged the data from the 6 tags and added a Tag column showing which tag each article belongs to.
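The merge step can be sketched with pandas roughly as follows. This is a minimal illustration, not the code used for the article: the two tiny frames, the column names Title and Claps, and the helper merge_tags are hypothetical stand-ins for the real per-tag scrape results.

```python
import pandas as pd

# Hypothetical stand-ins for the per-tag scrape results. The real frames
# would be loaded from the scraper's output, one per tag.
frames_by_tag = {
    "Data science": pd.DataFrame({"Title": ["A", "B"], "Claps": [5, 120]}),
    "Python": pd.DataFrame({"Title": ["B", "C"], "Claps": [120, 0]}),
}


def merge_tags(frames_by_tag):
    """Stack the per-tag frames and record the source tag in a Tag column."""
    tagged = [df.assign(Tag=tag) for tag, df in frames_by_tag.items()]
    return pd.concat(tagged, ignore_index=True)


articles = merge_tags(frames_by_tag)
print(articles)
```

With this approach, an article published under several tags appears once per tag, so any overall statistic would need duplicates dropped first (for example with drop_duplicates on a column that identifies the article).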

If you want to play with the data and follow along with the article, you can download the data here:

or use Datapane Blob to get direct access to the data:
