What I Learned from Scraping 15k Data Science Articles on Medium
And why having 5 claps on your article is okay
Motivation
Have you ever wondered what makes an article receive a high number of claps? As a data science writer, I also wonder:
- What is the average number of claps? Some articles I came across have 100 or even 1000 claps. Is that a typical number of claps for a Data Science article?
- Which titles are most used by data science articles?
- What is the ideal reading time for a good article?
- Will publishing on weekdays earn more claps than publishing on weekends?
To answer these questions, I scraped all data science articles on Medium published within the last year.
Tools
To scrape Medium, I used the excellent repository from Harrison Jansma, with slight changes to the packages to fix errors in the requirements. I chose 6 tags related to data science:
- Data science
- Machine learning
- AI
- Python
- Data visualization
- Big data
The articles were published between July 2019 and July 2020. Scraping all of these tags took 4 to 5 hours, but it gave me good data ready for cleaning and analysis. I merged the data from the 6 tags and added a Tag column showing which tags each article belongs to.
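For readers who want to reproduce the merge step, here is a minimal sketch of one way to do it with pandas. It is not the exact code used for this article. The helper name `merge_tag_frames`, the `Title` and `Claps` columns, and the tiny sample frames are all hypothetical stand-ins for the scraped data. It also assumes one row per article per tag, so an article scraped under two tags would appear twice.

```python
import pandas as pd


def merge_tag_frames(frames_by_tag):
    """Stack per-tag DataFrames into one and record each row's source tag.

    frames_by_tag: dict mapping a tag name to the DataFrame scraped for it
    (hypothetical structure, assumed for this sketch).
    """
    labeled = [
        df.assign(Tag=tag)  # assign returns a copy, so the inputs stay untouched
        for tag, df in frames_by_tag.items()
    ]
    # ignore_index=True gives the merged frame a fresh 0..n-1 index
    return pd.concat(labeled, ignore_index=True)


# Made-up rows standing in for two of the six scraped tags
frames_by_tag = {
    "Data science": pd.DataFrame({"Title": ["A", "B"], "Claps": [5, 120]}),
    "Python": pd.DataFrame({"Title": ["C"], "Claps": [40]}),
}

merged = merge_tag_frames(frames_by_tag)
print(merged)
```

If you would rather keep one row per article, you could group the merged frame by a unique article identifier and collect the tags into a list, assuming the scraped data contains such an identifier.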
If you want to play with the data and follow along with the article, you can download the data here:
or use Datapane Blob to get direct access to the data: