What I Learned from Scraping 15k Data Science Articles on Medium

And why having 5 claps on your article is okay

Photo by Thomas Charters on Unsplash

Motivation

Tools

Get Started

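import datapane as dp
import pandas as pd
import numpy as np

# `medium` is the DataFrame of scraped articles; the loading step is not shown here.
# Replace the literal string 'nan' with a real missing value
medium = medium.replace('nan', np.nan)

# Drop duplicated rows
medium = medium.drop_duplicates(subset=['Title', 'Subtitle', 'Author', 'Year', 'Month', 'Day', 'Tag'])
medium.info()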
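
As a small illustration of the cleaning step above, here is a minimal sketch on a made-up two-row frame. The `toy` name and its values are assumptions for illustration, not the scraped data. It shows how the literal string 'nan' turns into a real missing value that `isna()` can count.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second subtitle was scraped as the string 'nan'.
toy = pd.DataFrame({
    "Title": ["Post A", "Post B"],
    "Subtitle": ["A subtitle", "nan"],
})

# Replace cells that are exactly the string 'nan' with a real missing value.
cleaned = toy.replace("nan", np.nan)

# Count missing subtitles before and after the replacement.
missing_before = int(toy["Subtitle"].isna().sum())
missing_after = int(cleaned["Subtitle"].isna().sum())
print(missing_before, missing_after)
```

Without this step, `info()` would report the column as fully populated even though some entries carry no text.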

What I Found

Topics

# Number of duplicated articles with different tags
>>> sum(medium.iloc[:,:8].duplicated())
38516
# Keep one row per article, regardless of tag
medium = medium.drop_duplicates(subset=['Title', 'Subtitle', 'Author', 'Year', 'Month', 'Day'])
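
To make the effect of dropping 'Tag' from the subset concrete, here is a minimal sketch on made-up data. The `toy` frame and its values are hypothetical. One article scraped under two tags shows up as two rows, and deduplicating on the identifying columns alone collapses it to one.

```python
import pandas as pd

# Hypothetical toy data: "Intro to pandas" was scraped under two tags.
toy = pd.DataFrame({
    "Title": ["Intro to pandas", "Intro to pandas", "Plotly basics"],
    "Author": ["A", "A", "B"],
    "Tag": ["data-science", "python", "data-science"],
})

# Rows that repeat an earlier (Title, Author) pair are flagged as duplicates.
n_duplicated = int(toy.duplicated(subset=["Title", "Author"]).sum())

# Keep the first occurrence of each article, ignoring the tag.
deduped = toy.drop_duplicates(subset=["Title", "Author"])
print(n_duplicated, len(deduped))
```

Note that the surviving row keeps whichever tag happened to come first, so the 'Tag' column is no longer a complete record after this step.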

Comment

Claps

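import plotly.express as px

# Sort by claps and keep only the first 80,000 rows (the lowest clap counts)
claps = px.histogram(medium.sort_values(by='Claps')[:80000],
                     x='Claps',
                     title='Number of Claps')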

Reading Time vs Claps

>>> medium.corr().loc['Reading_Time', 'Claps']
0.1301349558669967
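
The number above is a Pearson correlation between two columns. Here is a minimal sketch of the same calculation on made-up values. The `toy` frame is hypothetical, and selecting the two columns directly is my choice rather than the article's: since pandas 2.0, `DataFrame.corr()` no longer skips text columns by default, so calling it on a frame that still holds titles or author names may raise an error.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the scraped articles.
toy = pd.DataFrame({
    "Title": ["a", "b", "c", "d", "e"],
    "Reading_Time": [3, 5, 7, 4, 10],
    "Claps": [10, 12, 300, 8, 45],
})

# Pearson correlation between the two numeric columns only.
r = toy["Reading_Time"].corr(toy["Claps"])

# Cross-check with NumPy's correlation matrix.
r_numpy = np.corrcoef(toy["Reading_Time"], toy["Claps"])[0, 1]
print(r, r_numpy)
```

A value near 0 means reading time tells you little about claps; it says nothing about cause.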

Author

>>> author_rank = medium.Author.value_counts().index
>>> 100-(list(author_rank).index('Khuyen Tran')+1)/len(author_rank) * 100
99.85684944295761
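
The ranking expression above can be read as "what percentage of authors sit below this one when sorted by article count". Here is a minimal sketch with made-up authors. The helper name `percent_ranked_below` and the toy data are assumptions for illustration.

```python
import pandas as pd

def percent_ranked_below(authors: pd.Series, name: str) -> float:
    """Percentage of authors ranked below `name` by number of articles."""
    # value_counts() sorts authors from most to fewest articles.
    author_rank = list(authors.value_counts().index)
    position = author_rank.index(name) + 1  # 1-based rank
    return 100 - position / len(author_rank) * 100

# Hypothetical toy data: A wrote 4 articles, B wrote 3, C wrote 2, D wrote 1.
toy_authors = pd.Series(["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"])
percentile = percent_ranked_below(toy_authors, "B")
print(percentile)
```

Here B is second of four authors, so half of the authors rank below B. Authors tied on article count get an arbitrary order among themselves, so the figure is approximate near ties.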
>>> author_groupby.Year.median()
1

Publications

>>> sum(publication_groupby.sort_values(by='Year', ascending=False).head(int(len(publication_groupby)*0.01)).Year)/sum(publication_groupby.Year)
0.6225407930121115
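
The ratio above measures how much of the total article count belongs to the top 1% of publications. Here is a minimal sketch of that calculation on made-up counts. The helper `top_share` and the toy numbers are assumptions, and a plain Series of per-publication counts stands in for `publication_groupby.Year`.

```python
import pandas as pd

def top_share(counts: pd.Series, fraction: float) -> float:
    """Share of all articles published by the top `fraction` of publications."""
    ranked = counts.sort_values(ascending=False)
    n_top = int(len(ranked) * fraction)  # number of publications in the top slice
    return ranked.head(n_top).sum() / ranked.sum()

# Hypothetical toy data: 2 large publications and 198 with one article each.
toy_counts = pd.Series([100, 100] + [1] * 198)
share = top_share(toy_counts, 0.01)
print(share)
```

With fewer than 100 publications, `int(len * 0.01)` is 0 and the share comes out as 0, so the measure only makes sense on a reasonably large set.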

Trend

Day of the Week

Titles

Conclusion

Written by

Data scientist. I share a little bit of goodness every day through articles and daily data science tips: https://mathdatasimplified.com/
