Hello! I’m Khuyen Tran. I have been writing on Medium since December 2019, but I haven’t properly introduced myself, so I wrote this article to do that.
I major in statistics, but in my free time I love playing with data science and Python tools and sharing them with others. Thus, I decided to write at least one article per week. At the time of writing, I have written more than 100 articles.
I love open-source tools, but it can be difficult to understand what they do without spending hours on them. Thus, some cool packages are known…
Imagine you are trying to train a machine learning model to predict whether a particular person will click on an ad. After receiving some information about a person, the model predicts that the person will not click on the ad.
But why does the model predict that? How much does each feature contribute to the prediction? Wouldn’t it be nice if you could see a plot indicating how much each feature contributes to the prediction, like the one below?
Have you ever wanted to collaborate with your teammates on a project using Jupyter Notebook? Google Colab allows you and others to edit and comment on the same project, but you cannot see their changes in real-time. To see others’ comments or changes, you need to refresh the page.
This approach works but can be problematic if you are running a cell that takes a long time to run. Since you are not aware that others have commented until the page is refreshed, you cannot give them instant feedback.
Wouldn’t it be nice if you could see the changes in real time, like below?
Have you ever passed your data through a list of functions and classes without knowing for sure what the output looks like?
You might try to save the data and then check it in your Jupyter Notebook to make sure the output is what you expected. This approach works, but it is cumbersome.
Another common issue is that it’s hard to understand the relationships between functions when looking at a Python script that contains both the code to create and execute functions.
Your code looks even more complex and hard to follow as the project grows.
Wouldn’t it be…
Have you ever been so bored that you searched on Google: “What to do when being bored?” Wouldn’t it be nice if you could use Python to create an app that suggests a random activity for the day to users, along with books related to that activity?
You can play with the app shown above here. In this article, I will show you how to create this app in a few lines of code using PyWebIO, Bored API, and Open Library.
Before getting into the code, let’s collect the tools we need to create the app.
PyWebIO is a Python library that allows…
On average, do your friends have more friends than you? If you are an average person, there is a high chance that you have fewer friends than your friends.
This is called the friendship paradox. This phenomenon states that most people have fewer friends than their friends have, on average.
In this article, I will demonstrate why such a paradox exists, and whether we can find the paradox in Facebook data.
To understand why the friendship paradox exists, let’s start with a minimal example. We will create a network of people. …
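A minimal sketch of such a network, assuming a hypothetical four-person example (the names and friendships below are made up for illustration and are not the article's own network). It compares each person's number of friends with the average number of friends their friends have:

```python
from statistics import mean

# Hypothetical toy network: each key is a person, each value is the set of
# that person's friends. Friendships are mutual.
friends = {
    "Ana": {"Ben", "Cam", "Dee"},
    "Ben": {"Ana"},
    "Cam": {"Ana", "Dee"},
    "Dee": {"Ana", "Cam"},
}


def friend_counts(network):
    """Return how many friends each person has."""
    return {person: len(fs) for person, fs in network.items()}


def mean_friends_of_friends(network):
    """Return, for each person, the average friend count of their friends."""
    counts = friend_counts(network)
    return {person: mean(counts[f] for f in fs) for person, fs in network.items()}


counts = friend_counts(friends)
friends_of_friends = mean_friends_of_friends(friends)

# People who have fewer friends than their friends have on average.
fewer = sorted(p for p in friends if counts[p] < friends_of_friends[p])

for person in friends:
    print(person, counts[person], round(friends_of_friends[person], 2))
print("Fewer friends than their friends on average:", fewer)
```

In this toy network, three of the four people have fewer friends than their friends do on average, because the well-connected person appears in everyone else's friend list and pulls those averages up.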
What do you often do when you want to learn more about a person? You might decide to read about the person on websites such as Wikipedia.
However, the text can be lengthy, and it might take you a while to get the pieces of information that you need.
Since we are better at absorbing information in the form of images than text, wouldn’t it be nice if you could get a graph like the one below when searching for Steve Jobs?
Sometimes, you might want to experiment with combinations of features to create a good model. However, this often requires extra effort to transform the features into arrays that can be used by a scikit-learn model.
Wouldn’t it be nice if you could quickly create features using arbitrary Python code? That is when Patsy comes in handy.
Patsy is a Python library that allows data transformation using arbitrary Python code.
With Patsy, you could use human-readable syntax such as
life_expectancy ~ income_group + year + region (life expectancy depends on income group, year, and region).
To install Patsy, type:
pip install patsy
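As a rough sketch of how that formula could be turned into arrays, the example below uses Patsy's dmatrices function on a tiny dictionary of made-up rows. The values are purely illustrative assumptions; only the column names come from the formula above.

```python
from patsy import dmatrices

# Made-up illustrative rows; the column names mirror the formula in the text.
data = {
    "life_expectancy": [60.1, 78.3, 62.4, 80.0],
    "income_group": ["low", "high", "low", "high"],
    "year": [2000, 2000, 2010, 2010],
    "region": ["Asia", "Europe", "Europe", "Asia"],
}

# y holds the outcome (left of ~); X holds the design matrix (right of ~).
# String columns are treated as categorical and dummy-encoded, and an
# intercept column is added by default.
y, X = dmatrices("life_expectancy ~ income_group + year + region", data)

column_names = X.design_info.column_names
print(column_names)
print(X.shape)
```

The resulting X and y are NumPy-compatible arrays, so they can be passed to a scikit-learn estimator's fit method.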
Have you ever looked at a time series and wondered if there are any repetitive patterns, outliers, or change points in it?
I asked the same questions when looking at the daily number of users viewing my page from January to August 2021.
When looking at the graph above, I wonder:
Imagine you are the owner of a clothing store. The demand for clothes varies from day to day (more people prefer to go shopping on weekends than on weekdays). The production cost also varies from day to day (it costs more to hire workers to work on weekends).
Your job is to determine how many units of clothes to produce each day.
Since you can store your clothes, you might decide to produce as many clothes as possible on the cheapest day. …