Who I Am and What Motivates Me to Write


Hello! I’m Khuyen Tran. I have been writing on Medium since December 2019, but I haven’t properly introduced myself, so I wrote this article to do so.

I major in statistics, but I love playing with data science and Python tools and sharing them with others in my free time. Thus, I decided to write at least one article per week. At the time of writing this article, I have written more than 100 articles.

What Motivates Me to Write?

I love open-source tools, but it can be difficult to understand what they do without spending hours on them. Thus, some cool packages are known…


Getting Started

A Comprehensive Guide to SHAP and Shapley Values

Motivation

Imagine you are training a machine learning model to predict whether a particular person will click on an ad. After receiving some information about a person, the model predicts that the person will not click on the ad.

Image by Author

But why does the model predict that? How much does each feature contribute to the prediction? Wouldn’t it be nice if you could see a plot indicating how much each feature contributes to the prediction, like the one below?


Data Science

Package and Ship Your Notebook With Ease

Motivation

Have you ever wanted to collaborate with your teammates on a project using Jupyter Notebook? Google Colab allows you and others to edit and comment on the same project, but you cannot see their changes in real time. To see others’ comments or changes, you need to refresh the page.

GIF by Author

This approach works but can be problematic if you are running a cell that takes a long time to run. Since you are not aware that others have commented until the page is refreshed, you cannot give them instant feedback.

Wouldn’t it be nice if you could see the changes in real time, like below?


Create Maintainable and Modular Data Science Pipelines with Kedro

Motivation

Have you ever passed your data through a list of functions and classes without knowing for sure what the output looks like?

Image by Author

You might try to save the data and then check it in your Jupyter Notebook to make sure the output is what you expected. This approach works, but it is cumbersome.

Another common issue is that it’s hard to understand the relationships between functions when looking at a Python script that contains both the code to create and execute functions.

Your code looks even more complex and hard to follow as the project grows.

Wouldn’t it be…


Hands-on Tutorials

Suggest Users Which Activities to Do When They’re Bored Using Python

Motivation

Have you ever been so bored that you searched on Google: “What to do when being bored?” Wouldn’t it be nice if you could create an app in Python that suggests a random activity for the day to users, along with books related to that activity?

GIF by Author

You can play with the app shown above here. In this article, I will show you how to create this app in a few lines of code using PyWebIO, Bored API, and Open Library.

Before getting into the code, let’s collect the tools we need to create the app.

Tools

PyWebIO — Create a Web Application in Python

PyWebIO is a Python library that allows…


Do Your Friends Have More Friends Than You, On Average?

Motivation

On average, do your friends have more friends than you? If you are an average person, there is a high chance that you have fewer friends than your friends do.

This is called the friendship paradox. This phenomenon states that most people have fewer friends than their friends have, on average.

In this article, I will demonstrate why such a paradox exists, and whether we can find the paradox in Facebook data.

GIF by Author — Interact with the graph here.

Minimal Example

To understand why the friendship paradox exists, let’s start with a minimal example. We will create a network of people. …
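To make the idea concrete, here is a small sketch in plain Python. The four-person network and the names in it are hypothetical and are not the network used in the article; the point is only to show how one popular person pulls up the average number of friends that everyone else's friends have.

```python
from statistics import mean

# Hypothetical undirected friendship network (an assumption for illustration).
friends = {
    "Ana": {"Ben", "Cal", "Dee"},
    "Ben": {"Ana"},
    "Cal": {"Ana", "Dee"},
    "Dee": {"Ana", "Cal"},
}

# Number of friends each person has.
counts = {person: len(fs) for person, fs in friends.items()}

# Average number of friends that each person's friends have.
friends_avg = {
    person: mean(counts[f] for f in fs) for person, fs in friends.items()
}

# People who have fewer friends than their friends do, on average.
fewer = [p for p in friends if counts[p] < friends_avg[p]]

print("Average number of friends:", mean(counts.values()))
print("Average of friends' friend counts:", mean(friends_avg.values()))
print("People with fewer friends than their friends:", fewer)
```

In this toy network, Ana appears in everyone's friend list, so she is counted three times when we average over friends, while Ben is counted only once. That over-counting of popular people is what produces the paradox.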


How to Answer Difficult Questions with Knowledge Graph

Motivation

What do you usually do when you want to learn more about a person? You might decide to read about the person on websites such as Wikipedia.

Image by Author

However, the text can be lengthy, and it might take you a while to get the pieces of information that you need.

Since we are better at absorbing information in the form of images than text, wouldn’t it be nice if you could get a graph like the one below when searching for Steve Jobs?


Data Transformation in One Line of Code

Motivation

Sometimes, you might want to experiment with combinations of features to create a good model. However, this often requires extra effort to transform the features into arrays that can be used by a scikit-learn model.

Image by Author

Wouldn’t it be nice if you could quickly create features using arbitrary Python code? That is when Patsy comes in handy.

What is Patsy?

Patsy is a Python library that allows data transformation using arbitrary Python code.

With Patsy, you can use human-readable syntax such as life_expectancy ~ income_group + year + region (life expectancy depends on income group, year, and region).

To install Patsy, type:

pip install patsy
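As a rough sketch of what the formula above produces, the snippet below passes it to Patsy's dmatrices function. The six rows of data are made up for illustration and do not come from the article's dataset; a pandas DataFrame with the same column names should work in place of the dictionary.

```python
from patsy import dmatrices

# Made-up data for illustration only (not the article's dataset).
data = {
    "life_expectancy": [52.1, 61.3, 70.4, 78.9, 55.0, 80.2],
    "income_group": ["low", "low", "high", "high", "low", "high"],
    "year": [2000, 2010, 2000, 2010, 2020, 2020],
    "region": ["Asia", "Europe", "Asia", "Europe", "Asia", "Europe"],
}

# y holds the target; X holds the intercept, dummy-coded categories, and year.
y, X = dmatrices("life_expectancy ~ income_group + year + region", data=data)

print(X.design_info.column_names)
print(X.shape)
```

The string columns are treated as categorical and dummy-coded automatically, and the resulting design matrices behave like NumPy arrays.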

Getting Started


All It Takes is Kats and a Few Lines of Code

Motivation

Have you ever looked at a time series and wondered if there are any repetitive patterns, outliers, or change points in it?

I asked the same questions when looking at the daily number of users viewing my page from January to August 2021.

When looking at the graph above, I wondered:

  • Is there a shift in the mean number of views on my page? If yes, when?
  • Are there any outliers in my time series? If yes, when?
  • I suspect that some hours have more active users on the page than others. Is this true?


How Many Units Should You Produce Each Day to Minimize the Production and Inventory Cost?

Motivation

Imagine you are the owner of a clothing store. The demand for clothes varies from day to day (more people prefer to go shopping on weekends than on weekdays). The production cost also varies from day to day (it costs more to hire workers on weekends).

Your job is to determine how many units of clothes to produce each day.

Image by Author

Since you can store your clothes, you might decide to produce as many clothes as possible on the cheapest day. …
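To illustrate the trade-off between production and inventory cost, here is a minimal sketch in plain Python. It is not the method used in the article. It assumes a simplified version of the problem: a constant cost per unit produced on each day, a fixed holding cost per unit per day in storage, no production capacity limit, and demand that must be met on time. All numbers are hypothetical. Under these assumptions, the units sold on each day can simply be produced on whichever earlier (or same) day makes production plus storage cheapest.

```python
def plan_production(demand, unit_cost, holding_cost):
    """Return (units to produce each day, total cost).

    Assumes linear costs and unlimited capacity (a simplification).
    """
    n = len(demand)
    produce = [0] * n
    total = 0
    for t in range(n):
        # Cost of serving day t's demand from day j: produce on j, store t - j days.
        best = min(
            range(t + 1),
            key=lambda j: unit_cost[j] + holding_cost * (t - j),
        )
        produce[best] += demand[t]
        total += demand[t] * (unit_cost[best] + holding_cost * (t - best))
    return produce, total


# Hypothetical three-day example: the last day is the busy, expensive day.
demand = [10, 10, 30]
unit_cost = [5, 5, 9]
produce, total = plan_production(demand, unit_cost, holding_cost=1)
print(produce, total)
```

Here the busy day's demand is produced one day early, because paying one day of storage is cheaper than paying the higher production cost. Once capacity limits or fixed setup costs enter the picture, this shortcut no longer applies and an optimization solver is the more appropriate tool.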

Khuyen Tran

Data scientist. I share a little bit of goodness every day through daily data science tips: https://mathdatasimplified.com
