Who I am and What Motivates me to Write

Hello! I’m Khuyen Tran. I have been writing on Medium since December 2019, but I haven’t properly introduced myself, so I wrote this article to do so.

I major in statistics, but I love playing with data science and Python tools and sharing them with others in my free time. That is why I decided to write at least one article per week. At the time of writing, I have published a total of 97 articles.

What Motivates Me to Write?

I love open-source tools, but it can be difficult to understand what they do without spending hours on them. Thus, some cool packages are…


Now you can Reload a Loop Body without Losing its State

Motivation

Have you ever run a for loop and wished you could add more details to the code inside it? You might decide not to, because adding more details means stopping your progress and rerunning everything.

Stopping your progress is especially painful if the code has been running for hours. Wouldn’t it be great if you could reload a loop body on each iteration without losing state, like below?


A Better Alternative to One-hot Encoding when Encoding Dirty Categories

Motivation

Imagine your task is to predict employees’ salaries in Montgomery County. Your employee_position_title column looks something like this:

Office Services Coordinator
Master Police Officer
Social Worker IV
Resident Supervisor II
Police Officer III
Police Aide

Since many machine learning algorithms only work with numerical data, it is important to convert the employee_position_title values to numeric form.

To do that, a common approach for nominal data is one-hot encoding. With one-hot encoding, each category is represented as an array of 0s and a single 1.
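To make the idea concrete, here is a minimal sketch of one-hot encoding in plain Python, applied to the six job titles listed above. The helper name one_hot_encode and the choice to sort the categories alphabetically are assumptions made for illustration; a real project would more likely use a library encoder.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row is a list of 0s with a single 1."""
    # Assumption: categories are ordered alphabetically to keep the output stable.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        rows.append(row)
    return categories, rows


titles = [
    "Office Services Coordinator",
    "Master Police Officer",
    "Social Worker IV",
    "Resident Supervisor II",
    "Police Officer III",
    "Police Aide",
]

categories, encoded = one_hot_encode(titles)
for title, row in zip(titles, encoded):
    print(f"{title:<28} {row}")
```

Note that every title gets its own column, so closely related titles such as Police Officer III and Master Police Officer end up with vectors that share nothing in common.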

For example, in the table below, Office Services Coordinator is represented as…


Avoid headaches when debugging in one line of code

Motivation

Have you ever seen an error output like the one below:

2 divided by 1 is equal to 2.0.
Traceback (most recent call last):
  File "loguru_example.py", line 17, in <module>
    divide_numbers(num_list)
  File "loguru_example.py", line 11, in divide_numbers
    res = division(num1, num2)
  File "loguru_example.py", line 5, in division
    return num1/num2
ZeroDivisionError: division by zero

and wished the output could be a little easier to understand, as shown here?

You might also want to visualize, in real time, which lines of code are being executed and how many times:
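For reference, a script along the following lines would produce a traceback like the one above. This is a reconstruction, not the original file: the names division, divide_numbers, and num_list come from the traceback, while the input [2, 1, 0], the use of itertools.combinations, and the run_and_capture helper (which catches the error so the sketch can finish running) are assumptions.

```python
import traceback
from itertools import combinations


def division(num1, num2):
    return num1 / num2


def divide_numbers(num_list):
    # Divide every pair of numbers, in the order they appear in the list.
    for num1, num2 in combinations(num_list, 2):
        res = division(num1, num2)
        print(f"{num1} divided by {num2} is equal to {res}.")


def run_and_capture(num_list):
    """Hypothetical helper: return the traceback text if the run fails, else None."""
    try:
        divide_numbers(num_list)
    except ZeroDivisionError:
        return traceback.format_exc()
    return None


# Assumed input: the pair (2, 0) triggers the ZeroDivisionError.
report = run_and_capture([2, 1, 0])
print(report)
```

The plain traceback names each frame but shows none of the variable values, which is what makes this kind of output slow to debug.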


Use your Domain Knowledge to Label your Data

Nowadays, data scientists often give a machine learning model labeled data so that it can figure out the rules. These rules can then be used to predict the labels of new data.


Gain Control over your Machine with These 3 Tools

Motivation

If you are a Linux user, you might want to know some important information about your computer such as:

  • System’s CPU usage, memory usage, and disk usage
  • System information
  • Disk usage of each folder or file and when you last used it
  • Memory and CPU consumption of the running processes
  • Startup applications

Knowing those pieces of information will enable you to optimize your system.

In this article, I will show you 3 tools that allow you to do all of the things above and much more!

htop - an Interactive Process Viewer

htop is an interactive process viewer. htop allows you to…


Use Plot Bindings to Understand Data from Different Angles

Motivation

Have you ever wanted to see one plot change when you interact with another plot like below? That is when Altair comes in handy.

The graph above was created with Altair. If you are not familiar with it, Altair is a statistical visualization library for Python based on Vega and Vega-Lite. In my last article, I showed how Altair allows you to use a concise visualization grammar to quickly build statistical graphics.

In this article, I will show how you can create bindings and conditions between multiple plots using Altair.

Get Started

To install Altair, type:

pip install altair

We will use Altair to…


Now you can share the fun of exploring your data with others!

Motivation

Have you ever wanted to explore a dataset in a browser or publish your dataset so that others can explore and download your data? If so, try Datasette.

Below is what the website for your data will look like after publishing it with Datasette.

Before digging into the article, you can try exploring FiveThirtyEight’s Hate Crimes dataset using Datasette first.

What is Datasette?

Datasette is a tool for exploring your data in a web browser and publishing it as an interactive website.

To install Datasette, type:

pip install datasette

If this doesn’t work for you, find other ways to install Datasette…
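Datasette works on SQLite database files, so the first step is usually getting your data into one. Below is a minimal sketch using only Python's standard library. The table name demo, the column names, the file name demo.db, and the sample rows are all made up for illustration and are not taken from the Hate Crimes dataset.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_database(db_path, records):
    """Store (state, value) records in a SQLite table and return the row count."""
    conn = sqlite3.connect(str(db_path))
    try:
        with conn:  # commits the transaction if no exception is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS demo (state TEXT, value INTEGER)"
            )
            conn.executemany("INSERT INTO demo VALUES (?, ?)", records)
        return conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
    finally:
        conn.close()


# Made-up placeholder rows, not real statistics.
sample_rows = [("State A", 10), ("State B", 20), ("State C", 30)]

db_path = Path(tempfile.mkdtemp()) / "demo.db"
row_count = build_database(db_path, sample_rows)
print(f"Wrote {row_count} rows to {db_path}")

# From a shell, the file could then be opened with:
#   datasette demo.db
```

Running the datasette command on the resulting file starts a local web server where the table can be browsed.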


Leverage your Python Skills to Create Beautiful Mathematical Animations

Motivation

Have you ever struggled with the math concepts of a machine learning algorithm and used 3Blue1Brown as a learning resource? 3Blue1Brown is a famous math YouTube channel created by Grant Sanderson. Many people love 3Blue1Brown because of Grant’s great explanations and cool animations like the one below.

Wouldn’t it be cool if you could learn how he created these animations, so that you can create similar ones to explain data science concepts to your teammates, managers, or followers?

Luckily, Grant put together a Python package called manim that enables you to create mathematical animations or pictures using Python…


Build Web Applications in Several Lines of Python Code without the Knowledge of HTML and JS

Motivation

Have you ever wanted to create a web application in only several lines of Python code? Streamlit allows you to do that, but it doesn’t give you a lot of options to customize your input box, output, layout, and pages.

If you are looking for something that is easier to learn than Django and Flask, but more customizable than Streamlit, you will love PyWebIO.

What is PyWebIO?

PyWebIO is a Python library that allows you to build simple web applications without any knowledge of HTML and JavaScript. PyWebIO can also be easily integrated into existing web services such as Flask or Django.

To…

Khuyen Tran

Data scientist. I share a little bit of goodness every day through daily data science tips: https://mathdatasimplified.com
