Hello! I’m Khuyen Tran. I have been writing on Medium since December 2019, but I haven’t properly introduced myself, so I wrote this article to do so.
I major in statistics, but I love playing with data science and Python tools and sharing them with others in my free time. Thus, I decided to write at least one article per week. At the time of writing this article, I have written a total of 97 articles.
I love open-source tools, but it can be difficult to understand what they do without spending hours on them. Thus, some cool packages are…
Have you ever run a for loop and wished you could add more details to the code inside it? You might decide not to, because adding more details means that you need to stop your progress and rerun everything.
Stopping your progress is especially painful if the code has been running for hours. Wouldn’t it be great if you could reload a loop body on each iteration without losing state like below?
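To make the idea concrete, here is a minimal sketch that uses only the standard library's `importlib.reload`. It is not a specific package's API: the module name `loop_body`, the function `step`, and the `state` dictionary are hypothetical names chosen for illustration, and the file edit that you would normally do by hand in your editor is simulated in code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip writing .pyc files so every reload recompiles the edited source.
sys.dont_write_bytecode = True

# Hypothetical module holding the loop body. In practice you would edit
# this file by hand while the loop keeps running.
workdir = Path(tempfile.mkdtemp())
body_file = workdir / "loop_body.py"
body_file.write_text(
    "def step(i, state):\n"
    "    state['total'] += i\n"
)
sys.path.insert(0, str(workdir))

import loop_body  # noqa: E402

# The state lives outside the reloaded module, so reloading does not reset it.
state = {"total": 0, "log": []}

for i in range(4):
    if i == 2:
        # Simulate adding more detail to the loop body halfway through the run.
        body_file.write_text(
            "def step(i, state):\n"
            "    state['total'] += i\n"
            "    state['log'].append((i, state['total']))\n"
        )
    importlib.reload(loop_body)  # pick up the latest version of step()
    loop_body.step(i, state)

print(state)
```

The running total survives the edit because it is stored in `state`, which belongs to the script that owns the loop rather than to the module being reloaded.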
Imagine your task is to predict employees’ salaries in Montgomery County. Your employee_position_title column looks something like this:
Office Services Coordinator
Master Police Officer
Social Worker IV
Resident Supervisor II
Police Officer III
Since many machine learning algorithms only work with numerical data, it is important to convert the employee_position_title values to numeric form.
To do that, a common approach for nominal data is one-hot encoding. When applying one-hot encoding, each category is represented as an array of 0s and 1s.
For example, in the table below, Office Services Coordinator is represented as…
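As a rough illustration of one-hot encoding, the sketch below encodes the five job titles listed above in plain Python. The helper `one_hot` and the alphabetical ordering of the categories are assumptions made for this example; libraries may order the columns differently.

```python
# Job titles taken from the employee_position_title sample above.
titles = [
    "Office Services Coordinator",
    "Master Police Officer",
    "Social Worker IV",
    "Resident Supervisor II",
    "Police Officer III",
]

# One column per distinct category, sorted alphabetically (an arbitrary choice).
categories = sorted(set(titles))


def one_hot(value, categories):
    """Return a list with a 1 in the position of `value` and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


encoded = {title: one_hot(title, categories) for title in titles}

for title, vector in encoded.items():
    print(f"{title:30} {vector}")
```

Each title maps to a vector with exactly one 1, so with many distinct titles the vectors become long and mostly zero.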
Have you ever seen an error output like the one below:
2 divided by 1 is equal to 2.0.
Traceback (most recent call last):
File "loguru_example.py", line 17, in <module>
File "loguru_example.py", line 11, in divide_numbers
res = division(num1, num2)
File "loguru_example.py", line 5, in division
ZeroDivisionError: division by zero
and wished the output could be a little easier to understand, as shown here?
You might also want to visualize, in real time, which lines of code are being executed and how many times they are executed:
Nowadays, data scientists often give a machine learning model labeled data so that it can figure out the rules. These rules can then be used to predict the labels of new data.
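To illustrate what "figuring out the rules" from labeled data can mean, here is a toy sketch that learns a single threshold rule. The data (hours studied and whether a student passed) and the helper `fit_threshold` are hypothetical, and a real model would be far more flexible than this one-rule learner.

```python
def fit_threshold(xs, labels):
    """Find the threshold t for which the rule `x >= t` matches the most labels."""
    best_t, best_correct = None, -1
    for t in sorted(set(xs)):
        correct = sum((x >= t) == label for x, label in zip(xs, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


# Hypothetical labeled data: hours studied and whether the student passed.
hours = [1, 2, 3, 4, 5, 6]
passed = [False, False, False, True, True, True]

# "Training": the rule is learned from the labeled examples.
threshold = fit_threshold(hours, passed)

# "Prediction": the learned rule is applied to new, unlabeled data.
new_hours = [2.5, 4.5]
predictions = [x >= threshold for x in new_hours]

print(threshold)    # 4
print(predictions)  # [False, True]
```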
If you are a Linux user, you might want to know some important information about your computer such as:
Knowing those pieces of information will enable you to optimize your system.
In this article, I will show you 3 tools that allow you to do all of the things above and much more!
htop is an interactive process viewer. htop allows you to…
Have you ever wanted to see one plot change when you interact with another plot like below? That is when Altair comes in handy.
The graph above is created with Altair. If you don’t know Altair, it is a statistical visualization library for Python based on Vega and Vega-Lite. In my last article, I showed how Altair allows you to use a concise visualization grammar to quickly build statistical graphics.
In this article, I will show how you can create bindings and conditions between multiple plots using Altair.
To install Altair, type:
pip install altair
Have you ever wanted to explore a dataset in a browser or publish your dataset so that others can explore and download your data? If so, try Datasette.
Below is what the website for your data will look like after publishing it with Datasette.
Before digging into the article, you can try exploring FiveThirtyEight’s Hate Crimes dataset using Datasette first.
Datasette is a tool for exploring your data in a web browser and publishing it as an interactive website.
To install Datasette, type:
pip install datasette
Have you ever struggled with the math concepts of a machine learning algorithm and used 3Blue1Brown as a learning resource? 3Blue1Brown is a famous math YouTube channel created by Grant Sanderson. Many people love 3Blue1Brown because of Grant’s great explanations and cool animations like the one below.
Wouldn’t it be cool if you could learn how he created these animations so you could create similar animations to explain data science concepts to your teammates, managers, or followers?
Luckily, Grant put together a Python package called manim that enables you to create mathematical animations or pictures using Python…
Have you ever wanted to create a web application in just a few lines of Python code? Streamlit allows you to do that, but it doesn’t give you many options to customize your input box, output, layout, and pages.
If you are looking for something that is easier to learn than Django and Flask, but more customizable than Streamlit, you will love PyWebIO.