Stop writing Python functions that take more than three minutes to understand

Photo by The Creative Exchange on Unsplash

Motivation

Have you ever looked at a function you wrote a month ago and found it difficult to understand in 3 minutes? If so, it is time to refactor your code. If it takes you more than 3 minutes to understand your own code, imagine how long it would take your teammates.

If you want your code to be reusable, you want it to be readable. Writing clean code is especially important for data scientists, who often collaborate with team members in different roles.

You want your Python function to:

  • be small
  • do one…
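Here is a rough illustration of small, single-purpose functions. It splits a hypothetical score-processing task into helpers that each do one thing. The task and every function name (`parse_scores`, `drop_negative`, `summarize`, `process_scores`) are made up for this example and are not from the article.

```python
from __future__ import annotations

from statistics import mean


def parse_scores(lines: list[str]) -> list[float]:
    """Convert raw text lines to numbers, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


def drop_negative(scores: list[float]) -> list[float]:
    """Remove invalid (negative) scores."""
    return [score for score in scores if score >= 0]


def summarize(scores: list[float]) -> dict[str, float]:
    """Report how many scores there are and their average."""
    return {"count": len(scores), "mean": mean(scores)}


def process_scores(lines: list[str]) -> dict[str, float]:
    """Top-level function: reads like a list of the steps above."""
    return summarize(drop_negative(parse_scores(lines)))


print(process_scores(["1", "", "3", "-2"]))  # {'count': 2, 'mean': 2.0}
```

You can read, test and reuse each helper on its own. The top-level function then works as a one-line summary of the whole task.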


Are you Using Print or Log to Debug your Code? Use Icecream instead.

Photo by Kenta Kikuchi on Unsplash

Motivation

If you use print to debug your code, you might find it confusing to scan many lines of terminal output and work out which line of code produced each one.

For example, running a script that prints two variables, num1 and num2, gives you

30
40
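This preview does not include the script itself. A minimal version consistent with the output above, assuming `num1` is 30 and `num2` is 40, could look like this:

```python
num1 = 30
num2 = 40

# Bare prints show only the values, not which variable each value came from.
print(num1)
print(num2)
```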

Which of these outputs is num1? Which is num2? Two outputs might not be hard to tell apart, but what if there are more than 5? Finding the source code responsible for each output can be time-consuming.

You might try to add text to the print statement to make it easier to figure…
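Adding labels by hand works, but you have to keep them in sync with the variable names yourself. For comparison, Python 3.8+ f-strings accept an `=` specifier that prints the expression text next to its value. This is a standard-library alternative, not the icecream package the article recommends.

```python
num1 = 30
num2 = 40

# Manual label: you must keep the text in sync with the variable name yourself.
print("num1:", num1)  # num1: 30

# The "=" specifier (Python 3.8+) writes the expression and its value for you.
print(f"{num1=}")  # num1=30
print(f"{num2=}")  # num2=40
```

icecream's `ic()` function labels its output in a similar way, without needing a format string.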


Have a Quick Understanding of your CSV Files through your Terminal in 1 Line of Code

Created by author

Motivation

Have you ever tried to get a general understanding of a CSV file by staring at it, only to end up still not understanding it? You can open a Jupyter Notebook to analyze the file, but opening a notebook just to understand a CSV file takes time, especially when you work mainly with Python scripts and the terminal.

Is there a way to quickly analyze your CSV files from the terminal in 1 line of code, such as this?

$ xsv stats bestsellers.csv | xsv table

This is when xsv comes in handy.

What is xsv?

xsv is a command-line program for indexing, slicing, analyzing, splitting, and joining CSV files. I like xsv because it makes working with CSV files extremely quick and easy. …



Distinguish Gender in Tweets and Present them in an Interactive HTML Scatter Plot

Photo by Dainis Graveris on Unsplash

Motivation

Given a tweet, can you tell the gender of its author? You can often make a good guess by looking at specific words in the tweet.

For example, if you see the word ‘cute’ in a tweet, the author is more likely to be female. Because some words are used more often by one gender, machine learning models can distinguish between genders using these gender-related words.

Wouldn’t it be interesting if we could visualize how different words relate to different genders on Twitter? That can easily be done with Scattertext. …
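Scattertext places each word on a scatter plot according to how often each category uses it. The core idea, comparing how frequently each group uses a word, can be sketched in plain Python. This is a simplified difference-of-relative-frequencies score, not Scattertext's own scoring. The tweets below are made up for illustration.

```python
from collections import Counter

# Made-up (author_group, tweet) pairs, for illustration only.
tweets = [
    ("female", "so cute and happy today"),
    ("female", "cute puppy pics"),
    ("male", "great game today"),
    ("male", "watching the game"),
]


def relative_frequencies(tweets, group):
    """Share of all words written by `group` that each word accounts for."""
    counts = Counter()
    for author_group, text in tweets:
        if author_group == group:
            counts.update(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


female = relative_frequencies(tweets, "female")
male = relative_frequencies(tweets, "male")

# Positive score: relatively more common in "female" tweets; negative: in "male" tweets.
vocabulary = set(female) | set(male)
scores = {word: female.get(word, 0.0) - male.get(word, 0.0) for word in vocabulary}

print(max(scores, key=scores.get))  # cute
print(min(scores, key=scores.get))  # game
```

A real analysis would need far more data and proper tokenization. The sketch only shows why words with skewed usage separate the groups.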


Bored with Your Terminal Output? Let’s Change its Color and Shape!


Motivation

If you work with Python, you probably print output to the terminal, either to debug or to follow the progress of your program. However, when the output is lengthy, it is hard to keep track of.

Is there a way to make important terminal output stand out more, for example by adding color or enlarging the text?
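This preview does not show the article's own approach. One standard-library way to add color and emphasis is ANSI escape codes, sketched here as an assumption rather than the article's method. Most modern terminals interpret these codes, though some older Windows consoles do not. They cannot enlarge text.

```python
# ANSI "Select Graphic Rendition" escape codes.
RESET = "\033[0m"
BOLD = "\033[1m"
RED = "\033[31m"
GREEN = "\033[32m"


def highlight(text: str, color: str = RED, bold: bool = True) -> str:
    """Wrap text in color (and optionally bold) codes, then reset the style."""
    style = (BOLD if bold else "") + color
    return f"{style}{text}{RESET}"


print(highlight("Model training finished", GREEN))
print("a routine log line")
print(highlight("Validation loss increased!", RED))
```

Always end with `RESET`. Otherwise the style leaks into every line printed afterwards.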


Seamlessly Compare Different Experiments and Reproduce your Machine Learning Experiments using Python

Photo by Solé Bicycles on Unsplash

Motivation

If you have applied machine learning or deep learning to your data science projects, you probably know how overwhelming it is to keep track of and compare different experiments.

In each experiment, you might want to track how the outputs change when you change the model, hyperparameters, feature engineering techniques, etc. You can keep track of your results by writing them in an Excel spreadsheet or by saving the outputs of different models in your Jupyter Notebook.

However, there are multiple disadvantages to the above methods:

  • You cannot record every kind of output in your Excel spreadsheet
  • It takes quite a bit of time to manually log the results for each…
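This preview does not name the tool the article uses. To make the problem concrete, here is a minimal standard-library sketch of automated logging. Each run's parameters and metrics are appended to a JSON Lines file, and runs are then compared by a metric. The function names, file name and numbers are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def log_run(params: dict, metrics: dict, log_file: Path) -> None:
    """Append one experiment's parameters and results as a single JSON line."""
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"params": params, "metrics": metrics}) + "\n")


def load_runs(log_file: Path) -> list:
    """Read back every logged experiment."""
    with log_file.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(metric: str, log_file: Path) -> dict:
    """Return the run with the highest value of `metric`."""
    return max(load_runs(log_file), key=lambda run: run["metrics"][metric])


# Usage with made-up numbers, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "experiments.jsonl"
    log_run({"model": "logistic_regression", "C": 1.0}, {"accuracy": 0.81}, log_path)
    log_run({"model": "random_forest", "n_estimators": 100}, {"accuracy": 0.86}, log_path)
    runs = load_runs(log_path)
    best = best_run("accuracy", log_path)

print(len(runs), best["params"]["model"])  # 2 random_forest
```

Because every run is written by code rather than by hand, no result is forgotten. Any kind of JSON-serializable output can be recorded.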


And how to Document your Code using Notion

Photo by NeONBRAND on Unsplash

Motivation

Even if you are a data scientist who is knowledgeable in a variety of tools and good at coding, you might still find yourself struggling to:

  • Decide the next steps to take for your project
  • Remember what you have done
  • Understand the results of your experiments
  • Get help from your teammates because they don’t fully understand the code you have written

You can use some sophisticated apps to fix these problems, but the most effective and easiest method is to document your code.

How can you fix these problems with Notion?

What is Notion? Notion can be described as an all-in-one workspace.

I like to use Notion to document everything related to my data science projects because Notion supports many kinds of content, including code, PDFs, and web bookmarks. Notion also makes it easy to organize your content and make it look nice with minimal effort. …


All it Takes is 10 Lines of Code!

Photo by freestocks on Unsplash

Motivation

Whether or not you have worked with natural language processing (NLP), you have probably been amazed by some of its applications. If you want to start working with NLP, where should you start?

The good news is that many excellent NLP tools let you work with text with minimal domain knowledge. Even better, it is also easy to create an NLP app!

In this article, you will learn how to:

  • Build an app to predict the sentiment of a text
  • Build an app to find word similarities

Let’s get started!

Introduction to spaCy and Streamlit

spaCy makes it easy to process and analyze text in a few lines of code. Streamlit is a Python library that enables you to create web apps in minutes. …
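For the word-similarity app, spaCy's `similarity` method on tokens and documents uses cosine similarity of word vectors by default. This requires a pipeline that ships with vectors: for English, the `md` or `lg` models rather than `sm`. The arithmetic behind that score can be sketched in plain Python. The 3-dimensional vectors below are made up, whereas real spaCy vectors have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional "word vectors", for illustration only.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

print(round(cosine_similarity(vectors["cat"], vectors["dog"]), 2))  # 0.98
print(round(cosine_similarity(vectors["cat"], vectors["car"]), 2))  # 0.01
```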


Impress Recruiters with Your Skills and a Cool Stats Graph When They View Your GitHub Profile

My GitHub Profile

Motivation

If you are looking for a new opportunity in data science or programming, you should probably optimize your GitHub profile.

Don’t expect recruiters to learn your skills by digging through each of your repositories. If you have something to show off, put it all on the front page so recruiters do not need to spend much energy learning about you.

This includes links to your website and social media accounts, your activity on GitHub, and what you are interested in.

This article shows 3 simple steps to build an impressive GitHub profile. By the end, you will know how to create an impressive GitHub README like…


And 3 other Pandas Tricks to Process your Data Efficiently

Photo by Lisa Kohnen on Unsplash

Motivation

Pandas is a common library for data scientists. There are different ways to process a Pandas DataFrame, but some are more efficient than others. This article shows you 4 efficient ways to:

  • Assign new columns to a DataFrame
  • Exclude the outliers in a column
  • Select or drop all columns that start with ‘X’
  • Keep only the rows whose value in a column appears in another list
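This preview does not include the article's code for these tasks. The sketch below shows common pandas idioms for three of them, using made-up data. The 1.5×IQR outlier rule is an assumption and may not match the article's cutoff.

```python
import pandas as pd

# Made-up data, for illustration only.
df = pd.DataFrame(
    {
        "X_height": [150, 160, 170, 180, 300],
        "X_weight": [50, 60, 70, 80, 90],
        "city": ["Austin", "Boston", "Austin", "Denver", "Boston"],
    }
)

# 1. Exclude outliers: keep values within 1.5 * IQR of the quartiles.
q1, q3 = df["X_height"].quantile(0.25), df["X_height"].quantile(0.75)
iqr = q3 - q1
no_outliers = df[df["X_height"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# 2. Select, or drop, all columns whose names start with "X".
x_columns = df.filter(regex="^X")
without_x_columns = df.loc[:, ~df.columns.str.startswith("X")]

# 3. Keep only the rows whose "city" appears in another list.
wanted_cities = ["Austin", "Denver"]
selected = df[df["city"].isin(wanted_cities)]

print(no_outliers["X_height"].tolist())  # [150, 160, 170, 180]
print(list(x_columns.columns))  # ['X_height', 'X_weight']
print(list(without_x_columns.columns))  # ['city']
print(selected["city"].tolist())  # ['Austin', 'Austin', 'Denver']
```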

Let’s get started!

Assign New Columns to a DataFrame

Imagine we want to assign new columns whose values depend on the values of existing columns. …
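One common way to do this is `DataFrame.assign`, sketched here with made-up columns and a hypothetical tax rate. It returns a new DataFrame. When given callables, it also lets a later column build on one created earlier in the same call.

```python
import pandas as pd

# Made-up data, for illustration only.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Each lambda receives the DataFrame as built so far, so "revenue_with_tax"
# can use the "revenue" column created just before it (1.1 is a made-up rate).
result = df.assign(
    revenue=lambda d: d["price"] * d["quantity"],
    revenue_with_tax=lambda d: d["revenue"] * 1.1,
)

print(result["revenue"].tolist())  # [10.0, 40.0, 90.0]
print("revenue" in df.columns)  # False: assign does not modify df in place
```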

About

Khuyen Tran

Data scientist. I share a little bit of goodness every day through articles and daily data science tips: https://mathdatasimplified.com/
