Have you ever looked at a function you wrote a month ago and been unable to understand it within 3 minutes? If so, it is time to refactor your code. If it takes you more than 3 minutes to understand your own code, imagine how long it would take your teammates.
If you want your code to be reusable, you want it to be readable. Writing clean code is especially important for data scientists, who collaborate with team members in different roles.
You want your Python function to:
If you use print to debug your code, you might find it confusing to scan many lines of terminal output and work out which line of code produced each one.
For example, running the script below will give you:
30
40
Which of these outputs is num1? Which is num2? Two outputs might not be hard to figure out, but what if there are more than 5 different outputs? Finding the code responsible for each output can be time-consuming.
You might try to add text to the print statement to make it easier to figure…
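A minimal sketch of the problem, assuming two hypothetical variables num1 = 30 and num2 = 40 (the original script is not reproduced here). The last two calls use the "=" specifier in f-strings, available since Python 3.8, which prints the expression text next to its value. This is a standard-library option, not necessarily the approach the article goes on to recommend.

```python
# Hypothetical values; the original script is not reproduced here
num1 = 30
num2 = 40

# Bare prints: the terminal shows "30" and "40" with no hint of which is which
print(num1)
print(num2)

# Adding a label by hand works, but it is repetitive and easy to get out of sync
print("num1:", num1)  # prints: num1: 30

# Python 3.8+ f-string "=" specifier: echoes the expression text with its value
print(f"{num1=}")  # prints: num1=30
print(f"{num2=}")  # prints: num2=40
```

Because the label is generated from the expression itself, renaming a variable cannot leave a stale label behind.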
Have you ever tried to get a general understanding of a CSV file by staring at it, only to end up not understanding it? You can open a Jupyter Notebook to analyze your CSV file, but opening a notebook just to understand a CSV file is time-consuming, especially when you work primarily with Python scripts and the terminal.
Is there a way to quickly analyze your CSV files from the terminal with one line of code, like this?
$ xsv stats bestsellers.csv | xsv table
This is when xsv comes in handy.
xsv is a command-line program for indexing, slicing, analyzing, splitting, and joining CSV files. I like xsv because it makes working with CSV files extremely quick and easy. …
Given a tweet, can you tell the gender of its author? You probably can, by looking at specific words in the tweet.
For example, if you see the word ‘cute’ in a tweet, there is a high chance that the author is female. Because some words are used more often by a certain gender, machine learning models can distinguish between genders using these gender-related words.
Wouldn’t it be interesting if we could visualize how different words are related to different genders on Twitter? That is easy to do with Scattertext. …
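The intuition that some words are used more often by one group can be made concrete with per-group word frequencies. Below is a minimal standard-library sketch on a hypothetical toy corpus with two made-up author groups, "A" and "B" (not real tweets, and not how Scattertext computes its scores). A word whose relative frequency differs a lot between groups is the kind of signal a classifier can pick up.

```python
from collections import Counter

# Hypothetical toy corpus of (author group, tweet text) pairs; not real data
tweets = [
    ("A", "this puppy is so cute"),
    ("A", "cute shoes and a cute bag"),
    ("B", "great game last night"),
    ("B", "the game was cute actually"),
]

def word_counts_by_group(docs):
    """Count each lowercase, whitespace-separated word per group."""
    counts = {}
    for group, text in docs:
        counts.setdefault(group, Counter()).update(text.lower().split())
    return counts

def relative_frequency(counts, word, group):
    """Fraction of a group's words that equal `word` (0.0 for an empty group)."""
    total = sum(counts[group].values())
    return counts[group][word] / total if total else 0.0

counts = word_counts_by_group(tweets)
for group in sorted(counts):
    # "cute" is 3 of 11 words in group A and 1 of 9 words in group B
    print(group, round(relative_frequency(counts, "cute", group), 3))
```

Whitespace splitting is a deliberate simplification; a real pipeline would use a proper tokenizer and far more data.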
If you work with Python, you probably print output to the terminal, either to debug or to follow the progress of a process. However, when the output is lengthy, it is hard to keep track of.
Is there a way to make important terminal output stand out more, for example by adding color and enlarging the text?
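A minimal standard-library sketch of the coloring part, using ANSI SGR escape sequences that most modern terminal emulators honor. The helper name `stylize` and the style table are hypothetical, not part of any library the article uses. Escape codes cover color and bold but cannot enlarge text; enlarged text is usually produced by rendering it as ASCII art, which this sketch does not attempt.

```python
# ANSI SGR escape sequences; RESET returns the terminal to its default style
RESET = "\033[0m"
STYLES = {
    "bold": "\033[1m",
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
}

def stylize(text: str, *styles: str) -> str:
    """Prefix text with the requested ANSI styles and reset formatting after it."""
    prefix = "".join(STYLES[s] for s in styles)
    return f"{prefix}{text}{RESET}"

print("Loading data...")
print(stylize("Training finished", "bold", "green"))
print(stylize("Warning: some rows were dropped", "yellow"))
```

Always appending RESET keeps the style from leaking into the lines printed afterwards.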
If you have applied machine learning or deep learning in your data science projects, you probably know how overwhelming it is to keep track of and compare different experiments.
In each experiment, you might want to track the outputs when you change the model, hyperparameters, feature engineering techniques, etc. You can keep track of your results by writing them in an Excel spreadsheet or saving all outputs of different models in your Jupyter Notebook.
However, there are multiple disadvantages to the above methods:
Even if you are a data scientist who is knowledgeable in a variety of tools and good at coding, you might still find yourself struggling to:
You can use some sophisticated apps to fix these problems, but the most effective and easiest method is to document your code.
What is Notion? Notion can be described as an all-in-one workspace.
I like to use Notion to document everything related to my data science projects because Notion supports many kinds of content, including code, PDFs, and web bookmarks. Notion also makes it easy to organize your content and make it look nice with minimal effort. …
Whether or not you have worked with natural language processing (NLP), you have probably been amazed by some of its applications. If you want to start working with NLP, where should you start?
There’s good news. There are many amazing NLP tools out there that enable you to work on text with minimal domain knowledge. Even better news is that it is also easy to create an NLP app!
In this article, you will learn how to
Let’s get started!
spaCy makes it easy to process and analyze text in a few lines of code. Streamlit is a Python library that lets you create web apps in minutes. …
If you are looking for a new opportunity in data science or programming, you should probably optimize your GitHub profile.
Don’t expect recruiters to discover your skills by digging through each of your repositories. If you have something to show off, put it on your profile's front page so recruiters don't have to spend much effort learning about you.
This includes links to your website and social media accounts, your activity on GitHub, and what you are interested in.
This article shows 3 simple steps to build an impressive GitHub profile. By the end, you will know how to create an impressive GitHub README like…
Pandas is a common library for data scientists. There are different ways to process a Pandas DataFrame, but some are more efficient than others. This article provides 4 efficient ways to:
Let’s get started!
Imagine we want to assign new columns whose values depend on the values of existing columns. …
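One common, vectorized way to do this is `DataFrame.assign` with callables; it is shown here as an illustration and is not necessarily one of the article's four methods. The data and the column names `price`, `quantity`, `total`, and `is_large_order` are hypothetical.

```python
import pandas as pd

# Hypothetical example data; column names are for illustration only
df = pd.DataFrame({"price": [2.0, 3.5, 10.0], "quantity": [4, 2, 1]})

# assign() returns a new DataFrame and leaves df unchanged. Each callable
# receives the DataFrame built so far, so a later column can use an earlier one.
result = df.assign(
    total=lambda d: d["price"] * d["quantity"],
    is_large_order=lambda d: d["total"] > 7,
)
print(result)
```

Whole-column arithmetic like this runs in vectorized code, which is typically much faster than a row-by-row `apply` on large frames.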