Khuyen Tran

30K Followers

Pinned

About Me

Who I Am and What Motivates Me to Write. Hello! I’m Khuyen Tran. I have been writing on Medium since December 2019 but haven’t properly introduced myself, so I wrote this article to do so. I major in statistics, but I love playing with data science and Python tools and sharing them with others in my free time…

4 min read


Published in Towards Data Science · Mar 11

Write Readable Tests for Your Machine Learning Models with Behave

Use natural language to test the behavior of your ML models. Motivation: Imagine you create an ML model to predict customer sentiment based on reviews. After deploying it, you realize that the model incorrectly labels certain positive reviews as negative when they’re rephrased using negative words. This is just one example of how an extremely accurate ML model can fail without…

Python

9 min read


Published in Towards Data Science · Jan 15

How to Structure an ML Project for Reproducibility and Maintainability

Start Your Next ML Project With This Template. Motivation: Getting started is often the most challenging part of building ML projects. How should you structure your repository? Which standards should you follow? Will your teammates be able to reproduce the results of your experiments? Instead of trying to find an ideal repository structure, wouldn’t it be nice to have…

Data Science

7 min read


Published in Towards Data Science · Jan 1

Human-Learn: Rule-Based Learning as an Alternative to Machine Learning

Incorporate Domain Knowledge into Your Model with Rule-Based Learning. Motivation: You are given a labeled dataset and asked to predict labels for a new one. What would you do? The first approach you would probably try is to train a machine learning model to find rules for labeling new data. This is convenient, but it is challenging to…

Machine Learning

7 min read


Published in Towards Data Science · Dec 21, 2022

Build a Full-Stack ML Application With Pydantic And Prefect

Create a UI for ML Feature Engineering in One Line of Code. Motivation: As a data scientist, you might frequently adjust your feature engineering process and tune your machine learning models to get a good result. Instead of digging into your code to change function parameters: …, wouldn’t it be nice if you could change the parameter values from the UI?

Python

8 min read


Published in Towards Data Science · Dec 6, 2022

River: Online Machine Learning in Python

A Fast and Cheap Approach to Update an ML Model in Production. Problem with Batch Learning: It is common for data practitioners to use batch learning, in which an ML model is trained on a batch of data at once. An ML pipeline with batch learning typically includes: splitting the data into train and test sets; fitting a model to the train set; computing the…

Machine Learning

9 min read


Published in Towards Data Science · Nov 29, 2022

Create Observable and Reproducible Notebooks with Hex

How to integrate notebooks into your data pipeline. Motivation: Jupyter Notebook is a popular tool for data scientists to explore and process data because it makes checking code outputs easy. However, Jupyter Notebook has several drawbacks, including an interpretability issue: as your code grows, the relationships between cells become increasingly complex. …

Python

7 min read


Published in Towards Data Science · Nov 1, 2022

DVC + GitHub Actions: Automatically Rerun Modified Components of a Pipeline

A Perfect Combo to Quickly Iterate on Your DS Project. Motivation: Imagine your data pipeline looks similar to the graph below. The pink box represents a stage, which is an individual data process. Dependencies are the files that a stage depends on, such as parameters, Python scripts, or input data. Now imagine Dependencies 2 changes. …

Data Science

6 min read


Published in Towards Data Science · Oct 13, 2022

Create a Maintainable Data Pipeline with Prefect and DVC

Make Your Pipelines Easier to Support and Maintain. Motivation: In engineering, maintainability is the ease with which a product can be maintained to: meet new requirements; cope with a changing environment; improve the performance of the product. In a data science project, it is essential to build a maintainable pipeline because: the characteristics of data can change frequently; data…

Data Science

8 min read


Published in Towards Data Science · Sep 22, 2022

4 Tools to Automatically Extract Data from Datetime in Python

How to Extract Datetime From Text and Data From Datetime. Motivation: Datetime features are important for time series and forecasting. However, you don’t always have clean datetime features to work with. Wouldn’t it be nice if you could automatically extract datetime from text and data from datetime? In this article, you will learn 4 tools to do exactly that.

datefinder: Automatically Find Dates and Time in a Python String

Python

4 min read


I share a little bit of goodness every day through daily data science tips: https://mathdatasimplified.com

Following
  • Tim Denning
  • Sofien Kaabar, CFA
  • George J. Ziogas
  • Kurtis Pykes
  • Christopher Tao
