
Khuyen Tran

28K Followers


About Me

Who I am and What Motivates me to Write. Hello! I’m Khuyen Tran. I have been writing on Medium since December 2019, but I haven’t properly introduced myself, so I wrote this article to do so. I major in statistics, but I love playing with data science and Python tools and sharing them with others in my free time…


4 min read



Published in Towards Data Science · Jan 15

How to Structure an ML Project for Reproducibility and Maintainability

Start Your Next ML Project With This Template. Motivation: Getting started is often the most challenging part of building ML projects. How should you structure your repository? Which standards should you follow? Will your teammates be able to reproduce the results of your experiments? Instead of trying to find an ideal repository structure, wouldn’t it be…

Data Science

7 min read



Published in Towards Data Science · Jan 1

Human-Learn: Rule-Based Learning as an Alternative to Machine Learning

Incorporate Domain Knowledge into Your Model with Rule-Based Learning. Motivation: You are given a labeled dataset and asked to predict labels for a new one. What would you do? The first approach you would probably try is to train a machine learning model to find rules for labeling new data. This is convenient, but it is challenging to…

Machine Learning

7 min read



Published in Towards Data Science · Dec 21, 2022

Build a Full-Stack ML Application With Pydantic And Prefect

Create a UI for ML Feature Engineering in One Line of Code. Motivation: As a data scientist, you might frequently adjust your feature engineering process and tune your machine learning models to get a good result. Instead of digging into your code to change function parameters: …, wouldn’t it be nice if you could change the parameter values from the UI?

Python

8 min read



Published in Towards Data Science · Dec 6, 2022

River: Online Machine Learning in Python

A Fast and Cheap Approach to Update an ML Model in Production. Problem with Batch Learning: It is common for data practitioners to use batch learning to learn from data. Batch learning is the training of ML models on data in batches. An ML pipeline with batch learning typically includes: splitting the data into train and test sets; fitting a model to the train set; computing the…

Machine Learning

9 min read



Published in Towards Data Science · Nov 29, 2022

Create Observable and Reproducible Notebooks with Hex

How to integrate notebooks into your data pipeline. Motivation: Jupyter Notebook is a popular tool for data scientists to explore and process data because it makes checking code outputs easy. However, Jupyter Notebook has several drawbacks, including: Interpretability issue: as your code gets bigger, the relationship between cells becomes increasingly complex. Thus, when one cell…

Python

7 min read



Published in Towards Data Science · Nov 1, 2022

DVC + GitHub Actions: Automatically Rerun Modified Components of a Pipeline

A Perfect Combo to Quickly Iterate on Your DS Project. Motivation: Imagine your data pipeline looks similar to the graph below. The pink box represents a stage, which is an individual data process. Dependencies are the files that a stage depends on, such as parameters, Python scripts, or input data. Now imagine Dependencies 2 changes. …

Data Science

6 min read



Published in Towards Data Science · Oct 13, 2022

Create a Maintainable Data Pipeline with Prefect and DVC

Make Your Pipelines Easier to Support and Maintain. Motivation: In engineering, maintainability is the ease with which a product can be maintained to: meet new requirements; cope with a changing environment; improve the performance of the product. In a data science project, it is essential to build a maintainable pipeline because: the characteristics of data can change frequently; data…

Data Science

8 min read



Published in Towards Data Science · Sep 22, 2022

4 Tools to Automatically Extract Data from Datetime in Python

How to Extract Datetime From Text and Data From Datetime. Motivation: Datetime features are important for time series and forecasting. However, you don’t always have clean datetime features to work with. Wouldn’t it be nice if you could automatically extract datetime from text and data from datetime? In this article, you will learn 4 tools to do exactly that. datefinder: Automatically Find Dates and Time in a Python String

Python

4 min read



Published in Towards Data Science · Sep 16, 2022

Hypothesis and Pandera: Generate Synthesis Pandas DataFrame for Testing

Create Clean and Robust Tests with Property-Based Testing. Motivation: Imagine you are trying to figure out whether the function processing_fn is working properly. You use pytest to test the function with an example. The test passes, but you know that one example is not enough. You need to test the function with more examples to make…

Pandas

4 min read



DevRel at Prefect. I share a little bit of goodness every day through daily data science tips: https://mathdatasimplified.com

