How to identify and remove duplicate values in Pandas

Duplicate values are a common occurrence in data science, and they come in various forms. Not only will you need to be able to identify duplicate values, but you will...

How to identify and count unique values in Pandas

When working with Pandas, you’ll often need to identify and count unique values in a DataFrame. This is a common task in data science, and Pandas provides two methods to...

How to create a customer retention model with XGBoost

Although all businesses know the importance of retaining customers, few companies are actually able to measure customer retention accurately, and fewer still can predict which ones will churn or be...

How to use sort_values() to sort a Pandas DataFrame

When working with Pandas dataframes, you’ll commonly need to sort the data in some way. This is easy to do with the sort_values() and sort_index() methods. These two methods allow...

How to use Pandas CategoricalDtype to create custom sort orders

When working with Pandas, you’ll often need to sort a dataframe by one or more columns. While the Pandas sort_values() method makes it easy to sort categorical data in alphabetical...

How to convert a Pandas dataframe or series to a list

When working with a Pandas dataframe, you’ll sometimes need to convert the dataframe or a series to a list or dictionary. There are certain operations that are easier to perform...

How to add a new column to a Pandas dataframe

Pandas is extremely versatile and includes a wide range of different methods you can use to add a new column or series to an existing dataframe. Whether you want to...

How to add feature engineering to a scikit-learn pipeline

When building a machine learning model, feature engineering is one of the most important steps. Feature engineering is the process of creating new features from existing data and can often...

How to tune a CatBoostClassifier model with Optuna

The CatBoost model is a gradient boosting model that is based on decision trees, much like XGBoost, LightGBM, and other tree-based models. It is a very popular model for tabular...