Duplicate values are a common occurrence in data science, and they come in various forms. Not only will you need to be able to identify duplicate values, but you will...
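Below is a minimal sketch of one common way to find and remove duplicate rows in Pandas, using `DataFrame.duplicated()` and `DataFrame.drop_duplicates()`. The data is made up for illustration, and this is not necessarily the approach the full article takes.

```python
import pandas as pd

# Illustrative data: rows 0 and 2 are exact duplicates
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cal"],
    "city": ["Leeds", "York", "Leeds", "Leeds"],
})

# duplicated() marks every repeat of an earlier row as True
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False]

# subset= compares only the listed columns ("Leeds" repeats twice)
city_dupes = df.duplicated(subset=["city"]).sum()
print(city_dupes)  # 2

# drop_duplicates() keeps the first occurrence of each row by default
deduped = df.drop_duplicates()
print(len(deduped))  # 3
```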
When working with Pandas, you’ll often need to identify and count unique values in a DataFrame. This is a common task in data science, and Pandas provides two methods to...
Although all businesses know the importance of retaining customers, few companies are actually able to measure customer retention accurately, and fewer still can predict which ones will churn or be...
When working with Pandas dataframes, you’ll commonly need to sort the data in some way. This is easy to do with the sort_values() and sort_index() methods. These two methods allow...
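As a quick sketch of the two methods named above: `sort_values()` orders rows by column values, while `sort_index()` orders them by index labels. The DataFrame here is invented for illustration.

```python
import pandas as pd

# Illustrative data with a deliberately unsorted index
df = pd.DataFrame(
    {"product": ["pear", "apple", "fig"], "price": [1.2, 0.5, 3.0]},
    index=[2, 0, 1],
)

# sort_values(): order rows by a column, here highest price first
by_price = df.sort_values("price", ascending=False)
print(by_price["product"].tolist())  # ['fig', 'pear', 'apple']

# sort_index(): order rows by their index labels
by_index = df.sort_index()
print(by_index.index.tolist())  # [0, 1, 2]
```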
When working with Pandas, you’ll often need to sort a dataframe by one or more columns. While the Pandas sort_values() method makes it easy to sort categorical data in alphabetical...
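The excerpt is cut off, but assuming the goal is to sort categorical values in an order other than alphabetical, one common technique is to convert the column to an ordered `pd.Categorical`, which `sort_values()` then respects. This is a sketch with made-up data, not necessarily the article's method.

```python
import pandas as pd

df = pd.DataFrame({"size": ["medium", "small", "large", "small"], "qty": [4, 2, 7, 1]})

# Default string sort is alphabetical: large, medium, small, small
alphabetical = df.sort_values("size")["size"].tolist()
print(alphabetical)

# An ordered Categorical encodes a custom order that sort_values() follows
order = ["small", "medium", "large"]
df["size"] = pd.Categorical(df["size"], categories=order, ordered=True)
custom = df.sort_values("size")
print(custom["size"].tolist())  # ['small', 'small', 'medium', 'large']
```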
When working with a Pandas dataframe, you’ll sometimes need to convert the dataframe or a series to a list or dictionary. There are certain operations that are easier to perform...
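A minimal sketch of a few built-in conversions: `Series.tolist()` for a list, and `DataFrame.to_dict()` with different `orient` values for dictionaries. The `lookup` pattern (indexing by one column, then converting another) is just one illustrative option; the data is made up.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "score": [0.9, 0.7]})

# A single column (Series) to a plain Python list
scores = df["score"].tolist()

# Default orient="dict": {column -> {index -> value}}
as_dict = df.to_dict()

# orient="records": one dictionary per row
records = df.to_dict(orient="records")

# Map one column to another by making it the index first
lookup = df.set_index("id")["score"].to_dict()
print(lookup)  # {1: 0.9, 2: 0.7}
```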
Pandas is extremely versatile and includes a wide range of methods you can use to add a new column or series to an existing dataframe. Whether you want to...
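Here is a short sketch of three standard ways to add a column: direct assignment, `assign()` (which returns a new DataFrame), and `insert()` (which places the column at a given position). The column names and values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

# 1. Direct assignment adds (or overwrites) a column in place
df["total"] = df["price"] * df["qty"]

# 2. assign() returns a new DataFrame, which suits method chaining
df2 = df.assign(discounted=lambda d: d["total"] * 0.9)

# 3. insert() adds a column at a specific position, in place
df2.insert(0, "order_id", [101, 102])
print(df2.columns.tolist())  # ['order_id', 'price', 'qty', 'total', 'discounted']
```

Note that `assign()` leaves the original `df` unchanged, so `df` still has only `price`, `qty`, and `total`.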
When building a machine learning model, feature engineering is one of the most important steps. Feature engineering is the process of creating new features from existing data and can often...
The CatBoost model is a gradient boosting model that is based on decision trees, much like XGBoost, LightGBM, and other tree-based models. It is a very popular model for tabular...