For years, I used to spend much of my time performing Exploratory Data Analysis directly in SQL. Over time, the queries I wrote became very complicated, and it was often...

While data scientists may do nearly everything in Pandas, we also need to perform file operations in regular Python and in applications not tied to dataframes. Thankfully, Python makes it...

There are hundreds of excellent Python data science libraries and packages that you’ll encounter when working on data science projects. However, there are four of them that you’ll probably use...

Word clouds (also known as tag clouds, wordles, or weighted lists) have been around since the mid nineties and are one of the most effective data visualisations for representing the...

One of the key steps in the Exploratory Data Analysis process that comes before model development is to understand the statistical distribution of the variables or features within the data...

The Venn diagram is one of the most intuitive data visualisations for showing the overlap between two or three groups, or “sets”, of data. These diagrams were created in the...

Line charts, line graphs, or line plots are among the most widely used data visualisations. They’re ideal for time series data in which you’re plot a metric on the y...

Barplots or bar charts are probably the most widely used visualisation for displaying and comparing categorical variables. They’re very easy to understand and are quick and easy to generate.

Pearson’s product-moment correlation, or Pearson’s R, is a statistical method commonly used in data science to measure the strength of the linear relationship between variables. If you can identify existing...