How To Make Histograms with Matplotlib in Python?

Histogram with Matplotlib's hist(): Right number of Bins

If you’ve ever stared at a column of numbers in a dataset and struggled to understand its underlying story, you’re not alone. While many libraries can create plots quickly, true mastery of Python visualization comes from understanding Matplotlib—the foundational engine that gives you full control over every detail of your chart.

This is where the histogram shines. It’s your fastest tool for instantly understanding the shape and story of your data, and building it with Matplotlib gives you the power to customize it exactly to your needs.

In this comprehensive guide, we’ll walk you through everything you need to know about creating insightful and highly customized histograms with Matplotlib, from your first simple plot to advanced comparative techniques.

What Can a Histogram Tell Us?

Before we write any code, it’s crucial to understand what a histogram actually does. Its primary job is to take a large set of numbers and visualize its distribution—in other words, to show you the underlying shape of your data. While a simple average can be misleading, a histogram gives you a much more complete picture by answering several key questions in a single glance.

A well-made histogram allows you to instantly spot the story within your data. You can immediately see where the majority of your data points are clustered by identifying the tallest bars, a concept known as the central tendency. You can also see how spread out the data is. Does it have a long tail stretching to one side? This reveals skewness, which tells you if your data is symmetric or leans in one direction. Are there small, isolated bars far away from the main group? These could be outliers or anomalies that warrant further investigation.
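To make the idea of binning concrete, here is a minimal sketch that counts values per bin with NumPy's np.histogram(). The measurements are made up for illustration and are not taken from the penguins dataset; the bin range of 180 to 240 is also just an assumption chosen to give round edges.

```python
import numpy as np

# Hypothetical measurements, for illustration only
values = np.array([181, 186, 190, 191, 193, 195, 198, 210, 215, 217, 220, 231])

# Split the range 180-240 into three equal-width bins and count the values in each
counts, edges = np.histogram(values, bins=3, range=(180, 240))

# Each bar of a histogram is one of these (interval, count) pairs
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {count} values")
```

Here the counts come out as 7, 3, and 2. A histogram simply draws one bar per bin with these counts as the bar heights.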

Perhaps the most powerful use of a histogram is for comparison. By visualizing the distributions for different categories side-by-side (for example, comparing the bill amounts for smokers vs. non-smokers), you can quickly determine if a variable behaves differently across various groups in your dataset. This ability to compare distributions is fundamental to effective data analysis.

Creating a Basic Histogram in Matplotlib

Now that we understand the “why” behind histograms, let’s create our first one. Instead of using abstract random numbers, we’ll use the Palmer penguins dataset, which we can load through Seaborn.

First, we’ll import Seaborn and matplotlib.pyplot and load the data. Then, we will use the plt.hist() function to generate the plot.

# Import the necessary libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Load the built-in penguins dataset
penguins = sns.load_dataset("penguins")
penguins.head()

Matplotlib’s pyplot module provides the hist() function, which takes the data as input and draws a histogram. Beyond the data, hist() accepts a number of arguments for customizing the plot.

# Histogram with hist() in Matplotlib
# You can pass the DataFrame column directly into plt.hist().
# Matplotlib will automatically ignore any missing values.
plt.hist(penguins['flipper_length_mm'], edgecolor='black')

# Add titles and labels for a complete, professional-looking plot
plt.title("Distribution of Penguin Flipper Lengths", size=20)
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Frequency")
plt.savefig("Histogram_with_matplotlib_hist.png")
Histogram example with Matplotlib’s hist() function

Interpreting the histogram made with Matplotlib

The code above generates our histogram for penguin flipper lengths.

From this chart, we can already see interesting patterns. The distribution is not a single symmetric, bell-shaped curve. Instead, it appears to be bimodal, meaning it has two distinct peaks (one around 190 mm and a second one around 215 mm). This immediately suggests that the flipper length data might contain at least two different subgroups of penguins with different typical flipper lengths. This is a powerful insight that would be difficult to see just by looking at the raw numbers.
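Peaks can also be located numerically, because hist() returns the bar heights and bin edges it computed. The sketch below uses synthetic data as a stand-in for the flipper length column (the group sizes and spreads are assumptions, not the real measurements) and drops missing values explicitly, so the result does not depend on how a given Matplotlib version treats NaN.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for penguins['flipper_length_mm']: two groups plus two missing values
rng = np.random.default_rng(0)
flipper = np.concatenate([
    rng.normal(190, 6, 200),
    rng.normal(217, 6, 120),
    [np.nan, np.nan],
])

# Drop missing values explicitly before plotting
clean = flipper[~np.isnan(flipper)]

fig, ax = plt.subplots()
# hist() returns the bar heights, the bin edges, and the drawn bar objects
counts, edges, patches = ax.hist(clean, edgecolor="black")

# Midpoint of the tallest bar: a rough location for the main peak
tallest = np.argmax(counts)
peak_center = (edges[tallest] + edges[tallest + 1]) / 2
print(f"{len(edges) - 1} bins, tallest bar centered near {peak_center:.1f}")
plt.close(fig)
```

With the real data, the same pattern would apply to penguins['flipper_length_mm'].dropna().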


Choosing the Right Number of Bins: A Critical Step

In the previous plot, we did not set the number of bins (the bars), so Matplotlib fell back on its default of 10 bins. While this default is often a reasonable start, the number of bins you choose can dramatically alter the plot’s appearance and the insights you draw from it.

Both extremes cause problems:

  • Too few bins can oversimplify the data, hiding important details and patterns.
  • Too many bins can create a noisy, chaotic plot that is difficult to interpret.

Let’s see this in action using our penguin flipper length data.

Example 1: Histogram with Too Few Bins

Here’s what happens when we use only 5 bins. This forces all the data into just a few wide bars.

# Histogram with too few bins (oversimplified)
plt.hist(penguins['flipper_length_mm'],
         bins=5, 
         edgecolor='black')

plt.title("Oversimplified Distribution with Too Few Bins", size=20)
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Frequency")
plt.savefig("Histogram_with_matplotlib_too_few_bins.png")
Histogram with Matplotlib’s hist(): Example with Too Few Bins

Result: With so few bins, the bimodal (two-peak) pattern we saw earlier is completely hidden. The histogram is now misleading, as it suggests a single unimodal distribution with a broad peak.

Example 2: Histogram with Too Many Bins

Now, let’s go to the other extreme and use 100 bins.

# Histogram with too many bins (noisy)
plt.hist(penguins['flipper_length_mm'], 
         bins=100, 
         color="salmon",
         edgecolor='black')
plt.title("Histogram: Noisy Distribution with Too Many Bins", 
          size=16)
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Frequency")
plt.savefig("Histogram_with_matplotlib_too_many_bins.png")
Histogram with Matplotlib’s hist(): Too Many Bins

Result: With too many bins, the histogram becomes “spiky” and overly detailed. While it technically shows the distribution, the noise makes it hard to identify the underlying shape and the main peaks.

Example 3: A Better Choice

Now let’s try 30 bins, a value between the two extremes.

# Histogram with a more reasonable number of bins
plt.hist(penguins['flipper_length_mm'],
         bins=30, 
         color='lightgreen', 
         edgecolor='black')
plt.title("Clearer Distribution with a Balanced Number of Bins", size=16)
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Frequency")
plt.savefig("Histogram_with_matplotlib_balanced_bin_number.png")
Histogram with Matplotlib’s hist(): Right number of Bins

Result: This histogram looks much better with the right bin size. The plot is smooth enough to clearly show the bimodal nature of the data without being noisy. We can confidently see the two distinct groups of penguins.
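Besides trial and error, the bins argument also accepts the name of a binning rule (such as 'sturges', 'fd', or 'auto') or an explicit sequence of bin edges. The sketch below uses NumPy's np.histogram_bin_edges() to show how many bins each rule would pick, and builds 5 mm wide bins on round numbers. The data is a synthetic stand-in for the flipper length column, and the 5 mm width is an arbitrary choice for illustration.

```python
import numpy as np

# Synthetic stand-in for the flipper length column (no missing values)
rng = np.random.default_rng(0)
flipper = np.concatenate([rng.normal(190, 6, 200), rng.normal(217, 6, 120)])

# Ask NumPy how many bins each rule would choose for this data
for rule in ["sturges", "fd", "auto"]:
    edges = np.histogram_bin_edges(flipper, bins=rule)
    print(f"{rule}: {len(edges) - 1} bins")

# Explicit edges: 5 mm wide bins that start and end on multiples of 5
start = 5 * np.floor(flipper.min() / 5)
stop = 5 * np.ceil(flipper.max() / 5)
round_edges = np.arange(start, stop + 5, 5)
print(round_edges)
```

Either form can be passed straight to plt.hist(), for example bins='fd' or bins=round_edges. The rule-based options rely on NumPy, so drop missing values first when using them on real data. These rules are starting points rather than guarantees, and it is still worth checking the result by eye.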

Customizing the Histogram’s Appearance

Now that we have an accurately binned histogram, let’s improve its appearance. Matplotlib gives you full control over the style of your plot. We’ll cover a few key parameters that will make your histogram clearer and more professional.

1. Changing Bar Color with color

The default blue is fine, but you can change the color of the bars to match your presentation or brand using the color parameter. You can use common color names (like ‘skyblue’, ‘salmon’, ‘lightgreen’) or even hex codes.

# Changing the bar color to skyblue
plt.hist(penguins['flipper_length_mm'], 
         bins=30,
         color='skyblue',
         edgecolor='black')
plt.title("Histogram with Custom Bar Color")
plt.savefig("Histogram_with_custom_bar_color.png")
Matplotlib histogram: Custom Bar color Example

2. Adjusting Transparency with alpha

The alpha parameter controls the transparency of the bars. It takes a value between 0 (completely transparent) and 1 (completely opaque). Adjusting transparency is especially useful when you want to overlap multiple histograms, as we will see in a later section. Values between 0.5 and 0.7 often work well to soften the look of a plot; the example here uses 0.5.

# Making the bars partially transparent
plt.hist(penguins['flipper_length_mm'], 
         bins=30, 
         color='steelblue',
         edgecolor='black',
         alpha=0.5)
plt.title("Histogram with Transparency")
plt.savefig("Histogram_with_transparency.png")
Matplotlib histogram: Set Transparency with alpha

Putting It All Together: A Polished Histogram

Let’s combine these parameters to create a single, polished, presentation-ready histogram. We will also increase the figure size using plt.figure() to make it larger and more readable.

# Set the figure size before plotting
plt.figure(figsize=(10, 6))

# Create a polished histogram with several customizations
plt.hist(penguins['flipper_length_mm'], 
         bins=30, 
         color='#6A5ACD',  # Using a hex code for a specific color (SlateBlue)
         edgecolor='black', 
         alpha=0.7)
plt.title("Polished Histogram of Penguin Flipper Lengths", fontsize=16)
plt.xlabel("Flipper Length (mm)", fontsize=12)
plt.ylabel("Frequency", fontsize=12)
plt.savefig("Publication_ready_Histogram_with_Matplotlib.png")
Publication Ready Histogram with Matplotlib’s hist()
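The same polished plot can also be written with Matplotlib's object-oriented interface, which keeps an explicit handle on the figure and axes and is often easier to manage once a script produces several figures. This is a sketch on synthetic stand-in data; the output file name, the temporary directory, and the dpi value are assumptions chosen for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the flipper length column
rng = np.random.default_rng(0)
flipper = np.concatenate([rng.normal(190, 6, 200), rng.normal(217, 6, 120)])

# Create the figure and axes explicitly instead of relying on pyplot's current figure
fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(flipper, bins=30, color="#6A5ACD", edgecolor="black", alpha=0.7)
ax.set_title("Polished Histogram of Penguin Flipper Lengths", fontsize=16)
ax.set_xlabel("Flipper Length (mm)", fontsize=12)
ax.set_ylabel("Frequency", fontsize=12)

# Higher dpi gives a sharper image; bbox_inches="tight" trims extra whitespace
out_path = os.path.join(tempfile.gettempdir(), "polished_histogram_oo.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)  # release the figure so later plots start from a clean slate
```

Closing each figure after saving also prevents a later plt.hist() call in the same script from drawing on top of an earlier plot.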

Advanced Histogram Techniques

Now that you can create and style a single histogram, let’s move on to some advanced techniques that are essential for data analysis. These methods allow you to compare distributions across different groups.

1. Plotting a Density Histogram

So far, our y-axis has represented the raw count of data points in each bin (frequency). Sometimes, it’s more useful to display a probability density. In this case, the y-axis is scaled such that the total area of all the bars in the histogram equals 1. This is useful for comparing the shape of distributions for groups that have different numbers of data points.

You can do this in Matplotlib by setting the density parameter to True.

# Plotting a probability density histogram
plt.hist(penguins['flipper_length_mm'], 
         bins=30, 
         density=True,  # This is the key parameter
         color='c', 
         edgecolor='black', 
         alpha=0.7)

plt.title("Density Histogram of Penguin Flipper Lengths", size=18)
plt.xlabel("Flipper Length (mm)")
plt.ylabel("Density") # Note the change in the y-axis label
plt.savefig("Density_Histogram_with_Matplotlib.png")

Result: The shape of the density histogram is the same as before, but the y-axis now shows density instead of frequency. Because the total area is always 1, this normalized view is a common way to compare distributions from groups of different sizes.

Density Histogram with Matplotlib’s hist()
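A quick way to confirm what density=True does is to multiply each bar height by its bin width and add up the results. The sketch below does this with the values returned by hist(), using synthetic stand-in data rather than the real penguin measurements.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the flipper length column
rng = np.random.default_rng(0)
flipper = np.concatenate([rng.normal(190, 6, 200), rng.normal(217, 6, 120)])

fig, ax = plt.subplots()
# With density=True the first return value holds densities, not counts
density, edges, _ = ax.hist(flipper, bins=30, density=True)

# Area of each bar = height * width; the areas should sum to 1
bin_widths = np.diff(edges)
total_area = np.sum(density * bin_widths)
print(f"Total area under the bars: {total_area:.3f}")
plt.close(fig)
```

Note that the bar heights themselves do not need to sum to 1. Only the total area does, so individual density values depend on the bin width.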

2. Overlapping Histograms for Comparison

The most powerful use of a histogram is to compare the distributions of two or more groups. Let’s compare the flipper lengths of the three penguin species in the dataset: Adelie, Chinstrap, and Gentoo.

Plotting Multiple Histograms for Comparison

When you need to compare distributions across several categories in your data, a powerful and efficient method is to programmatically loop through each unique category and plot its histogram on the same axes.

The process is:

  1. Identify the unique categories you want to plot (the penguin species).
  2. Loop through each category.
  3. In each loop, filter the data for that category and plot its histogram.

This approach is scalable and gives you full control over the visualization.

# Set the figure size
plt.figure(figsize=(10, 6))

# Get a list of unique species from the 'species' column
species_list = penguins['species'].unique()

# Loop through each species and plot its histogram
for species in species_list:
    # Filter the DataFrame for the current species
    subset = penguins[penguins['species'] == species]
    
    # Plot the histogram for the filtered data
    # We add a label for the legend and use alpha for transparency
    plt.hist(subset['flipper_length_mm'], 
             bins=20, 
             alpha=0.6,
             edgecolor="black",
             label=species)

# Add a legend to distinguish the histograms
plt.legend()

plt.title("Comparison of Flipper Length Distribution Across Species", fontsize=16)
plt.xlabel("Flipper Length (mm)", fontsize=12)
plt.ylabel("Frequency", fontsize=12)
plt.savefig("Overlapping_histograms_of_multiple_groups_with_Matplotlib.png")
Overlapping histograms of multiple groups with Matplotlib

Result (The Analytical Payoff)

This plot provides a clear comparison of the flipper length distributions for all penguin species. By layering them with transparency, we can easily see how the groups differ. The distribution for Adelie is centered around a lower flipper length (approx. 190 mm) than the distribution for Gentoo (approx. 217 mm), with Chinstrap in between. This programmatic approach is a powerful technique for comparing multiple groups in your data.
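One detail to watch when overlaying groups: passing an integer such as bins=20 lets each group get its own bin edges, so the bars of different groups can have different widths. A possible refinement, sketched here on synthetic groups with made-up names and sizes, is to compute one set of edges from the pooled data and reuse it for every group.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for three groups (hypothetical names, sizes, and spreads)
rng = np.random.default_rng(0)
groups = {
    "Group A": rng.normal(190, 6, 150),
    "Group B": rng.normal(196, 7, 70),
    "Group C": rng.normal(217, 6, 120),
}

# Compute 20 equal-width bins once, from all values pooled together
pooled = np.concatenate(list(groups.values()))
shared_edges = np.histogram_bin_edges(pooled, bins=20)

fig, ax = plt.subplots(figsize=(10, 6))
used_edges = {}
for name, values in groups.items():
    # Passing the same edges to every call keeps the bars aligned across groups
    _, edges, _ = ax.hist(values, bins=shared_edges, alpha=0.6,
                          edgecolor="black", label=name)
    used_edges[name] = edges
ax.legend()
plt.close(fig)
```

With the penguins DataFrame, the pooled values would be penguins['flipper_length_mm'].dropna(), and the loop would filter by species as in the code above.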

Summary and Key Lessons

Congratulations! You have now walked through a complete guide to creating insightful, publication-quality histograms using Matplotlib. You have the skills to move beyond basic plots and use histograms as a powerful tool for data exploration.

Here are the key lessons from this guide:

  • Master the Fundamentals: A histogram is your best tool for understanding the true shape and distribution of your data. Matplotlib’s plt.hist() is the foundational function for this, giving you a powerful starting point for any analysis.
  • Binning is a Critical Choice: The most important parameter for any histogram is the number of bins. As we saw, your choice can either hide or reveal key patterns in the data. Always experiment to find the clearest representation.
  • Customize for Clarity and Impact: Matplotlib provides full control over the appearance of your plot. Using parameters like color, edgecolor, and alpha will elevate your charts from simple drafts to polished, professional-quality visualizations.
  • Compare Groups Programmatically: The real power of analysis comes from comparison. By looping through categories and plotting them on the same axes, you can efficiently create compelling, comparative histograms that reveal deep insights into your data.

The best way to solidify these skills is to apply them. Download the Jupyter Notebook, try it on one of your own datasets, and see what stories you can uncover with Matplotlib. If you have any questions or discover a cool insight, share it in the comments below!

And if you want to make histograms using Seaborn in Python, check out the detailed blog post “Make Beautiful Histograms with Seaborn in Python”.
