    Facets

    Faceting

    You've learned to use color in your scatterplots.

    Now you'll learn another way to explore your data. plotnine lets you divide your plot into subplots to get one smaller graph for each level of a variable.

    This is called faceting, and it's another powerful way to communicate relationships within your data.
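To make "one smaller graph for each level of a variable" concrete, here is a minimal plain-Python sketch of the idea behind faceting: the rows are split into groups by the value of one variable, and each group would get its own panel. The rows and the helper name `split_into_panels` are made up for illustration, and this is not how plotnine is implemented internally.

```python
from collections import defaultdict

# Made-up rows for illustration; not taken from the course dataset.
rows = [
    {"country": "Japan", "position": 1, "streams": 500},
    {"country": "India", "position": 1, "streams": 900},
    {"country": "Japan", "position": 2, "streams": 450},
    {"country": "India", "position": 2, "streams": 800},
]


def split_into_panels(rows, variable):
    """Group rows by the value of `variable`: one group per panel."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[variable]].append(row)
    return dict(panels)


panels = split_into_panels(rows, "country")

# Each level of "country" now holds only its own points.
for level, panel_rows in sorted(panels.items()):
    print(level, [(r["position"], r["streams"]) for r in panel_rows])
```

Each key of `panels` corresponds to one small plot, which is why a variable with many levels produces many panels.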

    Faceting

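    from siuba import _, filter

    # music_top200 is the course's Spotify top 200 DataFrame, assumed to be loaded already.
    asia_top200 = (
      music_top200
      >> filter(_.continent == "Asia")
    )
    asia_top200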

           country    position  track_name                        artist            streams  duration  continent
     4600  Hong Kong         1  WANNABE                           ITZY               112648   191.242  Asia
     4601  Hong Kong         2  Intentions (feat. Quavo)          Justin Bieber      104467   212.867  Asia
     4602  Hong Kong         3  Señorita                          Shawn Mendes        84196   190.960  Asia
      ...  ...             ...  ...                               ...                   ...       ...  ...
    12197  Viet Nam        198  Đưa Nhau Đi Trốn (Chill Version)  Đen                 20750   241.959  Asia
    12198  Viet Nam        199  Hôm Nay Tôi Buồn                  Phùng Khánh Linh    20580   275.000  Asia
    12199  Viet Nam        200  Kick It                           NCT 127             20495   233.013  Asia

    2600 rows × 7 columns

    For this example, we'll use Spotify top 200 track data for countries in Asia.

    Faceting

    from plotnine import ggplot, aes, geom_point

    (asia_top200
      >> ggplot(aes("position", "streams", color = "country"))
       + geom_point()
    )

    [Figure: scatterplot of streams against position, colored by country]

    This plot shows the number of streams for each top 200 track in Asian countries.

    There are so many countries that the plot runs into two problems:

    1. It has to use many different colors.
    2. Many points are drawn on top of each other.

    A faceted plot could help here by giving each country its own small panel.

    Faceting

    from plotnine import ggplot, aes, geom_point, facet_wrap

    (asia_top200
      >> ggplot(aes("position", "streams", color = "country"))
       + geom_point()
       + facet_wrap('~country')
    )

    [Figure: the same scatterplot, split into one panel per country]

    You facet a plot by adding another option, with a +, to the end of your code, after geom_point.

    You add facet_wrap, then '~country' (tilde country) within the parentheses.

    The tilde notation comes from R's formula syntax, where the tilde typically means "by", so here we're splitting the plot by country. You can usually find the tilde on the upper left of your keyboard. This tells plotnine to divide the data into subplots based on the country variable.

    Faceting is a powerful tool, and in the exercises you'll practice it and see how the nrow and ncol options control the layout of the panels.

    Let's practice!

    Scroll down to get started with practice!

    Exercise 1:

    TODO

    Exercise 2:

    Below is the start of plotnine's documentation for facet_wrap.

    Notice that the Parameters section lists ncol and nrow options. These determine how many columns or rows to use. For example, the plot below has nrow = 1.

    Try out the plot as is, and then with nrow = 1 changed to ncol = 1. Then, answer the questions below.

    [Figure: faceted plot with nrow = 1]
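To see why nrow = 1 and ncol = 1 give such different layouts, it can help to work out the grid by hand. The sketch below shows the arithmetic under a simple assumption: when only one of the two options is given, the other is the smallest number that fits all panels. The helper `grid_shape` is hypothetical and is not part of plotnine.

```python
import math


def grid_shape(n_panels, nrow=None, ncol=None):
    """Return (rows, columns) needed to hold n_panels facets.

    Hypothetical helper for illustration; assumes the unspecified
    dimension is the smallest value that fits every panel.
    """
    if nrow is not None:
        return nrow, math.ceil(n_panels / nrow)
    if ncol is not None:
        return math.ceil(n_panels / ncol), ncol
    raise ValueError("give either nrow or ncol")


# Three panels, one per artist, as in this exercise.
print(grid_shape(3, nrow=1))  # (1, 3): panels side by side
print(grid_shape(3, ncol=1))  # (3, 1): panels stacked vertically
```

Side-by-side panels line up along a shared y-axis, while stacked panels line up along a shared x-axis, which is worth keeping in mind for the questions below.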

    Test yourself

    Which of the three artists tends to have the lowest valence?

    (click to answer)

    Correct! We'll discuss how to measure the idea of "tends to have" in the next chapter.
    That's not right. This artist tends to have the highest valence.
    That's not right. Look for the cluster of points toward the bottom of the plot.

    Which value seems easier to compare across facets, when ncol is set to 1?

    Answer: This is subjective, but I would say energy is easier to compare, since there is only one x-axis across all plots. (For example, there is only one spot on the x-axis where energy is 0.25.)