    Geoms

    Using plotnine Geoms

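    from plotnine import ggplot, aes, geom_point

    # Scatter plot: energy on the x-axis, valence on the y-axis
    (billie
     >> ggplot(aes("energy", "valence"))
      + geom_point()
    )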


    In the previous lesson, you created this scatter plot comparing the energy of each song to its valence.

    This plot gives an overview of the songs: each point is one track, placed by its energy (x-axis) and its valence (y-axis).

    One problem with this plot, however, is that it's impossible to tell which song is which. Adding labels to the points will let us identify songs in the plot.

    Using geom_label

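    from plotnine import ggplot, aes, geom_label

    # label= maps the track_name column to the text drawn for each song
    (billie
     >> ggplot(aes("energy", "valence", label="track_name"))
      + geom_label()
    )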


    The first option for plotting text is geom_label(), which draws each piece of text with a filled-in box around it.

    Compared with scanning a DataFrame of results, this makes it easy to pick out songs across a range of valence and energy.

    For example, relative to other songs:

    • "i love you" is low valence and low energy
    • "bad guy" is high valence and high energy
    • "bellyache" is extremely high energy and has a moderate amount of valence

    Using geom_text

    from plotnine import ggplot, aes, geom_text

    (billie
     >> ggplot(aes("energy", "valence", label="track_name"))
      + geom_text()
    )


    A second option for plotting text is with geom_text(). This plots text by itself, without a background box.

    Notice that words in front don't cover other words as much, but without a box they can also be harder to read.

    Combining geoms

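    from plotnine import ggplot, aes, geom_point, geom_text

    (billie
     >> ggplot(aes("energy", "valence", label="track_name"))
      + geom_text(nudge_y=0.1)  # labels first, shifted up by 0.1
      + geom_point()            # points drawn on top of the labels
    )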


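    When you plot multiple geoms, as in this plot, ggplot adds them in the order they appear in the code, from top to bottom, and each later layer is drawn on top of the earlier ones.

    For example, the code here first puts down the words from geom_text(), then draws the points from geom_point() over them.

    Notice also that geom_text() has an option passed to it, nudge_y=0.1, which shifts each label up by 0.1 on the y-axis so it sits above its point rather than directly on it.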

    More on geom options

    In general, there are many arguments you can pass to geoms, and different geoms can take different arguments.

    The easiest way to understand what arguments geoms can take, and what they do, is to look at the plotnine documentation website.

    Here are two useful parts of the docs:

    • list of available geoms
    • help document on geom_text

    Let's practice!

    In the exercises, you'll practice creating other scatter plots to compare variables across songs, and in the rest of this chapter you'll learn more ways to communicate information in a graph.

    Exercise 1:

    The options below let you change different arguments to geom_text(). Try changing them and rerunning the code until you get a readable plot. Then answer the questions underneath the plot.


    Below are three songs from different corners of the graph. For each one, can you tell whether it has high or low energy? High or low valence? Which do you think has low energy and low valence?

    Hammer to Fall

    Crazy Little Thing Called Love

    Love of My Life

    Exercise 2:

    This exercise is a case study on finding songs with unusual combinations of two features, such as energy and acousticness.

    At the end of the case study, you'll be prompted to add code!

    Generally, tracks with higher energy tend to be less acoustic, as shown in the plot below.

    [Plot: energy vs. acousticness for each track]

    But notice the point in the top right of the plot above, which has both high energy and high acousticness.

    To find songs like this, with both high energy and high acousticness, I used the following code.

           artist            album                          track_name                  energy  valence  danceability  speechiness  acousticness  popularity  duration
    23989  MC Kevin o Chris  Vamos pra Gaiola               Vamos pra Gaiola            0.971   0.521    0.872         0.2810       0.917000      61          161.600
    5210   ScHoolboy Q       CrasH Talk                     Black Folk                  0.902   0.400    0.734         0.3960       0.831000      51          147.040
    24928  MC Kevin o Chris  Eu Vou pro Baile da Gaiola     Eu Vou pro Baile da Gaiola  0.957   0.642    0.832         0.1050       0.824000      52          123.220
    ...    ...               ...                            ...                         ...     ...      ...           ...          ...           ...         ...
    18950  Foo Fighters      There Is Nothing Left To Lose  Learn to Fly                0.919   0.537    0.465         0.0408       0.000018      74          235.293
    20424  Foo Fighters      One By One (Expanded Edition)  Times Like These            0.908   0.265    0.377         0.0899       0.000014      68          265.560
    21870  Turmion Kätilöt   Global Warning                 Jumalauta                   0.939   0.549    0.454         0.0618       0.000010      42          210.107

    812 rows × 10 columns
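    Every visible row of this table has energy above 0.9, and the rows run from most to least acoustic. A sketch of one way to get a similar table with plain pandas, assuming a DataFrame named `tracks` (a hypothetical name, filled here with made-up rows) and an energy cutoff of 0.9 (also an assumption):

```python
import pandas as pd

# Hypothetical stand-in for the full tracks data (made-up rows; only the
# columns needed for this step are included).
tracks = pd.DataFrame({
    "track_name": ["quiet ballad", "loud acoustic", "loud electric", "mid tempo"],
    "energy": [0.20, 0.95, 0.93, 0.60],
    "acousticness": [0.90, 0.85, 0.00002, 0.30],
})

# Keep only high-energy tracks (0.9 is an assumed cutoff),
# then sort so the most acoustic of them come first.
high_energy = (
    tracks[tracks["energy"] > 0.9]
    .sort_values("acousticness", ascending=False)
)

print(high_energy["track_name"].tolist())  # ['loud acoustic', 'loud electric']
```

    Unusual songs like the one in the top right of the plot then show up at the top of the result.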

    Can you plot songs by MC Kevin o Chris, with both points and text?

    ⚠️: Don't forget to replace all the blanks!

    Why do you think Vamos pra Gaiola is high energy and high acousticness?

    Answer: My best guess is that the drums are done by a person's voice and there are few instruments, but the tempo is still pretty fast. It would be interesting to look at their other tracks for comparison.

    Can you modify each code block in the case study to be about high energy and low danceability songs?
