#11 what do song lyrics look like? (9 minute read)

In this edition: I walk through a visualization tool for taking a closer look at your favorite songs, think about what it means to be a crab and explore the lineage of citrus fruit.

Hi, I’m Christian and this is my bi-weekly (fortnightly?) newsletter with interesting content and links orbiting the world of graphs. I’d love to hear your feedback and suggestions, so hit the reply button to let me know what you think.


Graphs header

In 2017 I became obsessed with the music of the Canadian band Destroyer, led by the singer and lyricist Dan Bejar. The name belies a pop-leaning discography of albums ranging from glam-rock and obtuse chiptune orchestras to yacht-synth and Spanish folk guitar.

Destroyer isn’t for everyone. The biggest detractors cite Bejar’s laconic, sometimes monotonous delivery as a barrier to entry. To me, the music meanders in a way that is peaceful and compelling rather than boring. As I slowly listened to more and more albums from the Destroyer back catalogue, I could see a whole world of lyrical motifs that became more apparent over time. This lyrical world-building is something I appreciate in some of my other favourite artists, like The Mountain Goats and Owen Pallett.

When I started to pay more attention to the lyrics, I could see references to previous album names, future album names, other bands, Destroyer song titles, repeated references to the band name and even meta references to the song that’s currently being sung. Bejar is well known for these references littered through his music—so much so there’s a drinking game.

The connections between Destroyer songs and albums are so dense that I thought a lyrical analysis would be an interesting project to explore. I booted up a Python Jupyter notebook, scraped all the Destroyer lyrics from Genius and looked to generate n-grams from all of the lyrics and album titles. Generating n-grams is a basic Natural Language Processing (NLP) technique that splits text into overlapping chunks of n consecutive words. For example, extracting the n=3 grams of the following sentence would give me:

[tell your friends about my newsletter] ⇒
[ [tell, your, friends], [your, friends, about], [friends, about, my], [about, my, newsletter] ]

On its own this isn’t particularly interesting, but I figured that if I were to look at the most common n-grams for all songs, I would get a cool breakdown of common phrases and themes.
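To make the idea concrete, here is a minimal sketch of n-gram extraction and counting. It is not the code from my notebook: the function names `ngrams` and `most_common_ngrams` are made up for illustration, splitting on whitespace is a simplifying assumption, and the second "song" is placeholder text rather than a real lyric.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text, as tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def most_common_ngrams(texts, n, top=10):
    """Count n-grams across several texts and return the most frequent ones."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text, n))
    return counts.most_common(top)


# The example sentence from above, split into 3-grams.
trigrams = ngrams("tell your friends about my newsletter", 3)

# Placeholder "lyrics" (not real song text), just to exercise the counter.
placeholder_songs = [
    "tell your friends about my newsletter",
    "please tell your friends about graphs",
]
top_trigrams = most_common_ngrams(placeholder_songs, 3, top=2)
```

Here `trigrams` holds the same four chunks as the example above, and `top_trigrams` picks out the two phrases shared by both placeholder texts. Real lyrics would need more cleaning first (punctuation, casing, repeated choruses), which is where things started to get unwieldy for me.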

It turns out that my approach was probably a little naive when it came to getting the output I was hoping for. In my (limited) experience, NLP leads to slippery slopes involving tweaking model parameters and data cleaning functions. I created a pipeline of text transformations that became quite unwieldy.

I had to put a pin in this project, but last month I felt compelled to dive back in. I had been wondering about applying a similar technique to any song or artist rather than the full repertoire of one artist. Instead of worrying about the connections across songs, what if I focused on the lyrics within a single song?

The new idea was to look at each word in a song and build a graph that focused on the connections between words based on the order they’re sung—not dissimilar to a Markov chain. A network of words would be naturally created as common words are linked across verses, bridges and choruses.

[ Very superstitious Writing's on the wall ] ⇒
[very] → [superstitious] → [writing's] → [on] → [the] → [wall]

Inspired by Andrei Kashcha’s work and the vast collection of songs this technique could be applied to, I built a webapp that does the following:

  1. Searches for potential song matches
  2. Parses the lyrics, stripping out stop words and non-word text
    • (e.g. in the above example “on” and “the” are removed)
  3. Generates and visualizes a graph from the adjacency of each word
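Steps 2 and 3 above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the webapp's actual code: the stop-word list is a tiny stand-in, `tokenize` and `build_lyrics_graph` are hypothetical names, line breaks are ignored, and words on either side of a removed stop word are treated as adjacent.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"on", "the", "a", "an", "and", "of", "to", "in"}


def tokenize(lyrics):
    """Lowercase the lyrics and keep only word-like tokens (letters and apostrophes)."""
    tokens = (t.strip("'") for t in re.findall(r"[a-z']+", lyrics.lower()))
    return [t for t in tokens if t]


def build_lyrics_graph(lyrics, stop_words=STOP_WORDS):
    """Return (word counts, directed word-pair counts) for the given lyrics."""
    words = [w for w in tokenize(lyrics) if w not in stop_words]
    nodes = Counter(words)                  # how often each word is sung
    edges = Counter(zip(words, words[1:]))  # how often word A is followed by word B
    return nodes, edges


nodes, edges = build_lyrics_graph("Very superstitious\nWriting's on the wall")
for (source, target), weight in edges.items():
    print(f"{source} -> {target} (x{weight})")
```

With "on" and "the" stripped, the example line becomes a chain of three links ending in writing's -> wall. Keeping both counters means the word frequencies and pair frequencies are on hand for any later styling of the graph.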

I scaled the word sizes by their relative occurrence and did the same for the link widths for frequency of word pairs. I’ve only visualized a small collection of songs but it’s been fascinating (and addictive) to see the variety between various songs. You can check it out yourself at https://cjlm.dev/lyrics-graph/ but here are some notable examples:

Superstition by Stevie Wonder

The arc in the top left is a pretty common pattern to see in these networks, typically showing the verses of songs.

Harder, Better, Faster, Stronger by Daft Punk

Unsurprising for anyone familiar with the song, but quite an amusing, tight network.

Mark Zuckerberg by Nap Eyes

This is perhaps the closest I have found to a “single line” song: a reasonably long one with no repeated words.

Of course I had to try it on the original inspiration for the project:

Painter in Your Pocket by Destroyer

So what’s next with this project?

There are some tweaks that could be made to the data cleansing to include or exclude specific stop words depending on their importance to the song (check out Take on Me by a-ha and try and spot the title in the lyrics!).

I’d also love to generalize the graph creation to work for full albums, but I will need a strategy for the over-linking of common but uninteresting phrases.

Finally there’s always more work that can be done on the visuals. I like the minimalist aesthetic but highlighting other data points provided by the analytics would definitely take it to the next level. I wonder if graph diameter or similar metrics could give an interesting statistic for the lyrical form of your favorite song?
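As a rough illustration of the diameter idea, here is a small sketch that ignores link direction and uses a plain breadth-first search. The names `eccentricity` and `diameter` are my own, and treating the graph as undirected is an assumption rather than something the webapp does.

```python
from collections import deque


def eccentricity(adjacency, start):
    """Longest shortest-path distance (in hops) from start, found by breadth-first search."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbour in adjacency[word]:
            if neighbour not in distance:
                distance[neighbour] = distance[word] + 1
                queue.append(neighbour)
    return max(distance.values())


def diameter(pairs):
    """Diameter of the undirected graph formed by the given word pairs (0 if empty)."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return max((eccentricity(adjacency, word) for word in adjacency), default=0)


# A song with no repeated words is one long chain...
chain = [("very", "superstitious"), ("superstitious", "writing's"), ("writing's", "wall")]
# ...while repeating a word folds the chain back on itself.
loop = chain + [("wall", "very")]

print(diameter(chain), diameter(loop))
```

A “single line” song would have the largest possible diameter for its length (number of distinct words minus one), while heavy repetition pulls the diameter down. Because the pairs come from consecutive words, the graph is always one connected piece, so the search reaches every word.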


Nodes header

The popular webcomic xkcd gets more and more apt by the day. The recent comics on COVID-19 were a highlight for me. I don’t check the site regularly, so when I do I really enjoy bingeing weeks’ worth of dry, science-adjacent humor. I often find the payoff of each comic is actually in the alt-text of the image itself.

Flicking back a few weeks I came across the comic focusing on carcinization, a great term for—well—I’ll let Wikipedia explain it in a little more depth:

an example of convergent evolution in which a crustacean evolves into a crab-like form from a non-crab-like form. The term was introduced into evolutionary biology by L. A. Borradaile, who described it as "one of the many attempts of Nature to evolve a crab".

Isn’t this fascinating? It’s vaguely reminiscent of simultaneous invention (see also: cadmium, calculus, and, uh, Dennis the Menace). It turns out there are quite a few examples where various organisms have independently evolved into the crab-like form we know and...love?

Here are a few examples:

  • Crab: Hairy Stone Crab
  • Genus: Lomis
  • Defining characteristics: Hairy with blue antennae
  • Probable ancestor: Hermit crab or Aegla
  • Crabbiness rating: High

  • Crab: Porcelain Crab
  • Family: Porcellanidae
  • Defining characteristics: one small pereiopod (read: leg) used for cleaning
  • Probable ancestor: Squat lobster
  • Crabbiness rating: 6/8 (it only has three pairs of legs compared to a “true” crab, which has four)

  • Crab: Red King Crab
  • Family: Lithodidae
  • Cool fact: as they age these crabs move to deeper and deeper water before living most of their adult lives at more than 200 metres from the surface
  • Probable ancestor: Hermit crab
  • Crabbiness rating: “way too big” / 8

Exploring carcinization in graphs is the obvious next move here (sideways, perhaps?) and there’s a variety of literature on the topic, including this excellently named paper from 2019: “What is it like to be a crab?”. In true source/target meta fashion, here’s a graph made with Connected Papers (highlighted in the last edition) showing prior art for this article, mainly papers on the use of complex networks for similar analysis.

Itching for more crab content? The eye-catching crustacean graph above accompanies this essay in National Geographic on the fascinating life of crabs.


Links header

  • For a visual cleanse after all that crab imagery, check out this (I swear I’m not sponsored by Nat Geo) short article on the family tree of citrus fruits. I found this through the fruit version of the “this X does not exist” trend. Check it out for some truly bizarre fruit mashups.

  • Here’s an introduction to link analysis from a journalistic lens. I never saw Channel 4’s Who Knows Who project. It looks like it’s been offline for a while and I have no idea how I’d run a Flash applet in 2020.

  • A small project from Nicholas de Jong to generate and visualize circle-of-trust networks from keybase.io—acquired by Zoom earlier this year.

  • I’m coming around to videos as a particularly efficient way to learn new skills. After my foray into Gephi last edition I found these great videos from Mathieu Jacomy that would have been very helpful when I was learning the ropes.


Thanks again for subscribing, I’d love to know what you think. Oh and don’t forget to send me lyrics graphs you find interesting. There’s (possibly) a prize for the first person to find a fabled “single line” song…

Stay safe, I’ll see you again in two weeks.