Solid Serialization of Beyoncé and her Spotify Friends
This assignment is part of: :doc:`/syllabus/assignments/homework/solid-serialization-skills`
Using a JSON data response from Spotify’s API, sort, filter, and analyze the artists most related to Beyoncé, according to Spotify’s algorithm.
The data file is mirrored here:
http://stash.compciv.org/2017/spotify-beyonce-related-artists.json
You can read Spotify’s documentation of its related-artists endpoint here, including a description of each field:
https://developer.spotify.com/web-api/get-related-artists/
While messing around with the Spotify API may not be the most high-minded civic use of our time, it’s certainly a fun API with broad appeal. The fact that it’s one of the best-documented and most easily accessible commercial APIs makes it a no-brainer to play with.
Deliverables
You are expected to deliver a script named: spotify_beyonce_relations.py
This script has 4 prompts, which means at the very least, your script will contain 4 separate function definitions, from foo_1 to foo_4.
When I run your script from the command line, I expect to see something like this:
$ python spotify_beyonce_relations.py
Done running assertions!
And I expect your script to have this code at the bottom:
def foo_assertions():
    assert type(foo_1()) is list
    assert len(foo_1()) == 20
    assert foo_1()[0] == "Destiny's Child"

    assert type(foo_2()) is list
    assert foo_2()[0] == ['name', 'followers']
    assert foo_2()[-1][0] == "Kelis"

    assert type(foo_3()) is list
    assert foo_3()[0]['name'] == 'Rihanna'
    assert foo_3()[-1]['popularity'] == 62

    assert type(foo_4()) is list
    assert foo_4()[0][1] == 20


if __name__ == '__main__':
    foo_assertions()
    print("Done running assertions!")
That script should have an if __name__ == '__main__' conditional block, in which a function named foo_assertions() is executed.
Prompts
The mirror of the related-artists.json result for Beyoncé is here:
http://stash.compciv.org/2017/spotify-beyonce-related-artists.json
Download (and cache) that file, then parse/deserialize the text data, then write the foo_ functions to satisfy the following prompts:
2. List artists’ names with number of followers, rank by followers
Expected output:
[['name', 'followers'],
['Rihanna', 8792360],
['Chris Brown', 3832426],
['Justin Timberlake', 2853318],
['Jennifer Lopez', 1493872],
['Alicia Keys', 1464533],
['Christina Aguilera', 1168887],
['Mariah Carey', 1106762],
['Whitney Houston', 903522],
['Ciara', 803715],
["Destiny's Child", 634049],
['Fergie', 539632],
['Mary J. Blige', 536947],
['Kelly Rowland', 479842],
['The Pussycat Dolls', 414452],
['TLC', 387450],
['Ashanti', 245022],
['Keri Hilson', 221302],
['Jennifer Hudson', 195102],
['Cassie', 140344],
['Kelis', 115659]]
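As a rough sketch of the shape of this prompt (not a full answer), the ranking can be tried out on a tiny hand-made sample first. The function name rank_by_followers and the three-artist sample are hypothetical; the sample assumes each artist object nests its follower count under followers['total'], as Spotify's field descriptions indicate, so check that against the actual data file:

```python
# Hypothetical three-artist sample shaped like the API response; the
# follower counts are copied from the expected output above.
sample = {
    "artists": [
        {"name": "Kelis", "followers": {"href": None, "total": 115659}},
        {"name": "Rihanna", "followers": {"href": None, "total": 8792360}},
        {"name": "Ciara", "followers": {"href": None, "total": 803715}},
    ]
}


def rank_by_followers(data):
    """Return a header row followed by [name, followers] rows, largest first."""
    # Assumption: the follower count lives at artist['followers']['total']
    rows = [[a["name"], a["followers"]["total"]] for a in data["artists"]]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [["name", "followers"]] + rows


ranked = rank_by_followers(sample)
print(ranked)
```

The header row is added after sorting so that it never takes part in the numeric comparison.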
Background
For a more thorough exploration of the Spotify API (albeit from the command line), check out my guide from a couple of years back: http://www.compciv.org/recipes/data/touring-the-spotify-api/
The actual URL for the live related-artists endpoint for Beyoncé is:
https://api.spotify.com/v1/artists/6vWDO969PvNqNYHIOW5v0m/related-artists
Answers (Partial)
I like to start off by writing a function that does the downloading and deserializing:
import json
from os import makedirs
from os.path import exists, join, basename

import requests

SRC_URL = 'http://stash.compciv.org/2017/spotify-beyonce-related-artists.json'
DATA_DIR = 'data-files'
DATA_FNAME = join(DATA_DIR, basename(SRC_URL))


def bootstrap_data():
    """Return the deserialized JSON data, downloading and caching the file on first use."""
    if exists(DATA_FNAME):
        # Reuse the copy saved by a previous run
        with open(DATA_FNAME) as f:
            rawjson = f.read()
    else:
        # First run: download the file and save it for next time
        r = requests.get(SRC_URL)
        makedirs(DATA_DIR, exist_ok=True)
        with open(DATA_FNAME, 'w') as f:
            f.write(r.text)
        rawjson = r.text
    return json.loads(rawjson)
The above script is a result of my desire for some organization. I only want to download the source data once. And I want that data file to be saved in a subdirectory:
data-files/spotify-beyonce-related-artists.json
However, you may not care about that. In any case, if you’re going to use my code above, make sure you know what each line does, e.g. basename(SRC_URL).
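If you're unsure what basename() and join() contribute, a quick way to find out is to run them on their own and print the results. This is just an exploratory snippet using the same URL and directory name as above:

```python
from os.path import basename, join

SRC_URL = 'http://stash.compciv.org/2017/spotify-beyonce-related-artists.json'

# basename() keeps only what follows the last slash, i.e. the file name
fname = basename(SRC_URL)

# join() glues the directory and file name together with the path
# separator that is appropriate for your operating system
local_path = join('data-files', fname)

print(fname)
print(local_path)
```

On macOS or Linux the second line printed should match the data-files/spotify-beyonce-related-artists.json path mentioned earlier; on Windows the separator will be a backslash.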
After you’ve created a bootstrap_data() function that does all the data grabbing/parsing work, each subsequent function that needs the data can just call bootstrap_data():
def foo_1():
    """
    Return a list of artist names, from the list of artists most related to Beyoncé, according to the Spotify API, sorted by popularity
    """
    data = bootstrap_data()
    names = []
    for a in data['artists']:
        names.append(a['name'])
    return names
Or, if you prefer brevity:
def foo_1():
    return [a['name'] for a in bootstrap_data()['artists']]