Day 14: Thursday, February 23, 2017 - Face the APIs

OK, one more day of fun-with-JSON, and downloading, opening, reading, closing files. But now we’ll do it with Amazon APIs.

AWS, Access, and APIs

Today, we’ll go over how to log in to Amazon Web Services and test out a few of the front-facing APIs, including:

I should be emailing you all in class with your keys/authentication, and we’ll walk through the process of logging in, making calls to the AWS API, and just generally playing around. It is yet another example of dealing with the serialization/deserialization of data, and tediously figuring out what data we want, and how to get it, never mind doing something useful with it.
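To make "serialization/deserialization" concrete, here is a minimal sketch using Python's standard json module. The text in raw_text is made up for illustration; it is not the actual shape of any AWS response, and the key names (items, name, size) are assumptions.

```python
import json

# Made-up text standing in for what a web API might send back.
# The keys "items", "name", and "size" are hypothetical.
raw_text = '{"items": [{"name": "first", "size": 3}, {"name": "second", "size": 10}]}'

# Deserialize: turn the text into Python dictionaries and lists.
data = json.loads(raw_text)

# Now we can pick out just the values we want.
names = [item["name"] for item in data["items"]]
print(names)

# Serialize: turn the Python objects back into text,
# which is what we would save to a file or send elsewhere.
text_again = json.dumps(data, indent=2)
print(text_again)
```

The tedious part is rarely the call to json.loads() itself; it is working out which keys and nested lists hold the values you care about.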

‘D’ is for data, deserialization, and death

Bummed that all we’ve done is download/open/close/read text files? I’m not going to argue that working with files is fun. In fact, it’s one of the most painful things of any project, but exactly the kind of boring thing that a computer is perfectly suited for.

But the bigger questions are: Why files? Why text? Why 1 and 0 and True and False and other simple values?

The immediate answer is: because those are our best interfaces with computers, which otherwise think in 1s and 0s. If your goal in computational journalism is to find and expose and stop corruption/evil/inequity, then you will have to deal with the reality that those aren’t self-evident when the information (the data, or what have you) is fundamentally just 1s and 0s.

The related question that we take for granted: Why do we write? Not just code, but anything? Why can’t our societies’ laws and traditions be communicated through oral tradition? Why isn’t the beauty and power of Shakespeare’s works as it was performed centuries ago enough of a contribution to human civilization? Obviously, because in-person experiences and words being shouted out don’t persist. They aren’t portable or sharable, and when they are, they don’t usually keep their integrity.

It’s not wrong to see writing as merely the ability to make dark marks on cave walls/parchment/paper. Of course there’s more to it than that, if you know how to write, and you’ve ever committed your experiences to paper, and then had the experience of someone having their own interpretation


Tangential reading: I realize I keep saying the word “cache” and will increasingly do so as we deal with remote data sources. Knowing that “cache invalidation” is one of computer science’s hard problems probably isn’t the most practical bit of insight. So here’s a nice article from Wired explaining how cache design/implementation has similar concerns as properly organizing your closet:

For the purposes of this class and its assignments, when I talk about creating a “cache”

About the project

If you’re worried about the projects, don’t be. They’re just a way to do your own thing with what you’ve learned. It should feel easier and more interesting – it’s generally much easier to program when you know what you want to accomplish.

Another thing that should ease the stress: you only have to program the tasks and features that you know exactly how to do.

So for most people, I just assume this rules out:

  • Sentiment analysis.
  • Natural language processing.
  • Creating structured data from unstructured text.
  • Extracting meaning from binary data, e.g. converting handwritten/photocopied text to actual text values.
  • Unsupervised machine learning. Or any machine learning, really.
  • Statistical learning, Bayesian analysis, or, really, any stats beyond calculating a percentage.
  • Detection of fake news.
  • Parallel/asynchronous/distributed programming.
  • Mediocre-level error handling.
  • Managing datasets too big to store on your computer.
  • Pretty much all of web design/development, other than understanding that HTML is just a text format.

Things that you either know how to do, or, just as good, know are completely possible for you to figure out and do:

  • How to use the Tab key, because your brain is not meant to be wasted on preventing typos or memorizing arcane filenames.
  • How to use IPython, and its interactive help features (such as help())
  • How to use the type() function
  • Writing code as if you’re going to spend 10x more time reading/editing than writing it.
  • How to use a function to wrap up a block of code, like variables wrap up hard-to-express values.
  • There’s no such thing as “close enough” when it comes to capitalization, punctuation, spelling, and the quoting of values.
  • How to visit a URL in your browser or in a program
  • How to describe what you want to find in text as a pattern (i.e. regular expressions)
  • How to open a file
  • How to read a file’s contents
  • How to make new folders and files.
  • How to count things in a collection
  • How to turn raw text into data structures using Python’s libraries
  • How to filter/sort/select data.
  • How to find and read documentation for libraries and APIs.
  • How to extract data from a dictionary or list.
  • How to use a for-loop to do the same task, over and over.
  • How to define branches/alternative routes of behavior with conditional statements.
  • Where to find in an error message the exact line and type of problem.
  • How to use a text editor, including saving and opening of files.
  • How to run a program.
  • How to see if a file exists by its filename
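Several of the skills listed above fit together in one short program. This is a minimal sketch, not a required approach: the folder name, filename, sample text, and the two helper functions are all made up for illustration, and it uses only Python's standard library.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

# Made-up sample text; in a real project this might come from a download.
SAMPLE_TEXT = "Call 555-1234 or 555-9876. Call 555-1234 again after noon."

def save_if_missing(path, text):
    """Write text to path unless a file by that name already exists."""
    path = Path(path)
    if path.exists():
        return False  # the file is already there, so leave it alone
    path.parent.mkdir(parents=True, exist_ok=True)  # make new folders as needed
    path.write_text(text)
    return True

def count_phone_numbers(path):
    """Read a file and count each string that matches the pattern ddd-dddd."""
    contents = Path(path).read_text()
    matches = re.findall(r"\d{3}-\d{4}", contents)
    return Counter(matches)

# Work inside a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "data" / "sample.txt"
    first_time = save_if_missing(target, SAMPLE_TEXT)
    second_time = save_if_missing(target, "this text is ignored")
    counts = count_phone_numbers(target)

# Sort the (number, count) pairs so the most frequent comes first.
ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
for number, n in ranked:
    print(number, n)
```

The second call to save_if_missing() does nothing because the file already exists. Checking for a file before redoing the work is a small habit that pays off once the data comes from a slow or remote source.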

Prospective projects:

  • A bot
  • Likely a second bot (as a variation)
  • A web application

Sorry, bottleneck is thinking of easiest route for “publishing” these projects, i.e. via your own remote server. Also, still debating on whether a web application should actually deliver web pages (i.e. requiring you to learn some HTML and care about design). Example user-facing/friendly web-project (Flask, Python) from previous years.

Don’t focus on the design or interface as much as the story that is being told, more specifically, the filtering of facts from the raw data, and which facts and angles have been prioritized. Part/much of journalism is being able to take the same facts that have been covered elsewhere and still be confident

Inmate deaths in California by Reade Levinson:

Single-page projects by Saurabh Datar: