Kevin Schaul

Hacker journalist

May 28, 2013
by Kevin

A new job for a new school year

It’s not every day that you get to write your own job description, but the good folks at the Star Tribune let me do just that.

After a short summer gig in New York, I’ll be heading back to the Strib as more than an intern (but still not full-time, so I can finish my degree). Take a look at the job posting we decided on.

Digital news developer

  • Pitch, report on and build standalone interactive editorial content for StarTribune.com and mobile platforms
  • Be available to provide preliminary data and statistical analysis services to reporters and editors (at his or her own discretion and that of his or her editor)
  • Work with developers on large-scale projects with editorial stake, such as live election results
  • Improve mobile experiences by communicating technical advice between designers and developers
  • Develop open source newsroom tools to improve workflow for digital content beyond written articles and traditional multimedia
  • Develop open source software to capture, analyze and display data
  • Build the Star Tribune’s presence in the open source community

If that isn’t a wonderful job, I don’t know what is. I’m in for a wild ride over the next few months.

April 18, 2013
by Kevin

Introducing: Binify

Nothing tickles my fancy more than a good mapping technique, and the recent L.A. Times’s 911 response map did just that. The novel idea they brought to news mapping: hexagon binning.

Dot density maps are hard

A successful dot density map requires a specific kind of dataset. The points must be dense enough to be interesting, but not so crowded that they overlap. Maps with multiple zoom levels must perform magic to display points at a sensible density at every zoom level. Perceptually, dots indicate data at a specific location, which doesn’t bode well for datasets aggregated to the census-block level.

Hexagon binning can alleviate these issues. In hexagon binning, a grid of hexagons is laid over the extent of the points, and the number of points falling inside each hexagon is saved with the grid. The grid can then be visualized based on these counts, enabling better comparisons between dense areas of a dataset. Since there are no individual points, there is no overlap. The exact locations of the points are replaced by a coarser grid, revealing the interesting, and often more pertinent, trend data.
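To make the binning step concrete, here is a minimal Python sketch. It is not Binify’s code. It assumes points are already in a projected, planar coordinate system and uses pointy-top hexagons addressed by axial (q, r) coordinates. The function names (`point_to_hex`, `hex_center`, `hexbin`) are hypothetical, and reading or writing shapefiles is left out entirely.

```python
import math
from collections import Counter


def hex_round(q, r):
    """Round fractional axial coords to the hexagon that contains them."""
    # Cube coordinates satisfy q + r + s == 0.
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the component with the largest rounding error so the
    # rounded coordinates still sum to zero.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return int(rq), int(rr)


def point_to_hex(x, y, size):
    """Axial (q, r) of the pointy-top hexagon (center-to-corner `size`) containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return hex_round(q, r)


def hex_center(q, r, size):
    """Planar center of hexagon (q, r), useful for drawing the grid."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y


def hexbin(points, size):
    """Count how many (x, y) points fall in each hexagon."""
    return Counter(point_to_hex(x, y, size) for x, y in points)
```

Only hexagons that contain at least one point show up in the result; a full grid over the extent, as described above, would also include the empty cells with a count of zero.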

Many dot density maps suffer from crowding of points. Binify uses hexagon binning to alleviate this pain and better display trends.


Performing hexagon binning on deadline isn’t easy. MMQGIS, a useful QGIS plugin, can help with creating the grid, but it requires a GUI and is finicky. It can’t easily be automated, and it certainly can’t end up in a Makefile, which is where we’d prefer all our data manipulation to live.

Introducing Binify: a command-line tool to better visualize crowded dot density maps. (That’s bin-i-FY, for you phonetics fans.)

Binify takes the meticulous guesswork out of hexagon binning. Simply give the program a point shapefile, and it’ll output a binned hexagon grid version of the data, ready to be visualized.

Binify is available on the Python Package Index (PyPI) for simple installation. To get started, follow the instructions on GitHub. I built the tool to be simple enough for exploratory analysis, yet customizable enough to cover a wide range of needs. (Of course, it’s a work in progress. If you have an idea, please open an issue on GitHub.)

While hexagon binning is not the ultimate solution for every dataset, it’s a viable option for many. I hope you’ll find Binify as useful as I already have.

March 25, 2013
by Kevin

Journalism, with a side of math.

This post was first published at journalists.org.

There’s nothing less funny than listening to a journalism professor joking that we’re all in this field because we can’t do math. Some of the best journalism being done today only exists because journalists overcame their fear of numbers and dug deep into the data.

Take the L.A. Times’s series on 911 response times. An analysis found stark disparities in the response times of emergency vehicles, and it produced journalism with real impact. Not bad for a bit of math.

Or look at USA Today’s diversity index. Journalist Phil Meyer and Paul Overberg, the paper’s database editor, invented a way to numerically compare racial and ethnic diversity, no small feat back in 2001. This index opened up a wealth of unreported stories and gave measurable evidence for trends we had only known anecdotally.
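An index like this usually boils down to a simple probability: how likely is it that two people picked at random belong to different groups? The Python sketch below computes that for one set of mutually exclusive groups. It is a simplified illustration, not USA Today’s exact methodology; as I understand it, the published index treats Hispanic origin as a dimension separate from race, so this version won’t reproduce its numbers. The function name is hypothetical.

```python
def diversity_index(shares):
    """Probability that two people drawn at random (with replacement)
    belong to different groups. `shares` may be counts or percentages,
    one per mutually exclusive group."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    # Normalize so counts and percentages both work.
    proportions = [s / total for s in shares]
    # The chance both draws land in the same group is the sum of squared shares.
    same_group = sum(p * p for p in proportions)
    return 1 - same_group
```

For a made-up place that is 60 percent one group, 30 percent another and 10 percent a third, `diversity_index([60, 30, 10])` comes out to roughly 0.54; a place with a single group scores 0.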

You might say, “But these are both CAR (computer-assisted reporting) stories,” and you’d be right. But we are all computer-assisted reporters. The moment a news organization sheds the backward notion that only a select few can understand data, countless untold stories will come to light.

In a recent talk, Ryan Pitts, senior editor for digital media at The Spokesman-Review, and Jeremy Bowers, news applications developer at NPR, walked through how indexes like USA Today’s can be created to fit your beat. How does this business tax proposal compare to previous laws? Create an index for it. Which college is the most cost-effective for students? Ditto. If Nate Silver of the New York Times has taught us anything, it’s that people trust data over a reporter’s intuition.
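One common way to build a beat-specific index is to rescale each measure to a 0–1 range and take a weighted average. The Python sketch below does that; it is an illustration of the general technique, not necessarily the approach Pitts and Bowers presented. The metric names (`net_cost`, `grad_rate`) and weights are hypothetical, and choosing them is an editorial judgment you should be able to defend.

```python
def minmax(values, higher_is_better=True):
    """Rescale values to 0..1, flipping them when lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every row is identical on this measure, so it can't rank anyone.
        return [0.5] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def composite_index(rows, metrics):
    """Weighted average of rescaled metrics, one score per row.

    rows:    list of dicts, e.g. one per college
    metrics: {metric_name: (weight, higher_is_better)}
    """
    total_weight = sum(weight for weight, _ in metrics.values())
    scores = [0.0] * len(rows)
    for name, (weight, higher_is_better) in metrics.items():
        scaled = minmax([row[name] for row in rows], higher_is_better)
        for i, s in enumerate(scaled):
            scores[i] += weight * s / total_weight
    return scores
```

Because every measure is rescaled against the other rows, scores are only comparable within the same dataset; adding a new college can shift everyone else’s score.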

Of course, not all questions can be answered with indexes. Statistics provides the tools to uncover patterns we haven’t even thought to look for. The more mathematically savvy journalists you have in your newsroom, the more groundbreaking journalism your company will produce. (And yes, that could be quantified.)

Recently Chase Davis, new assistant editor of interactive news at the New York Times, explained five algorithms with huge potential that journalists have not yet explored. Want to know which politicians in your state are the most similar? Run a nearest-neighbor analysis. Need to classify thousands of bills into clean categories? Try a random forest algorithm, and let the robots do the work for you.
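For the politician question, a nearest-neighbor analysis can be as simple as comparing voting records bill by bill. The Python sketch below scores each pair of legislators by how often they voted the same way on bills where both voted, then returns the closest matches. The vote encoding (1 for yea, 0 for nay, `None` for absent) and the legislator names are assumptions for illustration; Davis may well have used a different distance measure.

```python
def agreement(a, b):
    """Share of bills, among those both legislators voted on, where they agreed."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def nearest_neighbors(votes, name, k=1):
    """Return the k legislators whose records most closely match `name`'s,
    as (legislator, agreement) pairs, most similar first."""
    target = votes[name]
    scored = [
        (agreement(target, record), other)
        for other, record in votes.items()
        if other != name
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(other, score) for score, other in scored[:k]]
```

With `votes = {"A": [1, 1, 0, None], "B": [1, 1, 0, 1], "C": [0, 0, 1, 1]}`, calling `nearest_neighbors(votes, "A")` returns `[("B", 1.0)]`: A and B agreed on all three bills they both voted on.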

The more familiar we become with these sorts of techniques, the more stories we can pitch. Editors love reporters who find new angles on our world, and data-driven work is no exception.

And here’s the best part: The hard work has already been done for us. Open source tools exist to ease the computation for these statistical models. We’re on the brink of using technology to better understand subjects as complex as campaign finance and how elections are won. Mathematicians have produced loads of analysis techniques that are just waiting for journalists to put them to use. We don’t need to be experts in statistics to find answers to our interesting questions.* We just have to get over our fear of numbers.

If you can say “algorithm” with a straight face, I’ll bet there’s a job out there for you. And don’t let your journalism professors get away with their cheap math jokes. The times have changed.

*Of course, it’s easy to lie with data. Be sure to run your work by someone who does know what they’re talking about before publishing.

March 2, 2013
by Kevin

On IE and doing awesomeness

Asked about The New York Times’s use of D3 for graphics, even though Internet Explorer 7 and 8 do not support it, Amanda Cox gave this response:

 

 

I can’t wait to work with these people.

An example use of Box Chart Maker

February 19, 2013
by Kevin

Tutorial: Create simple graphics with Box Chart Maker

As part of my AP-Google Journalism and Technology Scholarship, I developed a tool to help journalists create simple graphics for the web. I call it Box Chart Maker. Here’s a walkthrough of creating a chart with it.

The first step to creating compelling graphics is to find interesting data. Box Chart Maker creates a very versatile type of chart that can represent almost any story involving numbers (so, yes, that’s almost every story ever, in some way). I’ll use last week’s Senate vote to proceed on the confirmation of Chuck Hagel as Defense Secretary.

The interface of Box Chart Maker

Now that we have some data, go to the Box Chart Maker site. Here, you’ll find an example graphic already created. That was easy! If you’re representing “Data title” and have 36 items, you’re done. Otherwise, we’ll want to customize these options.

Let’s start with the “Yes” votes. The motion won 58 yeas, so let’s represent that in our first chart. Fill out the form with the correct information (58 boxes, “Yes votes” as the label, colors as you see fit). Under advanced options, change the ID to “yes_votes” or similar; a unique ID lets multiple charts live on the same page. When the options look good, hit the update button, and you’ll see your chart appear in the preview area.
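Why does the ID matter? A box chart is just a run of small, identically styled elements, and if each chart’s styles are scoped to its own ID, two charts with different colors can sit on one page without clobbering each other. The Python sketch below generates that kind of snippet. It is a hypothetical illustration of the idea, not the markup Box Chart Maker actually produces; the function name, class name and default sizes are assumptions.

```python
from html import escape


def box_chart_html(chart_id, count, label, color="#1f77b4", box_size=10):
    """Build an HTML/CSS snippet with `count` colored boxes.

    The CSS selector is scoped to #chart_id, so several charts with
    different colors can share one page as long as their IDs differ.
    """
    css = (
        "<style>"
        f"#{chart_id} .box {{display:inline-block;width:{box_size}px;"
        f"height:{box_size}px;margin:1px;background:{color};}}"
        "</style>"
    )
    boxes = "".join('<span class="box"></span>' for _ in range(count))
    return f'{css}<div id="{chart_id}"><p>{escape(label)}</p>{boxes}</div>'
```

Calling `box_chart_html("yes_votes", 58, "Yes votes")` yields a snippet with 58 boxes; a second call with a different ID and color can be appended to the same file.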

If you’re happy with the chart, click on “Show embed code” under the preview button. A text box will appear containing all the HTML/CSS code that represents your chart. Copy that, and throw it in a fresh text document.

Embed code

Now, do the same for the “No” votes. When you’re happy, copy the code and put it under the “Yes” votes code. Save the file as an HTML file, and open it up in your browser. Here’s what mine looks like:

Raw output of Box Chart Maker

You’re done! Of course, you’d want to add context to the chart, such as that the motion required 60 votes to succeed. But for absolutely no hand-coding, that’s a respectable graphic. Paste the code directly into your blog or CMS, and you’ll have a nice web-friendly graphic to help explain your story, all for a few minutes of work.

Pro tip: If you’ve got more time and some HTML/CSS know-how, it’s simple to enhance the output of Box Chart Maker. Here’s what I came up with after a few more minutes of tweaking:

Edited output of Box Chart Maker

Publish!