Kevin Schaul

Hacker journalist

April 18, 2013
by Kevin
12 Comments

Introducing: Binify

Nothing tickles my fancy more than a good mapping technique, and the recent L.A. Times’s 911 response map did just that. The novel idea they brought to news mapping: hexagon binning.

Dot density maps are hard

A successful dot density map requires a specific data set. The points must be dense enough to be interesting, but not too crowded so as to overlap each other. Maps with multiple zoom levels must perform magic to correctly display points with optimal sparsity. Perceptually, dots indicate data at a specific location, which doesn’t bode well for census-block level datasets.

Hexagon binning can alleviate these issues. In hexagon binning, a grid of hexagons is placed over the extent of the points, and the number of points intersecting each shape is saved with the grid. The grid can then be visualized based on this accumulation, enabling better comparisons between dense areas of a dataset. Since no individual points are drawn, nothing overlaps. The locations of the individual points are replaced by a coarser grid, revealing the interesting (and often more pertinent) trend data.
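The counting step can be sketched in a few lines of Python. This is a minimal illustration of the general technique on flat x/y coordinates, not Binify's actual implementation; the function names, the pointy-top grid orientation, and the sample points are all assumptions made for the example.

```python
import math
from collections import Counter


def hex_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    # Work in cube coordinates, where x + y + z == 0 for every hexagon.
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error so the
    # three rounded values still sum to zero.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)


def hex_bin(points, size):
    """Count points per hexagon.

    points: iterable of (x, y) pairs in a planar coordinate system.
    size:   distance from a hexagon's center to any of its corners.
    Returns a Counter keyed by axial (q, r) hexagon coordinates.
    """
    counts = Counter()
    for x, y in points:
        # Convert the point to fractional axial coordinates
        # on a pointy-top hexagon grid.
        q = (math.sqrt(3) / 3 * x - y / 3) / size
        r = (2 / 3 * y) / size
        counts[hex_round(q, r)] += 1
    return counts


# Hypothetical sample: three points near the origin, one a hexagon over.
sample = [(0, 0), (0.1, 0.1), (-0.1, 0.05), (1.7, 0.1)]
bins = hex_bin(sample, size=1.0)
```

Each key in `bins` identifies one hexagon, and its value is the number of points that fell inside it. Those counts are what would drive the color of each hexagon on the map.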

Many dot density maps suffer from crowding of points. Binify uses hexagon binning to alleviate this pain and better display trends.

Performing hexagon binning on deadline isn’t easy. Mmqgis, a useful QGIS plugin, can help with the step of creating the grid, but it requires using a GUI and is finicky. It can’t easily be automated. And it certainly can’t end up in a Makefile, which is where we’d prefer all our data manipulation to live.

Introducing: Binify, a command-line tool to better visualize crowded dot density maps. (That’s bin-i-FY, for you phonetics.)

Binify takes all the meticulous guesswork out of hexagon binning. Simply give the program a point shapefile, and it’ll output a calculated hexagon grid version of the data ready to be visualized.

Binify is available in the Python Package Index (PyPI) for simple installation. To get started, follow the instructions on GitHub. I built the tool to be simple enough for exploratory analysis, yet customizable enough to cover all needs. (Of course, it’s a work in progress. If you have an idea, please open an issue on GitHub.)

While hexagon binning is not the ultimate solution for every dataset, it’s a viable option for many. I hope you’ll find Binify as useful as I already have.

March 25, 2013
by Kevin
0 comments

Journalism, with a side of math.

This post was first published at journalists.org.

There’s nothing less funny than listening to a journalism professor joke that we’re all in this field because we can’t do math. Some of the best journalism being done today only exists because journalists overcame their fear of numbers and dug deep into the data.

Take the L.A. Times’s series on 911 response times. An analysis found stark disparities in the response times of emergency vehicles, and it produced journalism with real impact. Not bad for a bit of math.

Or look at USA Today’s diversity index. Journalist Phil Meyer and Paul Overberg, the paper’s database editor, invented a way to numerically compare racial and ethnic diversity, which was no small feat back in 2001. This index opened up a wealth of unreported stories, and gave measurable evidence to those we only believed anecdotally.

You might say, “But these are both CAR stories,” and you’d be right. But we are all computer-assisted reporters. The moment a news organization sheds the backwards thought that only a select few can understand data, millions of untold stories will be discovered.

In a recent talk, Ryan Pitts, senior editor for digital media at The Spokesman-Review, and Jeremy Bowers, news applications developer at NPR, walked through how indexes like USA Today’s can be created to fit your beat. How does this business tax proposal compare to previous laws? Create an index for it. Which college is the most cost-effective for students? Ditto. If Nate Silver of the New York Times has taught us anything, it’s that people trust data over a reporter’s intuition.
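To make the idea of an index concrete, here is a minimal sketch of one common way to score diversity: the probability that two people picked at random belong to different groups. This is a simplified, Simpson-style calculation and not necessarily the exact formula USA Today uses; the function name and the population counts are hypothetical.

```python
def diversity_index(group_counts):
    """Probability that two randomly chosen people are in different groups.

    group_counts: dict mapping a group label to its population count.
    Returns a value from 0 (everyone in one group) toward 1 (many
    evenly sized groups).
    """
    total = sum(group_counts.values())
    if total == 0:
        raise ValueError("population must not be empty")
    # Sum of squared shares = chance both picks land in the same group.
    same_group = sum((count / total) ** 2 for count in group_counts.values())
    return 1 - same_group


# Hypothetical counts for two made-up counties.
county_a = {"group_1": 50, "group_2": 50}
county_b = {"group_1": 100}

index_a = diversity_index(county_a)
index_b = diversity_index(county_b)
```

Because the result is a single number per place, it can be sorted, mapped, or compared across years, which is what makes an index useful on a beat.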

Of course, not all questions can be answered with indexes. Statistics provides the tools to figure out information we haven’t even considered yet. The more mathematically savvy journalists you have in your newsroom, the more groundbreaking journalism your company will produce. (And yes, that could be quantified.)

Recently Chase Davis, new assistant editor of interactive news at the New York Times, explained five algorithms with huge potential that journalists have not yet explored. Want to know which politicians in your state are the most similar? Run a nearest neighbor analysis. Need to classify thousands of bills into clean categories? Try a random forest algorithm, and let the robots do the work for you.
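As a rough illustration of the nearest neighbor idea, the sketch below compares legislators by how often their votes disagree and returns the closest match. The legislators, their voting records, and the function names are all made up for the example, and the disagreement rate is just one reasonable choice of distance.

```python
def disagreement(votes_a, votes_b):
    """Fraction of roll-call votes on which two records differ."""
    differing = sum(a != b for a, b in zip(votes_a, votes_b))
    return differing / len(votes_a)


def nearest_neighbor(name, records):
    """Return the legislator whose record is closest to `name`'s."""
    target = records[name]
    others = (other for other in records if other != name)
    return min(others, key=lambda other: disagreement(target, records[other]))


# Hypothetical records: 1 = yes, 0 = no, same six votes for everyone.
votes = {
    "Legislator A": [1, 1, 0, 1, 0, 1],
    "Legislator B": [1, 1, 0, 1, 1, 1],
    "Legislator C": [0, 0, 1, 0, 1, 0],
    "Legislator D": [0, 1, 1, 0, 1, 0],
}

closest_to_a = nearest_neighbor("Legislator A", votes)
closest_to_c = nearest_neighbor("Legislator C", votes)
```

With real data, the same comparison would run over hundreds of votes, and absences or abstentions would need a rule of their own.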

The more we become familiar with these sorts of solutions, the more stories we can pitch. Editors love reporters who find new angles on our world, and data-driven work is no exception.

And here’s the best part: The hard work has already been done for us. Open source tools exist to ease the computation efforts of these statistical models. We’re on the brink of using technology to better understand subjects as complex as campaign finance and how elections are won. Mathematicians have produced loads of information analysis techniques that are just waiting to be taken advantage of by journalists. We don’t need to be experts in statistics to find answers to our interesting questions.* We just have to get over our fear of numbers.

If you can say “algorithm” with a straight face, I’ll bet there’s a job out there for you. And don’t let your journalism professors get away with their cheap math jokes. The times have changed.

*Of course, it’s easy to lie with data. Be sure to run your work by someone who does know what they’re talking about before publishing.

March 2, 2013
by Kevin
0 comments

On IE and doing awesomeness

Questioned on The New York Times’s use of D3 for graphics, even though IE 7 and 8 do not support it, Amanda Cox gave this response:

 

 

I can’t wait to work with these people.

An example use of Box Chart Maker

February 19, 2013
by Kevin
0 comments

Tutorial: Create simple graphics with Box Chart Maker

As part of my AP-Google Journalism and Technology Scholarship, I developed a tool to help journalists create simple graphics for the web. I call it Box Chart Maker. I’ll walk through the creation of a chart using the tool.

The first step to creating compelling graphics is to find interesting data. Box Chart Maker creates a very versatile type of chart that can represent almost any story involving numbers (so, yes, that’s almost every story ever, in some way). I’ll use last week’s vote to proceed on the confirmation of Chuck Hagel as Defense Secretary.

The interface of Box Chart Maker

Now that we have some data, go to the Box Chart Maker site. Here, you’ll find an example graphic already created. That was easy! If you’re representing “Data title” and have 36 items, you’re done. Otherwise, we’ll want to customize these options.

Let’s start with the “Yes” votes. The motion won 58 yeas, so let’s represent that in our first chart. Fill out the form with the correct information (58 boxes, “Yes votes” as the label, colors as you see fit). Under advanced options, change the ID to “yes_votes” or similar. This will allow multiple charts on the same page. When the options look good, hit the update button, and you’ll see your chart appear in the preview area.

If you’re happy with the chart, click on “Show embed code” under the preview button. A text box will appear containing all the HTML/CSS code that represents your chart. Copy that, and paste it into a fresh text document.

Embed code

Now, do the same for the “No” votes. When you’re happy, copy the code and put it under the “Yes” votes code. Save the file as an HTML file, and open it up in your browser. Here’s what mine looks like:

Raw output of Box Chart Maker

You’re done! Of course, you’d want to add context to the chart, such as that the motion required 60 votes to succeed. But for absolutely no hand-coding, that’s not a bad graphic. Paste the code directly into your blog or CMS, and you’ll have a nice web-friendly graphic to help explain your story. Not bad for a few minutes of work.

Pro tip: If you’ve got more time and some HTML/CSS know-how, it’s simple to enhance the output of Box Chart Maker. Here’s what I came up with after a few more minutes of tweaking:

Edited output of Box Chart Maker

Publish!

February 4, 2013
by Kevin
0 comments

Data-driven opinion

In yesterday’s New York Times, I ran across an unusual opinion piece on gerrymandering. Sam Wang, the column’s author, made his argument with data.

“Using statistical tools that are common in fields like my own, neuroscience, I have found strong evidence that this historic aberration arises from partisan disenfranchisement,” Wang said. A graphic ran alongside the piece further explaining Wang’s statistical findings.

Of course, statistics (in traditional and visual forms) can be a wonderful tool for removing emotion from reality and giving us a raw version of what’s really happening. But we also know that statistics and visualizations can be used to mislead.

So, do statistics and graphics belong in the opinion section?

My initial reaction is that they absolutely do. The public already takes an extremely critical view of anything published by a news company. Readers should treat a data-driven piece no differently. Opinion writers have the duty to promote worthwhile discussion and encourage change, and the best do this by admitting anything worth debating is not black-and-white. Statistical methods are powerful and must be used to enhance this discussion, not to alter truth.

I see it as an ethical duty of opinion writers to be fair about their data, just as they shouldn’t mislead with anything they write. For complete clarity, I would have liked to see both Wang and graphics editor Bill Marsh publish their methodology in a related post.

The technique is young, but it isn’t likely to go away. I’d love more discussion on this.