Comments to How 5 Data Dynamos Do Their Jobs

I’d like someone to go through the tax data and find out what happened to all the accountants before and after Wang Spreadsheet, Lotus 1-2-3, and Excel were released. What happened to their earnings, their retirement age, the number of children they had? Did they change careers? What happened to the total number of them? Things like that. And interview some of them. People could put other suggestions for stories in the comments here. A little crowdsourcing of ideas for The NY Times.
====================================
Your investigative teams need data scientists who can both accumulate data and formulate and test hypotheses. You describe a situation in which, on gun reporting, you have conditional statements with 15 levels of IF/THEN. May I humbly suggest that you have created a very complex sieve for your raw data, and that the number of possible classifications is quite high. Even if each IF/THEN level could result in only 2 possibilities, you would still have 2^15, or 32,768, categories or classifications.
At some point you must translate your math marvels into a compelling story with a narrative arc. Don’t let the math overcomplicate it. Trust some of the other recommendations.
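
A quick way to check the arithmetic in this comment is a minimal Python sketch. It assumes, as the commenter does, that every one of the 15 levels has exactly two outcomes; the names LEVELS and BRANCHES are made up for illustration and do not come from the article.

```python
from itertools import product

LEVELS = 15    # depth of nested IF/THEN, as described in the comment
BRANCHES = 2   # assumption: each level has exactly two outcomes

# Closed-form count of distinct paths through the nested conditions.
total = BRANCHES ** LEVELS

# Cross-check by enumerating every possible path explicitly.
enumerated = sum(1 for _ in product(range(BRANCHES), repeat=LEVELS))

print(total, enumerated)
```

Both numbers agree with the 32,768 quoted above. A real rule set would likely have fewer reachable categories, since some branches end early or lead to the same label.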
====================================
Spreadsheet or not, the main thing is finding help for exactly what you want to do. It's exhilarating and scary to type a complaint about what you want to do into a search engine and have a discussion board or blog entry pop out where some kind user explains exactly that, most often on a site such as https://stackoverflow.com/questions/tagged/excel.

Sharing problems and solutions (including failed solutions) is truly revolutionary. It drives discovery in science and distribution in social well-being (ok, that was awkward, but you know...). It should be treated like roads and communications, as essential infrastructure to promote and incentivize. For now we are left only with inspiration from stories.
===================================
As a journalist, or anyone else trying to tease an answer out of data, will tell you, the data has to be used both to support and to challenge the hypothesis.

In trying to answer the question of why “only seven black students won seats at Stuyvesant, New York City’s most elite public high school” the data seems to have been used only in support of a preconceived answer that it was due to “the rise of the local test preparation industry.”

I have to wonder, did the data journalists even try to come up with any alternate hypothesis?
===================================
It increases my appreciation of your reporting to read about the methods you use to organise data and uncover stories, and then revisit those stories with that new insight.


last updated june 2019