Data is Deep

Over the last few years, I’ve learned a lot about Data. Every scientist is aware that Data is, in its plainest form, numbers and statistics. Taking these a step further, and using Data as a bank for innovation and discovery, is a lot trickier. Here are some lessons I’ve learned about exploring and analyzing data in search of insights and patterns.

1. Data analysis is too important to be left to standard pipelines. Instead of settling for high-level summaries, plot and visualize the intermediate steps to get new perspectives on how things look.
2. No dataset is perfect. Freely explore and embrace datasets that are good enough.
3. Data doesn’t come with labels marking what is new and exciting. Developing those insights into hypotheses takes creativity, fluency in the discipline, and scientific expression.
4. Each dataset must be treated differently, with its own considerations, outliers, and analysis approaches. Be flexible and willing to dismiss your initial approach to follow the data wherever it leads.
5. When analyzing what is present in a dataset, it is also crucial to consider what is absent. What were your expectations, and what does it mean if your expectations are violated?
6. Look for new patterns, both visually and statistically, and imagine what might explain them. This is how a hypothesis is born.
7. Different people will discover different things with the same dataset. What insights can you bring to the table?
8. If you’re lost in the data, make a map. It could be a map over time, or over space. I often draw pictures to help with my creative process.
9. With any new dataset, come to terms with its inherent limitations. Decide what types of claims you can and cannot make with a dataset before embarking on exploration, or it won’t be clear what you are looking for.
10. You are never really finished analyzing a dataset. Eventually, you decide to just stop and move on, leaving some things undiscovered. So remember to revisit your data in each new framework you develop.
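
To make lesson 5 a little more concrete, here is a minimal sketch of one way to check what is absent from a dataset: write down what you expected to see, then compare it with what is actually there. The function name, the weekday values, and the sampling scenario are all hypothetical and only serve as illustration.

```python
from collections import Counter


def absent_and_unexpected(expected, observed):
    """Compare the values you expected with the values actually present.

    Returns two sorted lists: expected values that never appear, and
    observed values that were not expected.
    """
    counts = Counter(observed)
    absent = sorted(set(expected) - set(counts))
    unexpected = sorted(set(counts) - set(expected))
    return absent, unexpected


# Hypothetical scenario: samples were planned for every weekday.
expected_days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
observed_days = ["Mon", "Mon", "Tue", "Thu", "Sat"]

absent, unexpected = absent_and_unexpected(expected_days, observed_days)
print("absent:", absent)
print("unexpected:", unexpected)
```

Either list can be the start of a question: why is a day missing, and why did a sample show up on a day nobody planned for?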
