Lab 7: Data Collection and Modeling

Section 1: Background

Now that you have completed Project 2 and have experience with modeling and simulating systems, we’d like you to revisit the idea and reflect.

Suppose that Brown is looking to add a new course in the Urban Studies department and has asked you to compile a report on the state of education in New York City. They requested three datasets that provide pertinent information on the most comprehensive and pressing issues for students, parents, educators and policymakers in the New York City public school system. Tim was able to gather a list of ten datasets, but needs your help to narrow this list down to just three.

Section 2: Learning Goals

Data collection and modeling, as we saw in Project 2 (and as we are about to explore in this lab), are never neutral or objective. While data (and models built from them) are often presented as unbiased, cold, hard facts, they are inherently subjective. Data are collected by researchers or other stakeholders who, consciously or not, have a particular goal in mind. The same holds for those who curate the data over time, or those who use it for modeling or making policy decisions.

For example, in Project 2, we provided you with five example countries but allowed you to choose from a larger set of countries if you liked. Both we, as a staff, and you, as the programmer, made decisions about the countries you wished to represent. You may have chosen five countries that held a particular interest for you, five countries close together, or the five countries listed on the handout because they were the most easily available. Whatever the reasoning, picking those five countries centered some and excluded others. If you lived in a country that was going to lose 50% of its crop production due to global warming in the next year, or that would be underwater due to sea-level rise, you would likely be concerned and upset that your home (or factors important to your home) didn’t appear in this model.

All data collection and modeling works in a similar way. We can’t possibly collect every single piece of pertinent data or model every single situation because of time, space, or energy constraints. As Tim likes to say, “The only accurate model is the entire universe.” The goal of this lab is to have you consider various implications of data collection and modeling.

Importantly, we are not arguing that data collection (because of its biases) or modeling exercises (because they can’t account for all variables) are useless! On the contrary, we _are_ saying that because of the importance and power of data-gathering and modeling, we have an obligation to think deeply and carefully about who we choose to represent and exclude when we engage in these activities.

Notice: Throughout this lab, you will be paired with various classmates. Be prepared to engage and move throughout the room. The answers you submit in the Google Form will count as your attendance and be graded on your participation. You can find the Google Form here.

Take a look at NYC OpenData, a project that aims to collect and publicize data on various aspects of New York City. The following ten datasets come from the education category on their website:

Remember, the goal is to compile three datasets that provide pertinent information on the most comprehensive and pressing issues for students, parents, educators and policymakers in the New York City public school system. Pay less attention to the years the data were collected and more to the information each dataset conveys.

Task 1: Choose five datasets you think best serve the goal above. Be prepared to articulate your choices and reasoning behind them. Submit this list of five to the Google Form.

Task 2: Pair up with someone in the class, and compare the datasets you chose. Working together, narrow your list down to just three datasets. Disagreements should be resolved charitably; this is the kind of (professional) discussion that data scientists and modelers have. Submit this list of three to the Google Form. Each partner should fill out the form.

Task 3: Final move! You and your partner should find another pair. Working in this group of four, come up with a final list of three datasets to submit as your final proposal. Each group member should fill out the form.

Task 4: Discuss the following reflection questions in your groups. Individually submit your responses to the Google Form.

Section 3: Reflection Questions

Part 1: Projection

Projection is the process of seeing one’s own inner circumstances reflected onto external circumstances. Projection is a very common process for us when filtering and choosing information. In other words, what matters personally to us oftentimes shapes what we view as necessary or important.

Task 5: While reading through the dataset list, were there particular datasets you did not consider at all, or that you considered more important than others? Did projection play any role in your decision making?

Part 2: Stakeholders

Take a look at the final datasets you came up with and consider the stakeholders that might be impacted by your set. Who might have been left out and in what ways might they be impacted? For example, a number of schools have 95% of their student populations in poverty. For these students, the school meal program would highly impact their ability to learn and engage in their classes. Thus, including the school meal program in an overview of schools would be of high importance to these students and parents. What are other examples of stakeholders impacted by the data you chose, and in what ways?

Task 6: Reflect on the stakeholders that might be impacted by the datasets you chose. Who might have been left out and in what ways might they be impacted?

Task 7: Were there any datasets that were left out of the list of ten? (There were many.) Try to think of one or two we missed.