Big data in education research

I fully support the use of big data (i.e., large, longitudinally maintained, individual-level administrative datasets) in education research, because such data are the lifeblood of the most effective types of research and program evaluation. That said, any collection and use of data for research purposes should always come with a very clear description of what is being collected, how frequently, for what purposes, and for whose use. I do not think parents (and/or students) would necessarily balk at de-identified data being used to determine the efficacy of school, district, or vendor educational initiatives, but they might be more distrustful of the same data being used for market research.

I often like to say that access to these data is a privilege, not a right. Some may disagree with me, but as a consumer and maintainer of large datasets over the years, I find myself more empathetic to the data sharer’s point of view. We should always treat a willingness to share data with the respect it deserves: education agencies have just as much right as individuals to deny access to individual records if they feel a researcher might use or store the data inappropriately. In addition, a sense of entitlement certainly does not help build trusting relationships when negotiating a data sharing agreement.

Researchers and schools must always provide parents and students with very clear options to opt in to, or opt out of, having their data used for research purposes. This should include reminders, either on a regular basis or study by study, that give families the opportunity to change their minds and/or be selective about how their data are used. There is simply no substitute for a solid informed consent/assent process to alleviate concerns and promote trust among schools, families, and researchers. Once data have been shared, researchers must follow up with appropriate technical safeguards to keep the data secure while in their possession, and must inform the originating agency of any changes to their data management plans. Data security standards should always meet the expectations of the originating agencies and, at a minimum, should include plans for regular backups, data masking, differential access, and data destruction.¹

Ultimately, the burden is, and should be, on the researcher to prove that he or she will handle the collected data with care and use them only for their stated purposes.

– Michael Scuello, Senior Associate and IRB Chair

¹ For more information about best practices in data security, consult the Privacy Technical Assistance Center (PTAC) at the US Department of Education (ptac.ed.gov).