Managing research data: A beginner’s guide

Data management is a core aspect of the research process, helping scholars organize their work, share it with others, and collaborate effectively. Yet it is often treated as an afterthought, perhaps because so few researchers receive formal training in how to do it. As Canada’s national funding bodies prepare to launch a new policy “promoting sound data management and data stewardship practices,” federally funded grant holders will need to brush up on their data management skills, and fast.

“You essentially won’t get funding if your research proposal doesn’t address how you will be storing and managing your data,” says Lina Harper, a student policy analyst at the Social Sciences and Humanities Research Council (SSHRC) and a researcher on the ScholCommLab’s Meaningful Data Counts (MDC) project. “Also, it’s tied to the larger trend of open science—opening and publishing your data means that your work can be verified, and that your scientific output is transparent and reproducible. If you want to be a researcher and a scholar, then research data management is going to be your new reality.”

But what should a research data management plan actually look like? And what does it take to create one? In this post, Lina offers guidance drawn from her own experience crafting a Research Data Management Plan (RDMP) for the MDC project, as well as from her master’s research on digital humanities scholars and their data reuse practices.

Research data management plans (RDMPs) are a relatively new but increasingly important scholarly research product

What is a Research Data Management Plan? 

“The Research Data Management Plan is a relatively new product in the scholarly research landscape,” Lina explains. “We’re still taking baby steps, especially the Canadian community.” The novelty of this product—as well as the wide range of data collection methods and research practices that scholars use—means that RDMPs can take many shapes and forms. At the core of any RDMP, however, are two basic components: research data, “the data that researchers collect as they embark on scholarly research,” and management, “the process of making your research data transparent and reproducible.” Simply put, an RDMP is a document that helps researchers navigate that process by addressing the key ethical, legal, and technical considerations involved in their project.

To create an RDMP, researchers must think through questions like: What kind of data will we collect? Where will we store it? How will we coordinate as a team? Do we have permissions to access all of the resources we’ll need? By laying out a clear plan for how to tackle these and other potential hurdles in the research process, RDMPs give research teams a common language and shared strategy for ensuring their project will succeed. 

While this may sound like a lot of additional work, Lina says many of these steps likely already take place informally in most research projects. “The RDMP verbalizes what’s already in your head, the conversations that you are having internally with the PIs, the co-PIs, the research associates,” Lina explains. “In French we say concrétiser—it’s what makes something tangible, translates the knowledge from your brain to a record.”

Why Research Data Management Plans matter

At the most basic level, RDMPs help researchers stay organized and coordinated throughout the research process. By laying out a clear plan of action, they make it possible to anticipate challenges before they arise and to organize project workflows in a way that works for the entire team. But RDMPs, Lina says, offer other important benefits too.

First, RDMPs provide a clear outline of how research data was collected, manipulated, and stored. This information—when paired with open scholarship practices, like open data—makes it possible for peer reviewers and other researchers to evaluate study findings. “Especially given ongoing concerns about research replicability, making data easy to access and use is a step in the right direction,” Lina says. 

In addition, any good RDMP will include a strategy for how to maintain data in the long term, which can help other scholars build on and extend earlier work. Studies have found that secondary analyses of existing data are becoming more common, offering new insights into important issues like health and poverty. “There’s lots of evidence that data sharing and open access and open data are good things,” Lina says. “Even just taking a slightly different perspective or asking a different research question using the same data can reveal interesting new things.” Creating an RDMP is an important first step towards making that possible. 

How to create your first Research Data Management Plan 

So what does it take to craft a strong RDMP? Here are a few tips to point you in the right direction: 

When it comes to data management, scholarly data librarians are some of the most helpful people around

Tip #1. Talk to your librarian!

“Never underestimate the awesomeness of your library or your scholarly data librarian,” Lina says. These librarians specialize in data management and can be a rich resource as you create your RDMP.

Data librarians can go by different names and titles, like Data Services Librarian, Scholarly Communications Librarian, Research Data Management Librarian, or Data Librarian. If you’re at a university, there will likely be a library staff member with one of these titles who can support you. But there are other options too: “If you don’t have one at your library, go on Twitter and reach out,” says Lina. “There’s a community of librarians who are excited about open science and good research data management. They are passionate about accessibility, transparency, and data reuse — and they want to talk about it.” 

Tip #2. See data management as core to your project 

“As a researcher, whether PI, co-PI or a research assistant, you should try to view the RDMP planning process as part of your overall project management,” says Lina. “Start thinking about it on day one.” By considering data management questions at the outset of your project, you’ll be better positioned for success at every other stage, from writing up the methods to publishing the paper. Plus, it means you’ll have more time to work through potential data management challenges, rather than having to troubleshoot in the middle of your study. 

For the Meaningful Data Counts project, the RDMP was built into the larger project plan from the beginning. The research team met regularly to work on the RDMP together, with all members sharing responsibility for chairing and organizing meetings and for writing the document collaboratively. Throughout, Lina worked closely with another research assistant, Erica Morissette, on developing the plan’s sections, with feedback from PI Stefanie Haustein, Co-PI Isabella Peters, and collaborator Felicity Tayler, the University of Ottawa’s RDM Librarian. Felicity’s expertise and editorial guidance helped ensure that the RDMP would be both useful to the project team and recognized and promoted as a DMP Exemplar in the national RDM training resources published by the Portage Network.

Remember that research data management is a marathon, not a sprint!

Tip #3. Plan for the long run 

Lina emphasizes the importance of considering succession plans as you craft your RDMP. Who’s going to take over the data management if the lead researcher is no longer able to do so? What happens if the team member overseeing data collection leaves academia, becomes ill, or passes away? “It’s a hard question to ask yourself or to ask others,” Lina says. “But you might want to think about it. Your data is part of your legacy as a scholar.” 

Tip #4. Edit collaboratively and iteratively

“Find a way to collaborate on the same document,” says Lina. “Don’t do too much versioning—labelling drafts v1, v2, and v3—because you’ll get bogged down. People will have the wrong link to the document, and things will get lost.” 

Version control resources, such as the UCSD Library’s Version Control Guide, can help you manage versions of both the RDMP and the research data itself, especially when paired with clear communication channels. For the MDC project, the team drafted their RDMP in a single shared Google Doc. Working in one document, along with web meetings and quick questions over Slack, allowed them to collaborate and revise their plan with minimal coordination effort. In keeping with the team’s open science approach, the RDMP is now published on Zenodo, so that other researchers can reuse and remix it.

Working in a collaborative workspace also helped the team see their plan as a “living document” rather than a static resource. “You’re going to have to be revisiting the plan all the time,” Lina explains. Your data storage platform might change its terms of service, or you might end up collecting a type of data you hadn’t planned for. Using document-editing software that supports continuous revision can help your team pivot quickly.

Breaking your RDMP into manageable pieces is a great way to make the planning process less intimidating.

Tip #5. Break it down into smaller chunks and use existing templates

“Take it step by step,” Lina suggests. “While we were working on our RDMP, both me and the other research assistant had full-time jobs, during a pandemic. We just had to take it slowly and work on it bit by bit.” 

Ideally, your plan should include the following sections: Responsibilities and Resources, Data Collection, Documentation and Metadata, Storage and Backup, Data Sharing and Reuse, and Ethical and Legal Compliance. Tackling each one in turn is a simple way to make the workload more manageable—especially if you have help: “People think it’s going to be a ton of work, but there are so many wizards and DMP assistants out there that prompt you about the sections.” While there are lots of options to choose from, free resources like Portage’s DMP Assistant and the DCC’s Curation Lifecycle Model are a good starting place. 

Tip #6. Make it fun 

Finally, remember that you’ll likely be referencing this plan for months, years, or even decades to come. Put the time in to make it appealing and easy to use. Your entire team will thank you for it. 

“Put things into tables, use different colors, make it jazzy,” Lina suggests. “Format it well, so that when you look at it, it looks nice. Your plan should make you feel excited about your project—whatever that looks like to you.”

A last word on research data management

Of course, managing research data efficiently and ethically is a complex process, and one that will look different for every project. The tips in this post are just a few of the recommendations that helped the MDC team create their RDMP, but there are many more. For other suggestions (and helpful examples), you can download the team’s plan on Zenodo, check out the Portage Network’s list of Tools and Resources, or reach out to your institution’s RDM specialist.

Learn more about the Meaningful Data Counts project at the ScholCommLab website.