Research project on how FOSS project sustainability is affected by mismatched upstream/downstream conceptualizations

Research protocol outline

Describe the research problem(s) your project addresses.

How do mismatched conceptualizations between upstream maintainers and downstream users of a Free and Open Source Software (FOSS) digital infrastructure project interact to affect the community health, and thus the sustainability, of such projects?

More specifically: how do developers who maintain commonly-used FOSS projects compare to developers who use those same projects in terms of how they conceptualize:

  1. ontologies of an ideal, well-maintained and sustainable FOSS project community, i.e. which elements are present, how they relate, etc.
  2. which parties are responsible for which elements, and
  3. the state of that FOSS project relative to their ontologies of ideal projects?

Describe expected benefits to subjects and/or knowledge to be gained from your project.

In order to address systems-level problems together, we must first have shared conceptualizations of what those systems and their problems might be. Unknowingly holding different conceptualizations can lead to problems being overlooked, dismissed, or addressed with the wrong resources. Mismatches and miscommunications are common between FOSS community "upstreams" and the "downstreams" (corporations, government agencies, and NGOs) that rely on their products, hindering work on both ends.

These communication issues stem in part from fundamentally different ontologies, or conceptions of reality, regarding "how FOSS communities work." These are not simply disagreements over how project communities are doing relative to a shared scale, or even what those shared scales might be, but completely disjoint conceptualizations of what FOSS project communities are and how they operate in the first place.

This project investigates how the diversity of conceptualizations both helps and hinders efforts to improve FOSS community health. Studying these interacting ontologies can inform how upstreams and downstreams develop shared understandings of what improvements are needed, and can also make visible which inefficiencies still require intervention after conceptualization and communication gaps have been addressed.

Developing this area of knowledge is of general interest to the software development community and to the many organizations and companies that rely on FOSS infrastructure. Additionally, individual subjects/participants will gain insight into their particular ontologies and the ontologies of their specific project community, which may improve their work.

Describe the population sample for your project.

How many subjects will participate in this project?

We are seeking six (6) participants for the project: three (3) upstream developer-maintainers and three (3) downstream developer-users.

How will these subjects be identified and selected for participation?

Subjects will be identified via response to email and word of mouth recruitment and screened by email to see if they fit the inclusion criteria.

As part of the eligibility screening process, we will also ask participants to email us a few sentences about why they are interested in participating in the project, and to (optionally) point us to a link where we can see some of their public-facing FOSS work related to the project. Having highly visible online portfolios is a common practice among FOSS contributors. See “Recruitment email text.”

The final pool of subjects will be selected by the research team from the pool of eligible participants. One important selection criterion will be the ability to communicate both technical and non-technical information clearly, as gauged from the email sample and their public-facing portfolio. We will also select the final pool to include as much diversity of perspectives as possible (project role and seniority, age, location, ethnicity, gender, etc.).

Describe the rationale for inclusion or exclusion of any subpopulation.

The inclusion criteria for the study are as follows:

  1. Participant in the selected FOSS project. Rationale: we are studying how perspectival differences appear within the same FOSS project community, so all participants must be involved with that specific community.
  2. Age 18 or over. Rationale: We want to study adults who are able to consent to their own participation in the study; the vast majority of FOSS contributors are age 18 or over.
  3. Able to comfortably communicate in spoken and/or written English. Rationale: interviews will be conducted in English, which is the working language of the majority of FOSS projects as well as the native language of the investigative team.

How will you recruit subjects?

Subjects will be recruited in a number of ways:

  • email solicitation (see “Recruitment email text”)
    • to the selected project community via public mailing lists
    • to leaders of relevant project subteams via their publicly available contact information
    • to individuals who express interest
  • word of mouth, with participants and potential participants able to tell others about the project and forward emails, etc. if they so choose.

If respondents meet the eligibility requirements, we will contact them via email, attaching the recruitment/screening and consent forms to that email (see “Recruitment email text” and “Consent Form”).

Describe any incentives for participation you plan to use.

Each subject will be paid $50 for each of 3 interviews, for a total of $150 per subject. Payment in full will be rendered upon successful completion of the 3rd interview.

Describe the data collection process.

Data collection will be neither anonymous nor confidential.

Describe your procedures for ensuring anonymity and/or confidentiality.

Not applicable, as the data generated will be neither anonymous nor confidential; a fully identifiable public dataset is an integral part of this research methodology. Part of the activity that participants consent to is the recording of interviews that will become part of an open data corpus. In other words, their transcripts will appear with their names attached, and the dataset will be publicly available on the web under an open license. Participants who do not consent to this will not be enrolled in the study.

Data will go through an editing and approval process before being deposited into the open dataset. Specifically, at the end of each interview, the transcript copyright will be fully assigned to the participant (see “Copyright transfer prompt” in appendix). Legally, this means the participant will then have full control over editing and usage of their data; the research team is not allowed to use, post, etc. the transcript without the participant’s approval.

Once participants have edited (possibly with researcher help) and approved their transcript, they can release it under an open content license (Creative Commons or similar). This allows the public, including the research team, to use the data in ways specified by the license (typically with the requirement that the creator be cited). At this point, the transcript is considered part of a publicly available dataset and may be analyzed by other researchers as such.

How much time is required of each subject?

Each subject will participate in three (3) interviews of approximately one hour in length each. Additionally, subjects will be asked to review and approve their interview transcripts and make any edits they wish before releasing them for inclusion in the open dataset for analysis and publication; we expect this to take no longer than an additional hour per interview. (Note: in past studies using this methodology, participants were typically able to approve their transcripts in 5 minutes or less.)

The total estimated time required of each subject is therefore at most 6 hours (3 hours of interviews plus up to 3 hours of transcript review), but is likely to be much closer to 3 hours.

What methods, instruments, techniques, and/or other sources of material will you use to gather data from human subjects?

Data will be collected via semi-structured individual interviews. See “Interview protocol” in the appendix.

Interviews will typically be conducted remotely via the participant’s preferred phone or videochat platform (Skype, FaceTime, Zoom, etc.). If participants happen to be at the same location as the researcher and wish to do an interview in person, this will also be offered as an option (the protocol remains the same). Regardless of modality, all interviews will be transcribed by a professional real-time transcription service.

During the interviews, participants may reference public online resources from software projects they and others have contributed to. Since these items are already public at the time of the interview (e.g. a blog post or a website), we may also use them as sources of material in our analysis: if a participant mentions a blog post, for instance, we may draw from that blog post as well.

Potential risks to subjects

There are no risks to subjects beyond minimal risk (see 8b). Inasmuch as there is minimal risk involved, it is primarily social (i.e. reputation-based).

Assess their likelihood and seriousness to subjects:

Participation in this research is minimal risk, no greater than everyday activities. FOSS project participants already do most of their work and discussion of their work in public and in an identifiable way (that is, FOSS community participants post their work publicly under their own names), so the public nature of this project is in keeping with FOSS cultural norms. Indeed, to do this study in a different (non-transparent) way would be considered a cultural oddity.

Discuss the potential benefits of the research to the population from which your subjects are drawn:

Potential benefits of this research include building a deeper understanding of participant conceptualizations of FOSS community dynamics. Members of these FOSS communities will gain more knowledge of the different ontologies being used by project stakeholders and have the chance to reflect on their communication choices and ways to possibly reduce miscommunications within a project. Additionally, researchers and organizations working with these populations will be able to make better decisions about resourcing community development using the findings from this research.

Discuss why the risks to subjects are reasonable in relation to the anticipated benefits to subjects and others, or in relation to the importance of the knowledge to be gained as a result of the proposed research:

Risks are no higher than everyday activities, and are largely mitigated by participants having control of the conversations they participate in and the questions they choose to answer or not answer, as well as the final version of their interview transcripts that will be published and analyzed. The benefits include a greater understanding of FOSS community conceptualizations, as elaborated in (8c) above. Risks are therefore reasonable in relation to anticipated benefits.

Describe the planned procedures for protecting against or minimizing potential risks, including risks to confidentiality, and assess their likely effectiveness:

Risk is minimized by giving subjects full control over editing and approving their transcripts for open-licensing before the analysis process. Due to the copyright transfer process described in part (6c), the research team will have no legal ability to use the data (since the subjects will have full ownership of copyright) until it is released with the approval of the subject.

Because of this, no analysis or publication will proceed after the interview until the subject has provided the transcript version (edited or otherwise) they would like to have published online. Subjects may also withdraw transcripts entirely from the study at any time before their publication under an open license.

What information will be provided to prospective subjects?

Subjects will receive information about the research project’s objectives (to understand FOSS project conceptualizations among participants) and the people conducting it (researcher and IRB contact information, etc.).

Subjects will also receive information on the tasks they will do (be interviewed) and the potential benefits of the project to both them and society, along with an explanation that participation is voluntary, that they can stop at any time, and that they will not be penalized for withdrawing. This information will be made available in written English, and researchers will be available for conversations/questions upon participant request.

During the recruitment process (via email), potential subjects can ask questions, get answers, and talk with us about any aspect of the study before signing the consent form (which constitutes enrollment in the study).

Before the start of the first interview session, we will review the consent procedures and the study with subjects to give them an additional opportunity to ask questions. Participants will be reminded at the start of the interview that they are able to stop participating in the study at any time, and will be asked whether they want to proceed; the interview will continue only if they consent.

Additionally, as previously described, participants will have full legal ownership of their interview transcripts (via copyright transfer). It is therefore entirely their decision whether and how to edit and release them into the dataset. The research team cannot analyze or otherwise use the data until the participants release it.

Written consent via formal documentation (paper or digital signature of the consent form) will be obtained before interviews begin.

Verbal (or in the case of typed interviews, written) consent will be given again informally at the beginning of each interview. The researcher will ask the participant’s permission before starting to record (e.g. “Is it okay for me to start saving the transcript after this point?”), and the participant’s consent will be recorded as the first entry in the saved transcript.