Open Source Project Fosters Data Teamwork Best Practices

The Linux Foundation has added a new project to foster and advance best practices for data teamwork, borrowing from the Agile approach to software development.

The foundation this week announced the addition of DataPractices.org, described as "a vendor-neutral community working on the first-ever template for modern data teamwork."

That organization has published data values and principles "that illustrate the most effective, ethical and modern approach to data teamwork" and offered data practices courseware to help developers and other data-related pros understand and follow those guidelines.

Patrick McGarry, head of data.world, which pioneered the Data Practices movement, said the initial effort was patterned as an "Agile for Data" initiative, based on Agile software development practices, with the goal of offering direction and improving literacy in the data-driven ecosystem. As such, the movement started with its own document similar to the seminal Agile Manifesto.

"While the first step was the 'Manifesto for Data Practices' the intent was always to move past that and apply the values and principles to a series of free and open courseware that could benefit anyone who was interested," McGarry said in a Q&A post on The Linux Foundation site published this week.

The DataPractices.org manifesto -- with 35 authors and 1,728 signatories as of this writing -- includes four values and 12 principles.

Values are:

  • Inclusion: Maximize diversity, connectivity and accessibility among data projects, collaborators and outputs.
  • Experimentation: Emphasize continuously iterative testing and data analysis.
  • Accountability: Behave ethically and transparently, fix mistakes quickly and hold ourselves and others accountable.
  • Impact: Prioritize projects with well-defined goals, and design them to achieve measurable, substantive outcomes.

Principles for data teams are:

  1. Use data to improve life for our users, customers, organizations and communities.
  2. Create reproducible and extensible work.
  3. Build teams with diverse ideas, backgrounds and strengths.
  4. Prioritize the continuous collection and availability of discussions and metadata.
  5. Clearly identify the questions and objectives that drive each project and use them to guide both planning and refinement.
  6. Be open to changing our methods and conclusions in response to new knowledge.
  7. Recognize and mitigate bias in ourselves and in the data we use.
  8. Present our work in ways that empower others to make better-informed decisions.
  9. Consider carefully the ethical implications of choices we make when using data, and the impacts of our work on individuals and society.
  10. Respect and invite fair criticism while promoting the identification and open discussion of errors, risks and unintended consequences of our work.
  11. Protect the privacy and security of individuals represented in our data.
  12. Help others to understand the most useful and appropriate applications of data to solve real-world problems.

The Data Practices courseware that McGarry mentioned, offered by DataPractices.org, includes a "Project Lifecycle Curriculum" with courses such as "Sourcing data," "Data exploration" and "Analyze and report," among others. Also offered is a "Culture/Practice Curriculum" with courses such as "How to build a data-driven culture" and "Data Ethics 101."

"As a part of the Linux Foundation, DataPractices.org intends to enable a vendor-neutral community to further establish best practices and increase the level of data knowledge across the data ecosystem," The Linux Foundation said in a news release today (March 20). "The project's new open courseware is available to anyone interested in data best practices -- including novice practitioners, data managers, corporate evangelists, seasoned data scientists and more. The project also welcomes expert practitioners to help refine and advance the courseware."

About the Author

David Ramel is an editor and writer for Converge360.