Affiliations

The affiliations data set is one of the key parts of the Bitergia Analytics Platform. This page explains how to modify the information about specific contributors or organizations.

The huge variety of tools used by software development communities makes any analysis really difficult. It is not enough to measure the activity in the code, issues, Q&A forums, mailing lists, etc. Without a way to connect the activity of the contributors across the different tools (data sources, as we call them), it is not possible to offer high-quality metrics. As an example, the three metrics below are only possible with the affiliations data set:

  • Total number of contributors of a community. If your team members are using GitHub and Jira, are you sure you are not counting them twice?
  • Impact of an organization/company in the project. To get this metric you need the link between community members and organizations.
  • Retention rate of your contributors. How can you tell that someone is new to the community if you do not have a way to identify people using different accounts?

Affiliations Editor#

Identities are collected and stored by the Bitergia Analytics Platform. To improve the quality of the data set, use the affiliations editor (SortingHat). Below are the typical actions our customers perform to do so.

The most common issue is that some organizations are under-represented. Follow the steps below to fix this:

  • Go to the Affiliations dashboard.
  • Select the organization Unknown in either of the two pie charts. Organizations are shown in the inner circle.
  • Identify the domains that should be associated with an organization.
  • Open the identities editor. It will be available at https://[INSTANCE].biterg.io/identities.
  • Go to the Organizations table on the right side and search for the one you are missing.
  • If you do not find it, add that organization to the database.
  • If you find it, extend the information available for that organization.
  • Once the domain is added to the organization, a process will refresh the metrics you see on the dashboards. This takes a couple of hours.

Adding a new organization#

An organization can be added via SortingHat, which is available at https://[INSTANCE].biterg.io/identities.

Click the ADD button on the top right corner of the Organizations table to add an organization.

To enable automatic affiliation, add the base domain of the organization. Marking the domain as a top domain will also auto-affiliate identities that use its subdomains.
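The following is a minimal sketch of how domain-based auto-affiliation could behave, written only to illustrate the difference between a regular domain and a top domain. It is not SortingHat's implementation: the `DOMAINS` registry, the `organization_for_email` function, and the example organizations are all hypothetical.

```python
from typing import Optional

# Hypothetical registry: domain -> (organization name, is_top_domain).
DOMAINS = {
    "example.com": ("Example Inc.", True),         # top domain: subdomains match too
    "example.org": ("Example Foundation", False),  # regular domain: exact match only
}


def organization_for_email(email: str, domains=DOMAINS) -> Optional[str]:
    """Return the organization matched by the email's domain, or None."""
    _, sep, domain = email.lower().rpartition("@")
    if not sep or not domain:
        return None  # not an email address

    # An exact match works for any registered domain.
    if domain in domains:
        return domains[domain][0]

    # Otherwise strip leading labels one at a time and accept a parent
    # domain only if it is flagged as a top domain.
    labels = domain.split(".")
    for i in range(1, len(labels)):
        entry = domains.get(".".join(labels[i:]))
        if entry and entry[1]:
            return entry[0]
    return None
```

Under these assumptions, `dev@eu.example.com` resolves to Example Inc. because `example.com` is a top domain, while `dev@lists.example.org` stays unaffiliated because `example.org` is not.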

New organization

Removing an organization#

Organizations can be removed via SortingHat. Note that all the enrollments of contributors affiliated to the deleted organization will be lost. Go to the Organizations table on the main page, and use the Search box to look for an organization. Click on the button with the three dots on the right side of the organization and then select Delete organization on the dropdown menu.

Remove organization

Adding an e-mail domain to an organization#

The platform can map email domains to organizations. Once a domain is mapped, every identity whose email address uses that domain is associated with the corresponding organization.

On the Organizations table, use the Search box to look for an organization. For each organization you want to update, click the button with the three dots and select View/edit domains to add or remove the organization's domains.

Manage organizations

Enrolling a person to an organization#

Search for the user using the Search box on the Individuals table, and click on the button with the chevron icon next to the user to expand it. Then, click the ADD button next to Organizations. On the pop up you can link the profile to an organization and set the start and end dates.

Enroll identity

Alternatively, you can use drag and drop to enroll a person to an organization. Search for the contributor on the Individuals table and for the organization on the Organizations table. Grab the user with your mouse and release it over the organization, or vice versa. After that, a pop up will let you set the start and end dates.

Enroll identity with drag and drop

Un-enrolling a person from an organization#

Expand the contributor's information and click on the button with the trashcan icon next to the organization you want to remove. To un-enroll the person from all the organizations to which they are affiliated, click REMOVE ALL.

Unenroll

Marking bots#

There is an option to mark an identity as a bot. This is available in the Individuals section in SortingHat. It helps to filter out automated activity in the dashboard while keeping that information in the database.

Search for the identity that you want to mark as a bot and click on the button with a robot icon next to the name.

Mark as bot

Preventing removed affiliations from being restored#

If you changed the affiliation of some contributors and later find that the changes were reverted, try the following. Instead of removing an organization that was automatically added to the contributor, change its timeframe. For instance, you can set the enrollment from 1900-01-01 to 1900-01-02, so that no data falls within that enrollment.

How dates work#

Dates are set by default to the beginning of the day, i.e. 00:00:00. Thus, consecutive periods should be set as:

  • Organization A: 1900-01-01 to 2019-10-01
  • Organization B: 2019-10-01 to 2100-01-01

This works because the end date for Organization A will be set under the hood as 2019-10-01 00:00:00 (not inclusive) and the start date for Organization B will be set as 2019-10-01 00:00:00 (inclusive).

So 2019-10-01 will be the first day that the profile is enrolled to Organization B, and 2019-09-30 will be the last day that the profile is enrolled to Organization A.

This can be summarized as the following condition, where date is the date being checked to decide the enrollment: start_date <= date < end_date.
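As an illustration, the condition can be written as a short check over the two example periods. This is only a sketch of the half-open interval rule described here. The `ENROLLMENTS` list and the `organization_on` function are hypothetical names, not part of the platform.

```python
from datetime import datetime
from typing import Optional

# Hypothetical enrollments: (organization, start_date, end_date).
# Dates default to 00:00:00, as described above.
ENROLLMENTS = [
    ("Organization A", datetime(1900, 1, 1), datetime(2019, 10, 1)),
    ("Organization B", datetime(2019, 10, 1), datetime(2100, 1, 1)),
]


def organization_on(date: datetime, enrollments=ENROLLMENTS) -> Optional[str]:
    """Return the organization whose period contains `date`, or None."""
    for organization, start_date, end_date in enrollments:
        # The start is inclusive and the end is exclusive.
        if start_date <= date < end_date:
            return organization
    return None
```

With these periods, activity at 2019-09-30 23:59:59 is attributed to Organization A and activity at 2019-10-01 00:00:00 to Organization B, so no instant belongs to both.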

Matching of identities#

Our support team will study the data sources you are tracking and set the best algorithm to identify the different accounts used by your community members. The goal is to avoid duplicated identities while also avoiding unifying identities that belong to different people.

Types of matching#

There are three types of matching that will allow the system to unify identities: email, name, username.

  • email: same email address.
  • name: same full name.
  • username: same username of any source.
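To make the three criteria concrete, here is a minimal sketch that groups identities sharing an email, name, or username, using a small union-find structure. It is an assumption-laden illustration rather than the platform's matching algorithm: the `unify` function, the dictionary layout, and the case-insensitive comparison are all hypothetical choices.

```python
from collections import defaultdict

# Hypothetical identities collected from different data sources.
IDENTITIES = [
    {"source": "git", "name": "Jane Doe", "email": "jane@example.com", "username": None},
    {"source": "github", "name": None, "email": "jane@example.com", "username": "jdoe"},
    {"source": "jira", "name": "J. Doe", "email": None, "username": "jdoe"},
    {"source": "git", "name": "John Smith", "email": "john@example.net", "username": None},
]


def unify(identities, criteria=("email", "name", "username")):
    """Group indexes of identities sharing a non-empty value for any criterion."""
    parent = list(range(len(identities)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[max(root_i, root_j)] = min(root_i, root_j)

    for field in criteria:
        first_seen = {}
        for index, identity in enumerate(identities):
            # Assumption: values are compared case-insensitively.
            value = (identity.get(field) or "").strip().lower()
            if not value:
                continue
            if value in first_seen:
                union(first_seen[value], index)
            else:
                first_seen[value] = index

    groups = defaultdict(list)
    for index in range(len(identities)):
        groups[find(index)].append(index)
    return sorted(groups.values())
```

Calling `unify(IDENTITIES)` merges the first three identities through a shared email and a shared username, while `unify(IDENTITIES, criteria=("email",))` leaves the Jira account separate. The choice of criteria is therefore a trade-off between duplicated identities and wrongly unified ones.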

Troubleshooting#

Contributor activity is labeled with a single company when the contributor is enrolled in more than one company#

When two enrollments overlap in time, the system will use one of them to update the enriched information on the dashboard. That means that only one of them will be visible. If you detect identities with overlapping dates, edit them so that the profile is not enrolled to more than one company at the same time.
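A quick way to spot such cases, assuming you have a contributor's enrollments at hand as start and end dates, is to compare every pair of periods. The sketch below is hypothetical (the `overlapping_enrollments` function and the tuple layout are not part of the platform) and treats each period as inclusive at the start and exclusive at the end.

```python
from datetime import datetime
from itertools import combinations


def overlapping_enrollments(enrollments):
    """Return pairs of (organization, start, end) tuples whose periods intersect."""
    return [
        (a, b)
        for a, b in combinations(enrollments, 2)
        # Two half-open periods intersect when each starts before the other ends.
        if a[1] < b[2] and b[1] < a[2]
    ]


# Hypothetical enrollments for one contributor.
enrollments = [
    ("Organization A", datetime(2015, 1, 1), datetime(2019, 10, 1)),
    ("Organization B", datetime(2019, 10, 1), datetime(2100, 1, 1)),
    ("Organization C", datetime(2019, 6, 1), datetime(2020, 1, 1)),
]

for a, b in overlapping_enrollments(enrollments):
    print(f"{a[0]} overlaps with {b[0]}")
```

In this example Organization C overlaps with both A and B, while A and B do not overlap with each other because A ends exactly when B starts.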

Changes are not visible in the metrics dashboard#

Changes made to the identities database need to be synchronized with the metrics data set.

For each data source (e.g., git, mboxes) there are three main phases: collection, enrichment, and identities refresh. The last one synchronizes the data stored in OpenSearch with the data available in SortingHat. The time needed to complete the three phases depends on the amount of data in the data source, but it generally takes less than 3 hours.