Skip to content

Identities

SortingHat maintains a database of identities of community members across different sources. An identity is a combination of a name, email, username and the source from where it was extracted. Identities corresponding to the same real person can be merged into one single individual with one unique profile.

Unify a contributor's identities#

Merging different identities#

You can merge all the different identities of a contributor using SortingHat, which will be available at https://[INSTANCE].biterg.io/identities.

Search for the identities using the Search box in the Individuals section. To merge one identity into another, select it on the table and drag it with your cursor into the target profile. You can select and drag several items at once.

Merge identities

Alternatively, you can click to select the identities and then click the MERGE button on the top right of the Individuals table to unify them. This method does not allow you to choose the main identity.

Merge identities

Splitting identities#

If an identity was wrongly assigned to a contributor, you can take it out of that profile or "split" it. To do that, expand the profile on the Individuals table and click the button with the diverging arrows next to the identity you want to split.

Split identity

To split all the identities from a profile at once, click the SPLIT ALL button above the list of identities. This will create a unique profile for each identity.

Finding identity matches#

To look for identities that belong to a contributor, use the Search box on the Individuals table in SortingHat. If the contributor uses different names, emails or usernames they may all not be findable in one single search. In that case, you can pin the identities in the Workspace to keep track of all of them. To pin a profile, select it with the cursor and drag it to the Workspace area or select Save in workspace on the profile's menu.

Pin identities

You can then keep searching for identities and merge them with the pinned profiles using drag and drop, or saving them on the Workspace and clicking the MERGE button on the top right.

Merge in workspace

This is a somewhat time consuming process. To automate it, you can ask SortingHat to recommend possible matches based on a profile's names, usernames and/or emails. Click on a contributor's name to open their full profile, and then click the FIND MATCHES button on the upper part of the page. It will open a form where the recommendation settings can be changed and confirmed.

Find matches automatically

The process to look for recommendations automatically may take some time. When it has finished, the profile page will show recommended identities that can be merged or, if they don't belong to that profile, dismissed.

Possible matches

To review all of the generated recommendations at once instead of opening each contributor's profile, click on the # RECOMMENDATIONS button above the Individuals table. It will open a pop up where every suggested match can be applied or dismissed.

Review recommendations

Manage profiles#

Editing profile data#

A contributor's name and main email can be edited, and information about their country and gender can be added. Click on the contributor's name on the Individuals table to open the member's full profile. To edit one of these fields, put the cursor over it to reveal a pencil button and click it to enter edit mode.

Edit name

Marking bots#

There is an option to mark an identity as bot. This is available in the Individuals section in SortingHat. This helps to filter out automated activity in the dashboard while keeping such information in the database.

Search the identity that you want to mark as bot and click on the button with a robot icon next to the name.

Mark as bot

Locking a profile#

Profiles can be locked to make them read-only. To lock a profile, place the cursor over the contributor's name in the Individuals section in SortingHat and click the button with a lock icon. Clicking that button a second time unlocks the profile.

Lock profile

Marking a profile as reviewed#

To mark a profile as reviewed, click on the button with a check icon labelled as Mark as reviewed on top of the contributor's profile.

Review profile

Put the cursor over the button to look up the review date of a profile.

Review profile date

If there have been any changes in the profile since it was reviewed, the button will display a warning icon. Clicking the button again updates the review date and removes the warning.

Review profile changes

Jobs#

Priviledged users can schedule and trigger bulk data-processing jobs.

This automation is powerful, but also mostly irreversible, and therefore potentially dangerous, as accidents might result in data corruption, which is sometimes very difficult to fix and very time-consuming in both in corrections and checks. For this reason, these automations need special permissions to be run and Bitergia reserves them for its support team.

But in order to request these jobs to be run, you need to understand them.

Currently there are several job types available:

  • Affiliate: Affiliate individuals to organizations using their e-mail domains.
  • Genderize: Autocomplete the gender information of indivivuals using genderize.io recommendations.
  • Recommend matches: Recommend identity matches for individuals.
  • Unify: Unify individuals by using matching recommendations.

All of these jobs can be triggered by the user. Affiliate and Unify can also be scheduled to be automatically run regularily. Both triggered and scheduled jobs accept some configurations. These are different for each type of job. The schedules can be enabled and disabled for each job type. Currently a single schedule per job type is allowed.

You'll find all this in the Settings button on the top menu. The General subsection on the left will allow the user to tweak and schedule the regular jobs. The Jobs subsection is for manual triggered jobs.

Types of matching#

There are three types of matching that will allow the system to unify identities: email, name, username.

  • email: same email address.
  • name: same full name.
  • username: same username of any source.

Our support team will study the data sources you are tracking and set the best algorithm to automatically identify the different accounts used by your community members. The goal is to avoid duplicated identities but also avoid having the wrong ones unified.

Interactions#

There's a single matching recommendations list. Each run of a Recommended matches job might add recommendations to the list.

Identity matching and unification are 2 steps of the same process. This is meant so the user to run agressive matching policies and manually tune the results before applying them.

But be aware that if you manually run a Recommended matches job, the next scheduled Unify job will apply the recommendations. Disable the schedule if you have not yet finished your manual tuning of the recommendations!

Prioritizing manual identity improvements#

Contributions to Open Source projects follow the Pareto distribution, so by focusing on the most relevant contributors we significantly impact data quality.

Pareto distribution

The image above shows a power law of Pareto distribution. The Y-axis represents the number of contributions, and the X-axis contains the different contributors ordered by number of contributions. Two colors represent two areas:

  • Green represents the head, where we have the most active contributors.
  • Yellow represents the long tail, where we have the vast majority of the community, with a small percentage of the overall activity.

Communities are huge sometimes, so it is essential to start improving the data set by paying attention to the head of the Pareto distribution. Bitergia Analytics Platform provides the Affiliations dashboard, which facilitates the work:

  • A visualization ranks the top contributors providing links to their SortingHat profiles. Top contributors are also the most likely to have several identities.
  • Several other visualizations facilitates filtering and focusing on specific organizations, domains, data sources or other interesting statistical populations.