Identities
SortingHat maintains a database of identities of community members across different sources. An identity is a combination of a name, email, username and the source from where it was extracted. Identities corresponding to the same real person can be merged into one single individual with one unique profile.
Unify a contributor's identities#
Merging different identities#
You can merge all the different identities of a contributor using SortingHat, which will
be available at https://[INSTANCE].biterg.io/identities.
Search for the identities using the Search box in the Individuals section. To merge
one identity into another, select it on the table and drag it with your cursor into the
target profile. You can select and drag several items at once.
Alternatively, you can click to select the identities and then click the MERGE button
on the top right of the Individuals table to unify them. This method does not allow you
to choose the main identity.
Splitting identities#
If an identity was wrongly assigned to a contributor, you can take it out of that profile
or "split" it. To do that, expand the profile on the Individuals table and click the
button with the diverging arrows next to the identity you want to split.
To split all the identities from a profile at once, click the SPLIT ALL button above
the list of identities. This will create a unique profile for each identity.
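The effect of merging and splitting can be pictured with a small data model. The sketch below is only an illustration: the `Identity` and `Individual` classes and the `merge` and `split_all` functions are hypothetical names, not SortingHat's actual implementation or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Identity:
    """One identity: a name, email, username and the source it came from."""
    name: str
    email: str
    username: str
    source: str


@dataclass
class Individual:
    """A profile grouping the identities of one real person (hypothetical model)."""
    identities: List[Identity] = field(default_factory=list)


def merge(target: Individual, other: Individual) -> Individual:
    """Move every identity of `other` into `target`, which stays the main profile."""
    target.identities.extend(other.identities)
    other.identities.clear()
    return target


def split_all(individual: Individual) -> List[Individual]:
    """Give each identity its own profile, mirroring what SPLIT ALL is described to do."""
    return [Individual([identity]) for identity in individual.identities]


# Two profiles that belong to the same person but come from different sources.
alice_git = Individual([Identity("Alice Doe", "alice@example.com", "", "git")])
alice_github = Individual([Identity("", "", "adoe", "github")])

merged = merge(alice_git, alice_github)   # one profile with two identities
profiles = split_all(merged)              # two profiles with one identity each
```

In this model the drag-and-drop merge corresponds to choosing `target` explicitly, which is why it lets you decide which profile remains the main one.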
Finding identity matches#
To look for identities that belong to a contributor, use the Search box on the
Individuals table in SortingHat. If the contributor uses different names, emails or
usernames, a single search may not find all of them. In that case, you can pin
the identities in the Workspace to keep track of all of them. To pin a profile, select
it with the cursor and drag it to the Workspace area or select Save in workspace on
the profile's menu.
You can then keep searching for identities and merge them with the pinned profiles using
drag and drop, or by saving them on the Workspace and clicking the MERGE button on the
top right.
This is a somewhat time-consuming process. To automate it, you can ask SortingHat to
recommend possible matches based on a profile's names, usernames and/or emails. Click
on a contributor's name to open their full profile, and then click the FIND MATCHES
button on the upper part of the page. It will open a form where the recommendation
settings can be changed and confirmed.
The process to look for recommendations automatically may take some time. When it has finished, the profile page will show recommended identities that can be merged or, if they don't belong to that profile, dismissed.
To review all of the generated recommendations at once instead of opening each
contributor's profile, click on the # RECOMMENDATIONS button above the Individuals
table. It will open a pop-up where every suggested match can be applied or dismissed.
Manage profiles#
Editing profile data#
A contributor's name and main email can be edited, and information about their country
and gender can be added. Click on the contributor's name on the Individuals table to
open the member's full profile. To edit one of these fields, put the cursor over it to
reveal a pencil button and click it to enter edit mode.
Marking bots#
There is an option to mark an identity as a bot. This is available in the Individuals
section in SortingHat. This helps to filter out automated activity in the dashboard while
keeping such information in the database.
Search for the identity that you want to mark as a bot and click on the button with a robot icon next to the name.
Locking a profile#
Profiles can be locked to make them read-only. To lock a profile, place the cursor over
the contributor's name in the Individuals section in SortingHat and click the button
with a lock icon. Clicking that button a second time unlocks the profile.
Marking a profile as reviewed#
To mark a profile as reviewed, click on the button with a check icon labelled Mark
as reviewed on top of the contributor's profile.
Put the cursor over the button to look up the review date of a profile.
If there have been any changes in the profile since it was reviewed, the button will display a warning icon. Clicking the button again updates the review date and removes the warning.
Jobs#
Privileged users can schedule and trigger bulk data-processing jobs.
This automation is powerful but mostly irreversible, and therefore potentially dangerous: accidents can corrupt data, and such corruption is sometimes very difficult to fix and time-consuming both to correct and to verify. For this reason, these automations need special permissions to be run, and Bitergia reserves them for its support team.
To request that these jobs be run, however, you need to understand them.
Currently there are several job types available:
- Affiliate: Affiliate individuals to organizations using their e-mail domains.
- Genderize: Autocomplete the gender information of individuals using genderize.io recommendations.
- Recommend matches: Recommend identity matches for individuals.
- Unify: Unify individuals by using matching recommendations.
All of these jobs can be triggered by the user. Affiliate and Unify can also be scheduled to run automatically at regular intervals. Both triggered and scheduled jobs accept some configuration, which differs for each type of job. The schedules can be enabled and disabled for each job type. Currently a single schedule per job type is allowed.
You'll find all this under the Settings button on the top menu. The General subsection
on the left will allow the user to tweak and schedule the regular jobs. The Jobs
subsection is for manually triggered jobs.
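To make the Affiliate job more concrete, the following is a minimal sketch of affiliating an individual through the domain of an e-mail address. It is an assumption-laden illustration, not the job's real implementation: the `DOMAIN_TO_ORG` mapping, the `affiliate_by_domain` function and the example organizations are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical mapping from e-mail domain to organization name.
DOMAIN_TO_ORG: Dict[str, str] = {
    "example.com": "Example Corp",
    "example.org": "Example Foundation",
}


def affiliate_by_domain(email: str, domain_to_org: Dict[str, str]) -> Optional[str]:
    """Return the organization registered for the e-mail's domain, if any."""
    _, separator, domain = email.rpartition("@")
    if not separator:
        return None  # not an e-mail address, so there is nothing to match
    # Assumption: domains are compared case-insensitively.
    return domain_to_org.get(domain.lower())


known = affiliate_by_domain("alice@Example.com", DOMAIN_TO_ORG)
unknown = affiliate_by_domain("bob@personal.example", DOMAIN_TO_ORG)
```

An individual whose domain is not registered stays unaffiliated in this sketch, which is the kind of case that may still need manual review.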
Types of matching#
There are three types of matching that will allow the system to unify identities: email, name and username.
- email: same email address.
- name: same full name.
- username: same username in any source.
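The three criteria can be illustrated with a short sketch that pairs identities sharing the same value for a chosen field. This is not SortingHat's matching algorithm: the `recommend_matches` function, the sample identities and the case-insensitive comparison are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical identities; an empty string means the field is unknown.
identities = [
    {"id": "a", "name": "Alice Doe", "email": "alice@example.com", "username": "adoe"},
    {"id": "b", "name": "Alice Doe", "email": "", "username": "alice"},
    {"id": "c", "name": "A. Doe", "email": "alice@example.com", "username": ""},
    {"id": "d", "name": "Bob Roe", "email": "bob@example.org", "username": "adoe"},
]


def recommend_matches(identities, criteria):
    """Return pairs of identity ids that share a value for any given criterion."""
    pairs = set()
    for criterion in criteria:
        groups = defaultdict(list)
        for identity in identities:
            value = identity[criterion].strip().lower()
            if value:  # empty fields never match each other
                groups[value].append(identity["id"])
        for ids in groups.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs


by_email = recommend_matches(identities, ["email"])
by_username = recommend_matches(identities, ["username"])
```

Here `by_email` pairs `a` with `c`, while `by_username` pairs `a` with `d` even though the names differ. That second pair is the sort of wrong unification a more aggressive criterion can produce.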
Our support team will study the data sources you are tracking and set the best algorithm to automatically identify the different accounts used by your community members. The goal is to avoid duplicated identities but also avoid having the wrong ones unified.
Interactions#
There's a single matching recommendations list. Each run of a Recommend matches job might add recommendations to the list.
Identity matching and unification are two steps of the same process. This separation lets the user run aggressive matching policies and manually tune the results before applying them.
But be aware that if you manually run a Recommend matches job, the next scheduled
Unify job will apply the recommendations. Disable the schedule if you have not yet
finished your manual tuning of the recommendations!
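The difference between unifying a reviewed list and an unreviewed one can be sketched as follows. The `unify` function, its parameters and the sample data are hypothetical, and the grouping uses a generic union-find structure rather than anything taken from SortingHat.

```python
def unify(individual_ids, recommendations, dismissed=frozenset()):
    """Group individuals linked by recommendations that were not dismissed."""
    parent = {i: i for i in individual_ids}

    def find(i):
        # Walk up to the group's root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in recommendations:
        if (a, b) in dismissed:
            continue  # a manually dismissed recommendation is never applied
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for i in individual_ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(sorted(group) for group in groups.values())


ids = ["a", "b", "c", "d"]
recommendations = [("a", "b"), ("a", "c"), ("a", "d")]

# Manual tuning finished: the wrong match ("a", "d") was dismissed first.
reviewed = unify(ids, recommendations, dismissed={("a", "d")})
# Scheduled run before the review: every recommendation is applied.
unreviewed = unify(ids, recommendations)
```

In the unreviewed run all four individuals collapse into one group, which is the outcome that disabling the schedule is meant to prevent.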
Prioritizing manual identity improvements#
Contributions to Open Source projects follow a Pareto distribution, so focusing on the most relevant contributors has a significant impact on data quality.
The image above shows a power law, or Pareto, distribution. The Y-axis represents the number of contributions, and the X-axis contains the different contributors ordered by number of contributions. Two colors represent two areas:
- Green represents the head, where we have the most active contributors.
- Yellow represents the long tail, where we have the vast majority of the community, with a small percentage of the overall activity.
Communities are huge sometimes, so it is essential to start improving the data set by paying attention to the head of the Pareto distribution. Bitergia Analytics Platform provides the Affiliations dashboard, which facilitates the work:
- A visualization ranks the top contributors providing links to their SortingHat profiles. Top contributors are also the most likely to have several identities.
- Several other visualizations facilitate filtering and focusing on specific organizations, domains, data sources or other interesting statistical populations.
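The idea of starting with the head of the distribution can be expressed as a small calculation: rank contributors by activity and keep the smallest group that covers a chosen share of all contributions. The `head_contributors` function, the 80% default and the sample counts below are assumptions made for illustration, not values taken from the platform.

```python
def head_contributors(contributions, share=0.8):
    """Return the smallest top-ranked group covering `share` of all contributions."""
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    head, covered = [], 0
    for contributor, count in ranked:
        if covered >= share * total:
            break  # the target share is already covered
        head.append(contributor)
        covered += count
    return head


# Hypothetical contribution counts per contributor.
contributions = {
    "alice": 500,
    "bob": 300,
    "carol": 100,
    "dave": 50,
    "erin": 30,
    "frank": 20,
}

head = head_contributors(contributions)
```

With these numbers two of the six contributors account for 80% of the activity, so reviewing their profiles first improves most of the data with the least manual work.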