github2_pull_requests
Warnings and Disclaimers#
On one hand, Github treats internally the pull requests as issues, and this causes misleading field names and misunderstandings.
On the other hand the capability of crossing information from different datasets by the search engine Bitergia Analytics platform bases on is very limited. Thus we need to create mixed datasets and this again causes misunderstandings. This data source is one of these cases.
Physical Design Approach of this Data Source#
This index contains 3 kind of documents related to Github pull requests.
It is possible to distinguish them by using the field item_type
: ( issue
, pull_request
, comment
).
There's one document of type comment
per pull request comment plus 2 other documents per GitHub pull request, an issue
and a pull_request
.
There are also 2 subtypes of documents of type comment
depending on the value of their subtype
field: issue_comment
or review_comment
.
All the field types are aggregatable except the text
field type. Those are:
title_analyzed
.issue_title_analyzed
.body_analyzed
.
Fields Description#
provenance | icon |
---|---|
issue |
|
pull_request |
|
issue_comment |
|
review_comment |
|
SortingHat | |
common field |
Name | Type | Description | provenance |
---|---|---|---|
Cross-references fields | NA | Fields coming from cross-references study (available only when it is active), see cross_references.csv. | |
assignee_data_bot | boolean | True/False if the pull request assignee is a bot or not | |
assignee_data_domain | keyword | Pull request assignee domain name. | |
assignee_data_gender | keyword | Pull request assignee gender, based on her name (disabled by default). | |
assignee_data_gender_acc | float | Pull request assignee gender accuracy (disabled by default). | |
assignee_data_id | keyword | Pull request assignee SortingHat profile ID. | |
assignee_data_name | keyword | Pull request assignee name. | |
assignee_data_multi_org_names | list | Pull request assignee organization names. | |
assignee_data_org_name | keyword | Pull request assignee organization name. | |
assignee_data_user_name | keyword | Pull request assignee username. | |
assignee_data_uuid | keyword | Pull request assignee SortingHat profile UUID. | |
assignee_domain | keyword | Pull request assignee domain name from GitHub. | |
assignee_geolocation | geo_point | Pull request assignee geolocation from GitHub. | |
assignee_location | keyword | Pull request assignee location from GitHub. | |
assignee_login | keyword | Pull request assignee login from GitHub. | |
assignee_name | keyword | Pull request assignee name from GitHub. | |
assignee_org | keyword | Pull request assignee organization name from Github. | |
author_bot | boolean | True/False if the pull request author is a bot or not from the SortingHat profile. | |
author_domain | keyword | Item (Pull request or comment) author domain name from SortingHat profile. | |
author_gender | keyword | Item (Pull request or comment) author gender, based on her name, from SortingHat (disabled by default). | |
author_gender_acc | float | Item (Pull request or comment) author gender accuracy from SortingHat (disabled by default). | |
author_id | keyword | Item (Pull request or comment) author SortingHat profile ID. | |
author_multi_org_names | list | Item (Pull request or comment) author organization names from SortingHat profile. | |
author_name | keyword | Item (Pull request or comment) author name from SortingHat profile. | |
author_org_name | keyword | Item (Pull request or comment) author organization name from SortingHat profile. | |
author_user_name | keyword | Item (Pull request or comment) author username from SortingHat profile. | |
author_uuid | keyword | Item (Pull request or comment) author SortingHat profile UUID. | |
body | keyword | Body of the comment. | |
body_analyzed | text | Body of the comment. | |
code_merge_duration | float | Difference in days between creation and merging dates. | |
comment_created_at | date | Date when the comment was created. | |
comment_updated_at | date | Date when the comment was updated. | |
demography_max_date | date | Date of the latest pull request of the corresponding author. Available only when demography study is active. | ? |
demography_min_date | date | Date of the first (oldest) pull request of the corresponding author. Available only when demography study is active. | ? |
forks | long | Number of repository forks. | |
github_repo | keyword | GitHub repository name. | |
grimoire_creation_date | date | Pull request/comment creation date. | |
id | keyword | Pull request/comment ID. Different id's for vs | |
is_github_comment | long | Used to identify or count (any) comments. | |
is_github_pull_request | long | Used to identify or count pull_request items. |
|
is_github_review_comment | long | Used to identify or count review comments. 1 for them. | |
is_github_issue_comment | long | Used to identify or count issue comments. 1 for them. | |
issue_id_in_repo | keyword | Original identifier of the pull request (GitHub pull request number). | |
issue_closed_at | date | Date in which the pull request was closed. | |
issue_created_at | date | Date in which the pull request was created. | |
issue_labels | list | Labels of the pull request. | |
issue_pull_request | boolean | True for documents of subtype issue_comment . |
|
issue_state | keyword | Life cycle state of the related pull request. | |
issue_title | keyword | Title of the related pull request for documents of type comment . |
|
issue_updated_at | date | Date when the pull request was last updated. | |
issue_url | keyword | URL of the pull request for documents of type comment . |
|
item_type | keyword | The type of the item (issue , pull_request , comment ). |
|
merge_author_domain | keyword | Merge author domain from GitHub. | |
merge_author_geolocation | geo_point | Merge author geolocation from GitHub. | |
merge_author_location | keyword | Merge author location as string from GitHub. | |
merge_author_login | keyword | Merge author login from GitHub. | |
merge_author_name | keyword | Merge author name from GitHub. | |
merge_author_org | keyword | Merge author organization from GitHub. | |
merged_by_data_bot | boolean | True/False if the merge author is a bot or not | |
merged_by_data_domain | keyword | Merge author domain. | |
merged_by_data_gender | keyword | Merge author gender, based on her name (disabled by default). | |
merged_by_data_gender_acc | float | Merge author gender accuracy (disabled by default). | |
merged_by_data_id | keyword | Merge author's SortingHat profile ID. | |
merged_by_data_name | keyword | Merge author name. | |
merged_by_data_org_name | keyword | Merge author organization. | |
merged_by_data_user_name | keyword | Merge author username. | |
merged_by_data_uuid | keyword | Merge author SortingHat profile UUID. | |
metadata__enriched_on | date | Date when the item was enriched. | |
metadata__gelk_backend_name | keyword | Name of the backend used to enrich information. | |
metadata__gelk_version | keyword | Version of the backend used to enrich information. | |
metadata__timestamp | date | Date when the item was stored in the RAW index. | |
metadata__updated_on | date | Date when the item was updated on its original data source. | |
num_review_comments | long | Number of review comments. | |
offset | long | ||
origin | keyword | Original URL of the repository the document was retrieved from. | |
project | keyword | Project. | |
project_1 | keyword | Project (if more than one level is allowed in the project hierarchy). | |
pull_id_in_repo | keyword | Original identifier of the pull request (GitHub pull request number). | |
pull_closed_at | date | Date in which the pull request was closed. | |
pull_created_at | date | Date in which the pull request was created. | |
pull_id | keyword | Pull request ID on GitHub. | |
pull_id_in_repo | keyword | Pull request ID in the GitHub repository. | |
pull_labels | keyword | Pull request assigned labels. | |
pull_merged | boolean | True if the pull request was already merged. | |
pull_merged_at | date | Date when the pull request was merged. | |
pull_state | keyword | Life cycle state of the pull request (open /closed /merged ). |
|
pull_updated_at | date | Date when the pull request was last updated. | |
pull_url | keyword | Full URL of the pull request. . | |
reaction_confused | long | Number of reactions 'confused'. | |
reaction_eyes | long | Number of reactions 'eyes'. | |
reaction_heart | long | Number of reactions 'heart'. | |
reaction_hooray | long | Number of reactions 'hooray'. | |
reaction_rocket | long | Number of reactions 'rocket'. | |
reaction_thumb_down | long | Number of reactions '-1'. | |
reaction_thumb_up | long | Number of reactions '+1'. | |
reaction_total_count | long | Number of total reactions. | |
repository | keyword | Repository URL. | |
repository_labels | keyword | Custom repository labels defined by the user. | |
review_state | keyword | Review lifecycle state (APPROVED , COMMENTED , CHANGES_REQUESTED , or empty). |
|
sub_type | keyword | Type of the comment (issue_comment or review_comment ). |
|
tag | keyword | Perceval tag. (The URL of the repository). | |
time_open_days | float | Time the pull request is open counted in days. | |
time_to_close_days | float | Time to close a pull request counted in days. | |
time_to_merge_request_response | float | Time to merge a pull request in days. | |
url | keyword | Url of the pull request/comment. The issue and its corresponding pull_request share url. Each comment has its own. |
|
user_data_bot | boolean | True/False if the pull request author is a bot or not. | |
user_data_domain | keyword | Pull request author domain name. | |
user_data_gender | keyword | Pull Request author gender, based on her name (disabled by default). | |
user_data_gender_acc | float | Pull request author gender accuracy (disabled by default). | |
user_data_id | keyword | Pull request author SortingHat profile ID. | |
user_data_name | keyword | Pull request author name. | |
user_data_org_name | keyword | Pull request author organization name. | |
user_data_user_name | keyword | Pull request author username. | |
user_data_uuid | keyword | Pull request author SortingHat profile UUID. | |
user_domain | keyword | Pull request author domain name from GitHub. | |
user_geolocation | geo_point | Pull request author geolocation from GitHub. | |
user_location | keyword | Pull request author location as string from GitHub. | |
user_login | keyword | Pull request author login from GitHub. | |
user_name | keyword | Pull request author username from GitHub. | |
user_org | keyword | Pull request author organization from GitHub. | |
uuid | keyword | Perceval UUID. Different for each item. |