Github Issue Analysis
To use the GitHub repository issue loader, you need to set a GitHub token in the GITHUB_TOKEN environment variable.
See GitHub's documentation on personal access tokens for how to create one.
See llama-hub for more details about the loader.
%pip install llama-index-readers-github
%pip install llama-index-llms-openai
%pip install llama-index-program-openai
import os
os.environ["GITHUB_TOKEN"] = "<your github token>"
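The issue client reads the token from the environment. A missing value or a leftover placeholder may therefore only show up later, as an authentication error in the middle of loading. The sketch below fails fast instead. It assumes you want to check the variable before creating the client. `require_github_token` is a hypothetical helper and is not part of llama-index.

```python
import os


def require_github_token(var_name: str = "GITHUB_TOKEN") -> str:
    """Hypothetical helper: return the token, or raise a clear error early."""
    token = os.environ.get(var_name, "").strip()
    # Treat an empty value or the notebook's placeholder as "not configured".
    if not token or token == "<your github token>":
        raise RuntimeError(
            f"{var_name} is not set; export a GitHub token before loading issues."
        )
    return token
```

Call it once right after setting the variable. It then raises before any API request is made.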
Load Github Issue tickets
import os
from llama_index.readers.github import (
    GitHubRepositoryIssuesReader,
    GitHubIssuesClient,
)
github_client = GitHubIssuesClient()
loader = GitHubRepositoryIssuesReader(
    github_client,
    owner="jerryjliu",
    repo="llama_index",
    verbose=True,
)
docs = loader.load_data()
Found 100 issues in the repo page 1
Resulted in 100 documents
Found 100 issues in the repo page 2
Resulted in 200 documents
Found 100 issues in the repo page 3
Resulted in 300 documents
Found 100 issues in the repo page 4
Resulted in 400 documents
Found 4 issues in the repo page 5
Resulted in 404 documents
No more issues found, stopping
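The log shows the reader fetching issues one page of 100 at a time and stopping once no more issues come back. The loop below is a generic sketch of that pagination pattern, not the reader's actual code. `fetch_page` is a hypothetical stand-in for one API call. Stopping on an empty page is an assumption based on the log.

```python
from typing import Callable, List


def load_all_pages(fetch_page: Callable[[int], List[dict]]) -> List[dict]:
    """Collect items page by page until a page comes back empty."""
    documents: List[dict] = []
    page = 1
    while True:
        items = fetch_page(page)
        if not items:
            print("No more issues found, stopping")
            break
        documents.extend(items)
        print(f"Found {len(items)} issues in the repo page {page}")
        print(f"Resulted in {len(documents)} documents")
        page += 1
    return documents


def fake_fetch_page(page: int, total: int = 404, per_page: int = 100) -> List[dict]:
    """Simulate an API holding `total` issues, served `per_page` at a time."""
    start = (page - 1) * per_page
    return [{"number": n} for n in range(start, min(start + per_page, total))]


docs_sketch = load_all_pages(fake_fetch_page)
```

With 404 simulated issues, this prints five "Found ..." lines before stopping, and the last of them reports 4 issues. That matches the shape of the log above.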
Quick inspection
docs[10].text
"feat(context length): QnA Summarization as a relevant information extractor\n### Feature Description\r\n\r\nSummarizer can help in cases where the information is evenly distributed in the document i.e. a large amount of context is required but the language is verbose or there are many irrelevant details. Summarization specific to the query can help.\r\n\r\nEither cheap local model or even LLM are options; the latter for reducing latency due to large context window in RAG. \r\n\r\nAnother place where it helps is that percentile and top_k don't account for variable information density. (However, this may be solved with inter-node sub-node reranking). \r\n"
docs[10].metadata
{'state': 'open', 'created_at': '2023-07-13T11:16:30Z', 'url': 'https://api.github.com/repos/jerryjliu/llama_index/issues/6889', 'source': 'https://github.com/jerryjliu/llama_index/issues/6889'}
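Each document's metadata carries the issue's state, creation time, API URL, and web URL. That is enough for quick bookkeeping before any LLM call. The sketch below tallies issues by state and finds the oldest one. The three metadata dicts are made-up samples shaped like the output above. With real data you would pass [doc.metadata for doc in docs] instead.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple


def summarize_metadata(metadata: List[Dict[str, str]]) -> Tuple[Counter, str]:
    """Count issues per state and return the source URL of the oldest issue."""
    states = Counter(m["state"] for m in metadata)
    # GitHub timestamps end in "Z"; replace it with an explicit UTC offset for fromisoformat.
    oldest = min(
        metadata,
        key=lambda m: datetime.fromisoformat(m["created_at"].replace("Z", "+00:00")),
    )
    return states, oldest["source"]


# Hypothetical sample records, not real issues.
sample = [
    {"state": "open", "created_at": "2023-07-13T11:16:30Z", "source": "issue-a"},
    {"state": "open", "created_at": "2023-06-01T08:00:00Z", "source": "issue-b"},
    {"state": "closed", "created_at": "2023-06-20T09:30:00Z", "source": "issue-c"},
]
states, oldest_source = summarize_metadata(sample)
```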
Extract themes
%load_ext autoreload
%autoreload 2
The autoreload extension is already loaded. To reload it, use: %reload_ext autoreload
from pydantic import BaseModel
from typing import List
from tqdm.asyncio import asyncio
from llama_index.program.openai import OpenAIPydanticProgram
from llama_index.llms.openai import OpenAI
from llama_index.core.async_utils import batch_gather
prompt_template_str = """\
Here is a Github Issue ticket.
{ticket}
Please extract central themes and output a list of tags.\
"""
class TagList(BaseModel):
    """A list of tags corresponding to central themes of an issue."""
    tags: List[str]
program = OpenAIPydanticProgram.from_defaults(
    prompt_template_str=prompt_template_str,
    output_cls=TagList,
)
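OpenAIPydanticProgram asks the model for output that matches TagList and returns a TagList instance. Each result of the program therefore exposes a .tags list. The standard-library sketch below illustrates only that parse-and-validate step, applied to a raw JSON string. It uses a dataclass in place of the Pydantic model. It is an illustration under that assumption, not how the program works internally.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class TagListSketch:
    """Stdlib stand-in for the Pydantic TagList model."""

    tags: List[str]


def parse_tag_list(raw: str) -> TagListSketch:
    """Parse a JSON payload like {"tags": [...]} and check its shape."""
    data = json.loads(raw)
    tags = data.get("tags") if isinstance(data, dict) else None
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("expected an object with a 'tags' list of strings")
    return TagListSketch(tags=tags)


parsed = parse_tag_list('{"tags": ["summarization", "feature request"]}')
```

Pydantic performs the same kind of type check automatically. That check is why the notebook can rely on every result having a list of strings in .tags.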
tasks = [program.acall(ticket=doc) for doc in docs]
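output = await batch_gather(tasks, batch_size=10, verbose=True)
# Keep only the list of tag strings for each ticket
tag_lists = [tag_list.tags for tag_list in output]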
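batch_gather awaits the acall coroutines in groups of batch_size (10 here) rather than firing all 404 requests at once, which can help with API rate limits. The asyncio sketch below shows that batching idea. It was written from scratch for illustration. gather_in_batches and fake_call are hypothetical names, and the sketch does not reproduce llama-index's verbose progress output.

```python
import asyncio
from typing import Awaitable, List, TypeVar

T = TypeVar("T")


async def gather_in_batches(coros: List[Awaitable[T]], batch_size: int = 10) -> List[T]:
    """Await coroutines batch_size at a time, preserving input order."""
    results: List[T] = []
    for start in range(0, len(coros), batch_size):
        batch = coros[start : start + batch_size]
        # Everything in this batch runs concurrently; the next batch waits for it.
        results.extend(await asyncio.gather(*batch))
    return results


running = 0
peak = 0


async def fake_call(i: int) -> int:
    """Hypothetical stand-in for program.acall that tracks concurrency."""
    global running, peak
    running += 1
    peak = max(peak, running)
    await asyncio.sleep(0.001)
    running -= 1
    return i * i


results = asyncio.run(gather_in_batches([fake_call(i) for i in range(25)], batch_size=10))
```

A larger batch_size finishes sooner but sends more requests in parallel. Ten is a conservative middle ground.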
[Optional] Save/Load Extracted Themes
import pickle
with open("github_issue_analysis_data.pkl", "wb") as f:
    pickle.dump(tag_lists, f)
with open("github_issue_analysis_data.pkl", "rb") as f:
    tag_lists = pickle.load(f)
    print(f"Loaded tag lists for {len(tag_lists)} tickets")
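Re-running extraction costs one LLM call per ticket (404 here), so caching tag_lists to disk is worthwhile. The sketch below writes the same save/load pattern as reusable functions and checks it on made-up tags in a temporary directory. save_tag_lists and load_tag_lists are hypothetical names. pickle.load should only be used on files you trust.

```python
import os
import pickle
import tempfile
from typing import List


def save_tag_lists(tag_lists: List[List[str]], path: str) -> None:
    """Write the extracted tags to disk so extraction need not be re-run."""
    with open(path, "wb") as f:
        pickle.dump(tag_lists, f)


def load_tag_lists(path: str) -> List[List[str]]:
    """Read the tags back; only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "github_issue_analysis_data.pkl")
    sample_tags = [["bug", "vector store"], ["feature request", "summarization"]]
    save_tag_lists(sample_tags, path)
    restored = load_tag_lists(path)
```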
Summarize Themes
Build prompt
prompt = """Here is a list of central themes (in the form of tags) extracted from a list of Github Issue tickets.
Tags for each ticket are separated by 2 newlines.
{tag_lists_str}
Please summarize the key takeaways and what we should prioritize to fix."""
tag_lists_str = "\n\n".join([str(tag_list) for tag_list in tag_lists])
prompt = prompt.format(tag_lists_str=tag_lists_str)
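To see what the model actually receives, the sketch below applies the same join-and-format step to two made-up tag lists under a shortened template. It assumes each entry of tag_lists is a plain list of strings. str() of such a list produces its Python repr, and the blank line between tickets comes from joining with "\n\n".

```python
template = """Here are tags from several tickets.

{tag_lists_str}

Please summarize the key takeaways."""

# Made-up tags for two tickets.
sample_tag_lists = [["bug", "chroma"], ["feature request", "summarization"]]
sample_str = "\n\n".join(str(tag_list) for tag_list in sample_tag_lists)
filled = template.format(tag_lists_str=sample_str)
print(filled)
```

Braces inside the inserted tags are safe, because str.format only interprets braces in the template itself.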
Summarize with GPT-4
from llama_index.llms.openai import OpenAI
response = OpenAI(model="gpt-4").stream_complete(prompt)
for r in response: print(r.delta, end="")
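stream_complete yields incremental responses, and the loop prints each one's delta so the summary appears as it is generated. If you also want the full text afterwards, for example to save it, you can accumulate the deltas as you print them. In the sketch below, Chunk is a hypothetical stand-in for the streamed response objects. The guard for empty deltas is defensive.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for a streamed response with a .delta field."""

    delta: Optional[str]


def print_and_collect(stream: Iterable[Chunk]) -> str:
    """Print each delta as it arrives and return the concatenated text."""
    parts: List[str] = []
    for r in stream:
        piece = r.delta or ""  # treat a missing delta as empty text
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


full_text = print_and_collect([Chunk("1. Bug "), Chunk("Fixes: "), Chunk(None), Chunk("...")])
```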
1. Bug Fixes: There are numerous bugs reported across different components such as 'Updating/Refreshing documents', 'Supabase Vector Store', 'Parsing', 'Qdrant', 'LLM event', 'Service context', 'Chroma db', 'Markdown Reader', 'Search_params', 'Index_params', 'MilvusVectorStore', 'SentenceSplitter', 'Embedding timeouts', 'PGVectorStore', 'NotionPageReader', 'VectorIndexRetriever', 'Knowledge Graph', 'LLM content', and 'Query engine'. These issues need to be prioritized and resolved to ensure smooth functioning of the system.
2. Feature Requests: There are several feature requests like 'QnA Summarization', 'BEIR evaluation', 'Cross-Node Ranking', 'Node content', 'PruningMode', 'RelevanceMode', 'Local-model defaults', 'Dynamically selecting from multiple prompts', 'Human-In-The-Loop Multistep Query', 'Explore Tree-of-Thought', 'Postprocessing', 'Relevant Section Extraction', 'Original Source Reconstruction', 'Varied Latency in Retrieval', and 'MLFlow'. These features can enhance the capabilities of the system and should be considered for future development.
3. Code Refactoring and Testing: There are mentions of code refactoring, testing, and code review. This indicates a need for improving code quality and ensuring robustness through comprehensive testing.
4. Documentation: There are several mentions of documentation updates, indicating a need for better documentation to help users understand and use the system effectively.
5. Integration: There are mentions of integration with other systems like 'BEIR', 'Langflow', 'Hugging Face', 'OpenAI', 'DynamoDB', and 'CometML'. This suggests a need for better interoperability with other systems.
6. Performance and Efficiency: There are mentions of 'Parallelize sync APIs', 'Average query time', 'Efficiency', 'Upgrade', and 'Execution Plan'. This indicates a need for improving the performance and efficiency of the system.
7. User Experience (UX): There are mentions of 'UX', 'Varied Latency in Retrieval', and 'Human-In-The-Loop Multistep Query'. This suggests a need for improving the user experience.
8. Error Handling: There are several mentions of error handling, indicating a need for better error handling mechanisms to ensure system robustness.
9. Authentication: There are mentions of 'authentication' and 'API key', indicating a need for secure access mechanisms.
10. Multilingual Support: There is a mention of 'LLM中文应用交流微信群' (roughly, "LLM Chinese application discussion WeChat group"), indicating a need for multilingual support.
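The summary above is the model's reading of the tags. A cheap, deterministic cross-check is to count how often each tag occurs across tickets, assuming tag_lists is a list of string lists. The sketch lowercases and strips tags so that near-duplicates such as "Bug" and "bug " merge. That normalization is a choice made here, not something the notebook does. The sample data is made up.

```python
from collections import Counter
from typing import List, Tuple


def top_tags(tag_lists: List[List[str]], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most common tags, counting each tag at most once per ticket."""
    counts: Counter = Counter()
    for tags in tag_lists:
        # Use a set so a ticket that repeats a tag is only counted once.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    return counts.most_common(n)


sample = [["Bug", "Chroma db"], ["bug ", "documentation"], ["feature request", "bug"]]
print(top_tags(sample, n=3))
```

Tags that appear across many tickets but are missing from the summary are worth a second look.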