SEC-EDGAR Analysis App 📍

Workflow

Overview

This repository contains a retrieval-augmented generation (RAG) app that streamlines processing, merging, normalizing, and analyzing SEC-EDGAR data with a large language model (the Gemini Pro API). The final step uses Streamlit to visualize the LLM results, giving quicker financial insight into the SEC-EDGAR tickers covered: MSFT, AAPL, and GOOGL.

Demo 🔗: https://drive.google.com/file/d/1eMftidtmhJIFWohNr3so_RnMjGIrgeee/view?usp=sharing

Output 1 📉-

The illustrations show a significant increase in operating income as revenue grew from $257.64 billion to $280.38 billion, reflecting a positive correlation between revenue and profitability.

Output 2 📈 -

The illustrations show dividends declared per share rising consistently over three years, from approximately $0.88 in 2021 to about $0.94 in 2023, a steady growth in dividends declared over time.
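To put the two outputs above on a common scale, the changes can be expressed as percentages. This is a minimal sketch using only the figures quoted in Output 1 and Output 2; the helper name `pct_change` is hypothetical and is not part of the repository.

```python
def pct_change(start: float, end: float) -> float:
    """Return the percentage change from start to end."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start * 100.0


# Figures quoted in Output 1 (revenue, in billions of USD).
revenue_growth = pct_change(257.64, 280.38)

# Figures quoted in Output 2 (dividends per share, approximate).
dividend_growth = pct_change(0.88, 0.94)

print(f"Revenue growth: {revenue_growth:.2f}%")
print(f"Dividend growth: {dividend_growth:.2f}%")
```

With these inputs, revenue grew by roughly 8.8% and dividends per share by roughly 6.8%.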

Tech Stack 💻

Python: Our primary programming language for application development.
Gemini-Pro: The LLM behind our RAG app, used for data analysis and insights. It offers free access up to a certain number of requests. I also explored open-source LLMs such as StabilityAI, Camel-AI, and Zephyr 7B. Gemini-Pro provides versatile output formats, including structured JSON/tabular data and well-tuned text analysis, which makes it well suited to our app.
Plotly: Provides interactive and customizable visualizations for our app once the .json responses have been converted to a dataframe.
Streamlit: Enables easy deployment and offers robust visualization features.
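The JSON-to-dataframe conversion mentioned for Plotly could look roughly like the sketch below. It assumes the LLM returns a JSON array of flat objects; that response shape, the field names, and the helper `records_to_columns` are all assumptions for illustration, not the repository's actual code.

```python
import json


def records_to_columns(raw_json: str) -> dict[str, list]:
    """Parse a JSON array of flat objects into a column-oriented dict."""
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of objects")

    # Collect every key that appears, in first-seen order.
    columns: dict[str, list] = {}
    for rec in records:
        for key in rec:
            columns.setdefault(key, [])

    # Fill each column, using None where a record lacks a key.
    for rec in records:
        for key, values in columns.items():
            values.append(rec.get(key))
    return columns


# Hypothetical response; the field names are made up for this example.
sample = (
    '[{"year": 2021, "dividends_per_share": 0.88},'
    ' {"year": 2023, "dividends_per_share": 0.94}]'
)
table = records_to_columns(sample)
print(table)
```

A column-oriented dict like `table` can be passed straight to `pandas.DataFrame(table)`, and the resulting dataframe to a Plotly Express call such as `plotly.express.line(df, x="year", y="dividends_per_share")`.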

NOTE - The LLM's text analysis is available in the PDF 'INSIGHTS WITH TEXT RESPONSES'.

Backend Process 📁

Data Extraction and Zipping:
- Go to the data_processing directory.
- Run: 1_extra_and_zip.py
- Output: This will create a zip file for each ticker.
Merge and Normalize:
- Go to the data_processing directory.
- Run: 2_merge_and_normalize.py using the zip file created in the previous step. The data is first converted to .json and then to .txt for faster processing of embeddings.

Repeat: Perform the extraction and merge steps for each ticker separately.
Output: Generates merged and cleaned files that are ready for analysis.
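The merge-and-normalize step described above can be sketched as follows. The script 2_merge_and_normalize.py itself is not shown here, so this is only an illustration under stated assumptions: the zip is assumed to hold UTF-8 text files, "normalizing" is assumed to mean collapsing whitespace, and the function name `merge_zip_to_text` is hypothetical.

```python
import json
import re
import tempfile
import zipfile
from pathlib import Path


def merge_zip_to_text(zip_path: str, out_dir: str) -> Path:
    """Merge the files in a zip into one .json file and one .txt file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    merged = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries
            raw = zf.read(name).decode("utf-8", errors="replace")
            # Assumed normalization: collapse whitespace runs to one space.
            merged[name] = re.sub(r"\s+", " ", raw).strip()

    stem = Path(zip_path).stem
    # Intermediate .json, then the final .txt used for embeddings.
    (out / f"{stem}.json").write_text(json.dumps(merged, indent=2), encoding="utf-8")
    txt_path = out / f"{stem}.txt"
    txt_path.write_text("\n\n".join(merged.values()), encoding="utf-8")
    return txt_path


# Demo with a throwaway zip; the file contents are made-up sample text.
with tempfile.TemporaryDirectory() as tmp:
    zip_file = Path(tmp) / "MSFT.zip"
    with zipfile.ZipFile(zip_file, "w") as zf:
        zf.writestr("b.txt", "Revenue   grew\n\nsteadily ")
        zf.writestr("a.txt", "Item 7.\tMD&A")
    result = merge_zip_to_text(str(zip_file), str(Path(tmp) / "documents"))
    demo_name = result.name
    demo_text = result.read_text(encoding="utf-8")

print(demo_name)
print(demo_text)
```

Writing one merged .txt file per ticker matches the layout expected by the next step, where the files are placed in the documents directory.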

Store Processed Files:
- Save: Place the merged files inside the documents directory in .txt format.
Load Data and Create Embeddings:
- Run: load_data.py
- Uncomment: the API line and provide your Gemini-Pro API key.
- Note: This step involves file splitting and the creation of embeddings.
Analyze Data with Gemini:
- Run: main.py
- Uncomment: the API line and provide your Gemini-Pro API key.
Automation with Streamlit:
- Run: app.py using Streamlit.
- Output: This creates an interface for fast analysis.
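The "Load Data and Create Embeddings" and "Analyze Data with Gemini" steps above follow the usual RAG pattern: split the text into chunks, embed each chunk, then retrieve the chunks most similar to a question before sending them to the LLM. The code in load_data.py and main.py is not shown here, so the following is a minimal sketch of that general pattern rather than the repository's implementation. The function names, chunk sizes, and toy vectors are all assumptions; in practice the vectors would come from an embedding model.

```python
import math


def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose vectors are most similar to query_vec."""
    if len(chunk_vecs) != len(chunks):
        raise ValueError("need exactly one vector per chunk")
    order = sorted(
        range(len(chunks)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return [chunks[i] for i in order[:k]]


# Toy demo: tiny chunks and hand-written 2-D vectors stand in for
# real document text and real embeddings.
chunks = split_text("abcdefghij", chunk_size=4, overlap=1)
toy_vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
query_vector = [0.1, 1.0]
best = top_k_chunks(query_vector, toy_vectors, chunks, k=2)
print(chunks)
print(best)
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of some duplicated text in the index.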

Getting Started

Clone the Repository:

git clone https://github.com/tishachawla-jg/SEC-EDGAR_Analyis_App.git

Install Dependencies: pip install -r requirements.txt

Run app locally: streamlit run app.py

If you wish to contribute to this project, please create a pull request or raise an issue to discuss improvements.

NOTE TO APP USERS - Make sure to cross-check the answers for potential hallucinations!
