tishachawla-jg/SEC-EDGAR_Analysis_App

SEC-EDGAR Analysis App 📍

Workflow

Overview

This repository contains a retrieval-augmented generation (RAG) app that streamlines processing, merging, normalizing, and analyzing SEC-EDGAR data with a Large Language Model (LLM). The LLM API used is the Gemini Pro model. The final step uses Streamlit to visualize the LLM results, giving quicker financial insights into the SEC-EDGAR tickers covered: MSFT, AAPL, and GOOGL.

Demo 🔗 - https://drive.google.com/file/d/1eMftidtmhJIFWohNr3so_RnMjGIrgeee/view?usp=sharing

Output 1 📉-

image

The illustrations show a significant increase in operating income as revenue grew from $257.64 billion to $280.38 billion, reflecting a positive correlation between revenue and profitability.

Output 2 📈 -

image image image

The illustrations show dividends declared per share over three years, with a consistent increase from 2021 to 2023. Dividends rose from approximately $0.88 to about $0.94 per share, a steady growth in dividends declared over time.

Tech Stack 💻

  • Python: Our primary programming language for application development.

  • Gemini-Pro: The LLM behind our RAG app, used for data analysis and insights. It offers free access up to a certain number of requests. We also explored open-source LLMs such as StabilityAI, Camel-AI, and Zephyr 7B. Gemini-Pro provides versatile output formats, including structured JSON/tabular data and well-tuned text analysis, which makes it highly suitable for our app.

  • Plotly: Provides interactive and customizable visualizations for our app once the .json responses have been converted to a dataframe.

  • Streamlit: Enables easy deployment and offers robust visualization features.
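
The JSON-to-dataframe step mentioned under Plotly could look roughly like the sketch below. It is not the repository's code: the response shape (a `metric` name plus a `data` list of `year`/`value` objects), the function name `json_response_to_rows`, and the placeholder numbers are all assumptions made for illustration.

```python
import json


def json_response_to_rows(response_text):
    """Parse a JSON string returned by the LLM into a list of flat row dicts.

    Assumed (hypothetical) response shape:
    {"metric": "...", "data": [{"year": 2021, "value": 1.0}, ...]}
    """
    payload = json.loads(response_text)
    metric = payload.get("metric", "value")
    rows = []
    for point in payload.get("data", []):
        # Skip malformed points rather than failing the whole chart.
        if "year" not in point or "value" not in point:
            continue
        rows.append({"year": int(point["year"]), metric: float(point["value"])})
    # Sort so the x-axis ends up in chronological order.
    return sorted(rows, key=lambda row: row["year"])


# Placeholder response, not real filing data.
sample = (
    '{"metric": "placeholder_metric", "data": ['
    '{"year": 2022, "value": 2.0}, '
    '{"year": 2021, "value": 1.0}, '
    '{"note": "no value"}]}'
)
rows = json_response_to_rows(sample)
print(rows)
```

The resulting rows can be passed to `pandas.DataFrame(rows)` and then to a Plotly Express call such as `px.line(df, x="year", y="placeholder_metric")`.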

NOTE - The LLM's text analysis can be found in the PDF 'INSIGHTS WITH TEXT RESPONSES'.

Backend Process 📁

image

  1. Data Extraction and Zipping:

    • Go to the data_processing directory.
    • Run: 1_extra_and_zip.py
    • Output: This will create a zip file for each ticker.
  2. Merge and Normalize:

    • Go to the data_processing directory.
    • Run: 2_merge_and_normalize.py using the zip file created in step 1. The data is first converted to .json and then to .txt for faster processing of embeddings.
    • Repeat: Perform steps 1 and 2 for each ticker separately.
    • Output: Generates merged and cleaned files that are ready for analysis.
  3. Store Processed Files:

    • Save: Place the merged files inside the documents directory in .txt format.
  4. Load Data and Create Embeddings:

    • Run: load_data.py
    • Uncomment: The API line, and provide your gemini-pro API key.
    • Note: This step involves file splitting and the creation of embeddings.
  5. Analyze Data with Gemini:

    • Run: main.py
    • Uncomment: The API line, and provide your gemini-pro API key.
  6. Automation with Streamlit:

    • Run: app.py using Streamlit.
    • Output: This creates an interface for fast analysis.
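
The file splitting mentioned in the embeddings step can be illustrated with a minimal sketch. It assumes fixed-size character chunks with a small overlap. The actual strategy in load_data.py may differ (it may rely on a library splitter), and the function name `split_into_chunks` and its default sizes are hypothetical. The embedding call itself is omitted because it needs an API key.

```python
def split_into_chunks(text, chunk_size=1000, overlap=100):
    """Split text into overlapping character chunks for embedding.

    chunk_size and overlap are illustrative defaults, not values taken
    from the repository.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text, so the tail is
        # not emitted again as a shorter duplicate chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


# Tiny demonstration on a placeholder string.
demo_chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=1)
print(demo_chunks)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of embedding a little duplicated text.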

Getting Started

  • Clone the Repository:
    git clone https://github.com/tishachawla-jg/SEC-EDGAR_Analysis_App.git
    

  • Install Dependencies: pip install -r requirements.txt

  • Run app locally: streamlit run app.py

If you wish to contribute to this project, please create a pull request or raise an issue to discuss improvements.

NOTE TO APP USERS - Make sure to cross-check the answers for potential hallucinations!
