Project Introduction
High-Level Design (HLD)
Data Extraction
The target URLs are crawled, and their content, including video and SVG content, is extracted and embedded.
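As an illustration of the extraction step, the following is a minimal sketch that pulls readable text out of an HTML page, including text inside inline SVG elements. It uses Python's standard html.parser as a stand-in for BeautifulSoup, and the class and function names are hypothetical. Fetching the page and handling video content are not shown.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects readable text and skips the contents of script and style tags."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the readable text of an HTML document as one string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical sample page with an inline SVG diagram.
sample = """
<html><head><style>p { color: red; }</style></head>
<body>
  <h1>Install guide</h1>
  <p>Run the installer.</p>
  <svg><desc>Setup flow</desc><text x="0" y="10">Step 1</text></svg>
  <script>console.log("ignored");</script>
</body></html>
"""

print(extract_text(sample))
```

The SVG description and label text are kept alongside the page text, so diagram content can be embedded with the rest of the page.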
Database Layer
Microsoft SQL Server stores chat history, embedding details, and other relevant data.
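A chat history table could look roughly like the sketch below. It uses an in-memory SQLite database from the standard library as a stand-in so that it runs anywhere; the project itself uses Microsoft SQL through SQLAlchemy. The table name, column names, and helper functions are assumptions made for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical chat_history table if it does not exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS chat_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
            content TEXT NOT NULL,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def add_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one message to a chat session."""
    conn.execute(
        "INSERT INTO chat_history (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, session_id: str) -> list:
    """Return the messages of one session in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM chat_history WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return [{"role": role, "content": content} for role, content in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
add_message(conn, "session-1", "user", "How do I reset my password?")
add_message(conn, "session-1", "assistant", "Open Settings and choose Reset password.")
print(load_history(conn, "session-1"))
```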
Embedding Storage
Pinecone is used to store the embeddings.
Backend Layer
FastAPI handles API requests and responses, ensuring efficient communication.
AI Integration
OpenAI and LangChain process user queries, generate responses, and reference relevant sources.
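One way a response can reference its sources is to number the retrieved passages in the prompt and ask the model to cite those numbers. The sketch below shows only the prompt assembly in plain Python. The function name, the chunk fields ("text" and "url"), and the prompt wording are assumptions, and the actual call to OpenAI or LangChain is not shown.

```python
def build_prompt(question: str, chunks: list) -> str:
    """Build a prompt that lists retrieved passages with numbered sources.

    Each chunk is a dict with hypothetical keys 'text' and 'url'.
    """
    context_lines = []
    for number, chunk in enumerate(chunks, start=1):
        context_lines.append(f"[{number}] {chunk['text']} (source: {chunk['url']})")
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the numbered context below. "
        "Cite the sources you used by their numbers.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical retrieved passages.
retrieved = [
    {"text": "Passwords can be reset from the Settings page.", "url": "https://example.com/help/reset"},
    {"text": "Accounts lock after five failed logins.", "url": "https://example.com/help/lockout"},
]

print(build_prompt("How do I reset my password?", retrieved))
```

Keeping the source URL next to each numbered passage lets the application map a cited number in the answer back to a link.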
Discovery Phase Details and Process
Requirement Gathering
Engaged with stakeholders to gather detailed requirements and user expectations.
Feasibility Study
Analyzed technical feasibility and identified potential challenges.
Technology Selection
Selected Microsoft SQL Server, FastAPI, OpenAI, LangChain, and Pinecone as the tech stack for their robustness and compatibility.
Data Collection Strategy
Designed a strategy for extracting and embedding data from various URLs, including video and SVG content.
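A common step in an embedding strategy of this kind is splitting extracted text into overlapping chunks before embedding. The source does not state how the project chunks its content, so the sketch below is an assumption: the function name, the word-based splitting, and the default sizes are all illustrative.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample_text = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(sample_text, chunk_size=4, overlap=1):
    print(chunk)
```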
Prototyping
Created initial prototypes to validate the approach and gather early feedback.
Libraries Used
SQLAlchemy: For database operations and ORM.
BeautifulSoup & Requests: For web scraping and data extraction from URLs.
LangChain: For managing and processing natural language queries.
Pinecone: For storing and managing embeddings.
Databases Used
Microsoft SQL Server: The primary database for storing chat history, embedding details, and other related data.
Pinecone: Used for storing embeddings of extracted content, enabling quick and efficient retrieval during chatbot interactions.
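To show what retrieval over stored embeddings means, the sketch below ranks a few vectors by cosine similarity to a query vector. It is a small in-memory illustration, not Pinecone's API: a vector database performs this kind of search at scale on the server side. The function names and the two-dimensional toy vectors are assumptions.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list, index: dict, k: int = 2) -> list:
    """Return the k (item_id, score) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vector), item_id) for item_id, vector in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(item_id, score) for score, item_id in scored[:k]]


# Toy embeddings keyed by a hypothetical chunk identifier.
toy_index = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.0, 1.0],
    "chunk-c": [1.0, 1.0],
}

for item_id, score in top_k([1.0, 0.1], toy_index, k=2):
    print(item_id, round(score, 3))
```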
Integrations Performed
Several integrations were essential to achieve the project’s objectives:
OpenAI API
Integrated for natural language processing to understand and interpret user queries.
LangChain
Used for advanced language processing and chaining multiple language models.
Pinecone
Integrated to store and manage embeddings of the extracted content.
FastAPI
Ensures smooth communication between the backend and the AI models, providing a responsive user experience.
Data Extraction Tools
Integrated libraries for scraping and extracting data from various URLs, including video and SVG content.