WARC-GPT

WARC + AI: Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.

Source code on GitHub.

More info: "WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI". Feb 12 2024 - _lil.law.harvard.edu_

https://github.com/harvard-lil/warc-gpt/assets/625889/8ea3da4a-62a1-4ffa-a510-ef3e35699237

---

Summary:
- Features
- Installation
- Configuring the application
- Ingesting WARCs
- Starting the server
- Interacting with the Web UI
- Interacting with the API
- Visualizing Embeddings
- Disclaimer

---

Features:
- Retrieval Augmented Generation pipeline for WARC files
- Highly customizable,…

---
Use the following commands to clone the project and install its dependencies:
```bash
git clone https://github.com/harvard-lil/warc-gpt.git
cd warc-gpt
poetry env use 3.11
poetry install
```
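The commands above assume Git, Poetry and a Python 3.11 interpreter are already installed (Poetry resolves `poetry env use 3.11` to a `python3.11` executable on the `PATH`). A minimal shell sketch for checking this before installing; the function name `check_prereqs` is hypothetical and not part of WARC-GPT:

```bash
#!/usr/bin/env bash
# Report which of the given commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# git clones the repository, poetry installs the dependencies,
# and `poetry env use 3.11` looks for a python3.11 executable on PATH.
if check_prereqs git poetry python3.11; then
  echo "All prerequisites found."
else
  echo "Install the missing tools before continuing." >&2
fi
```

Each missing tool is reported on its own line, so several gaps can be fixed in one pass.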
---
This program uses environment variables to handle its settings.
Copy `.env.example` into a new `.env` file and edit it as needed.
```bash
cp .env.example .env
```
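Because settings are read from environment variables, it can help to load `.env` into a shell session to check which values are set. A minimal sketch, assuming `.env` contains only shell-compatible `KEY=VALUE` lines and comments; it makes no claim about how WARC-GPT itself parses the file, and `load_env` and `DEMO_SETTING` are hypothetical names:

```bash
#!/usr/bin/env bash
# Export every variable assigned in an env file (default: .env) into the
# current shell. Assumes shell-compatible KEY=VALUE lines; quote values with spaces.
load_env() {
  local file="${1:-.env}"
  # `.` searches PATH for names without a slash, so force a relative path.
  case "$file" in
    */*) ;;
    *) file="./$file" ;;
  esac
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  set -a      # mark every variable assigned from here on for export
  . "$file"   # source the file: it runs as shell code, so only load trusted files
  set +a      # stop auto-exporting
}

# Demo with a throwaway file; DEMO_SETTING is a hypothetical name.
demo_file="$(mktemp)"
printf 'DEMO_SETTING=hello\n' > "$demo_file"
load_env "$demo_file" && echo "DEMO_SETTING=$DEMO_SETTING"   # prints DEMO_SETTING=hello
rm -f "$demo_file"
```

Since `.` executes the file as shell code, only load `.env` files you trust.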
See `.env.example` for details on individual settings.
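To see which settings `.env.example` defines, or which of them are still absent from your `.env`, here is a small sketch that extracts variable names. It assumes plain `KEY=VALUE` lines (optionally prefixed with `export `), and the helper names `list_settings` and `missing_settings` are hypothetical:

```bash
#!/usr/bin/env bash
# Print the variable names defined in an env-style file (default: .env.example).
# Skips comments and blank lines; accepts an optional leading "export ".
list_settings() {
  local file="${1:-.env.example}"
  sed -n -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/p' "$file"
}

# Print names defined in the example file but absent from the real one.
missing_settings() {
  local example="${1:-.env.example}" env_file="${2:-.env}" known
  known="$(mktemp)"
  list_settings "$env_file" > "$known"
  # -v: invert match, -x: whole line, -F: fixed strings, -f: patterns from file
  list_settings "$example" | grep -vxF -f "$known"
  rm -f "$known"
}
```

Run from the repository root, `list_settings` prints the setting names from `.env.example`, and `missing_settings` prints those not yet present in `.env`.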
A few notes: