comp-4800-web-crawler

This program generates an undirected graph similar to web-Google. Given a starting website, it parses the links on that site and recursively discovers more websites by visiting the parsed links.
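The crawl described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the names `LinkParser`, `extract_hosts`, and `crawl` are hypothetical, the `fetch` callable (URL in, HTML text out) is an assumed stand-in for whatever performs the GET request, and the `http://` scheme and `depth` limit are assumptions added to keep the recursion bounded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_hosts(base_url, html):
    """Return the set of hostnames that the page links to."""
    parser = LinkParser()
    parser.feed(html)
    hosts = set()
    for href in parser.links:
        # Resolve relative links against the page URL before taking the host.
        host = urlparse(urljoin(base_url, href)).netloc
        if host:
            hosts.add(host)
    return hosts


def crawl(site, fetch, edges, visited, depth=2):
    """Recursively visit linked sites, recording undirected edges.

    fetch is an assumed callable: fetch(url) -> HTML string.
    """
    if depth == 0 or site in visited:
        return
    visited.add(site)
    url = "http://" + site  # assumption: plain http scheme
    for host in extract_hosts(url, fetch(url)):
        if host != site:
            # frozenset makes (a, b) and (b, a) the same undirected edge.
            edges.add(frozenset((site, host)))
            crawl(host, fetch, edges, visited, depth - 1)
```

Passing `fetch` in as a parameter keeps the sketch testable without touching the network; a real run would supply a function that performs the GET request.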

NOTE: Be careful with this program: it sends GET requests to the parsed websites. If you send too many requests to the same website, it may block your IP address.
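One common way to lower the risk of being blocked is to space out requests to the same host. The sketch below is not part of this program as far as the README states; `HostThrottle`, its `min_delay` parameter, and the injectable `clock` and `sleep` arguments are all hypothetical names chosen for illustration.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforces a minimum delay between requests to the same host."""

    def __init__(self, min_delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.clock = clock
        self.sleep = sleep
        self.last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Sleep if the host was contacted too recently; return the delay used."""
        host = urlparse(url).netloc
        now = self.clock()
        last = self.last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_delay - (now - last))
        if delay > 0:
            self.sleep(delay)
        # Record when the request is actually allowed to go out.
        self.last_request[host] = now + delay
        return delay
```

Calling `throttle.wait(url)` right before each GET request would keep requests to any single host at least `min_delay` seconds apart, while requests to different hosts are not delayed.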

How to run

Make a virtual environment:

python -m venv venv

Activate:

source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Run the program, giving it a starting website:

python main.py jagrajaulakh.com

View the output graph:

cat graph.txt
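
To work with the graph programmatically rather than just printing it, something like the following could be used. The format of graph.txt is not documented here, so this sketch assumes an edge list in the style of the web-Google dataset: one edge per line as two whitespace-separated node names, with `#` lines treated as comments. The function name `load_edge_list` is hypothetical.

```python
from collections import defaultdict


def load_edge_list(lines):
    """Build an undirected adjacency map from 'node node' lines.

    Assumes an edge-list layout; adjust if graph.txt uses another format.
    """
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) < 2:
            continue  # skip lines that do not describe an edge
        a, b = parts[0], parts[1]
        # Undirected: record the edge in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


# Example usage (requires graph.txt to exist):
# with open("graph.txt") as f:
#     graph = load_edge_list(f)
```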