comp-4800-web-crawler
This program generates an undirected graph similar to web-Google. Given any starting website, it parses the links on that page and recursively discovers more websites by visiting the parsed links.
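The crawl described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the names `LinkParser`, `extract_links`, `crawl`, and `FAKE_WEB` are hypothetical, the `fetch` callable stands in for whatever performs the real GET request, and the sketch uses an iterative breadth-first queue rather than literal recursion so that deep link chains cannot hit Python's recursion limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for all links found in html."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl that returns a list of (source, target) edges.

    fetch(url) is a hypothetical callable returning HTML text, or None
    if the page could not be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    edges = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:
            continue
        for link in extract_links(url, html):
            edges.append((url, link))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return edges


# Tiny in-memory "web" so the sketch runs without sending real requests.
FAKE_WEB = {
    "http://a.test/": '<a href="/b">b</a> <a href="http://c.test/">c</a>',
    "http://a.test/b": '<a href="/">home</a>',
    "http://c.test/": "",
}

edges = crawl("http://a.test/", FAKE_WEB.get)
for source, target in edges:
    print(source, target)
```

The `max_pages` cap is an assumption added for safety; without some bound, a crawl of the real web would not terminate.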
NOTE: Be careful with this program: it sends GET requests to every parsed website. If you send too many requests to the same website, it may block your IP address.
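One common way to reduce the risk of being blocked is to enforce a minimum delay between requests to the same host. The sketch below is an assumption about how that could be done, not something this program is known to implement; `HostThrottle`, its `min_interval` parameter, and the example URLs are hypothetical.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforces a minimum interval between requests to the same host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.last_request = {}  # host -> time the last request was allowed to start

    def wait(self, url):
        """Sleep if needed before requesting url; return the delay applied in seconds."""
        host = urlparse(url).netloc
        now = self.clock()
        last = self.last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self.sleep(delay)
        self.last_request[host] = now + delay
        return delay


throttle = HostThrottle(min_interval=0.2)
for url in ["http://a.test/1", "http://a.test/2", "http://b.test/1"]:
    waited = throttle.wait(url)  # call this right before sending the GET request
    print(f"{url}: waited {waited:.2f}s")
```

Calling `throttle.wait(url)` immediately before each GET keeps requests to different hosts unthrottled while spacing out repeated hits on the same one.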
How to run
Make a virtual environment:
python -m venv venv
Activate:
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Run the program, giving a starting website:
python main.py jagrajaulakh.com
View the output graph:
cat graph.txt