# comp-4800-web-crawler

This program generates an undirected graph similar to web-Google. Given a starting
website, the program parses the links on that website and recursively finds more
websites by visiting the parsed links.
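
The crawl described above can be sketched as a graph traversal. This is a minimal sketch, not the code in `main.py`. It assumes that nodes are hostnames rather than full URLs. It uses an explicit queue (breadth-first) in place of recursion. It also adds a hypothetical `max_sites` cap and takes a `fetch` function, so the network layer can be swapped out. `fetch_http` is an illustrative fetcher built on the standard library's `urllib`.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def linked_hosts(page_url: str, html: str) -> set[str]:
    """Return the hostnames of all http(s) links found on one page."""
    parser = LinkParser()
    parser.feed(html)
    hosts = set()
    for href in parser.links:
        parsed = urlparse(urljoin(page_url, href))  # resolve relative links
        if parsed.scheme in ("http", "https") and parsed.hostname:
            hosts.add(parsed.hostname)
    return hosts


def fetch_http(url: str) -> str:
    """Hypothetical fetcher: sends one GET request using the standard library."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start: str, fetch: Callable[[str], str], max_sites: int = 50) -> set[frozenset[str]]:
    """Breadth-first crawl from `start`; returns undirected host-to-host edges.

    Assumption: nodes are hostnames (e.g. "example.com"), not full URLs.
    """
    edges: set[frozenset[str]] = set()
    visited = {start}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        page_url = f"https://{host}/"
        try:
            html = fetch(page_url)
        except Exception:
            continue  # unreachable site: record no outgoing links for it
        for other in linked_hosts(page_url, html):
            if other == host:
                continue  # skip links back to the same site (no self-loops)
            edges.add(frozenset((host, other)))  # unordered pair = undirected edge
            # Edges to sites beyond the cap are still recorded, but those sites are not visited.
            if other not in visited and len(visited) < max_sites:
                visited.add(other)
                queue.append(other)
    return edges
```

For example, `crawl("jagrajaulakh.com", fetch_http, max_sites=10)` returns a set of two-element `frozenset`s, one per undirected edge. Keeping `max_sites` small limits how many GET requests the sketch sends.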

**NOTE: Be careful with this program: it sends GET requests to every website it parses. If you
send too many requests to the same website, that website may block your IP address.**
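
One way to reduce the risk described in the note is to space out requests to the same host. The sketch below is hypothetical: `HostThrottle`, `min_delay`, and the 1-second default are not part of this project. To use it, call `wait(url)` before every GET request. The clock and sleep functions are injectable so that the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable
from urllib.parse import urlparse


class HostThrottle:
    """Enforces a minimum delay between consecutive requests to the same host."""

    def __init__(
        self,
        min_delay: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_delay = min_delay
        self.clock = clock  # injectable so the throttle can be tested without waiting
        self.sleep = sleep
        self.last_request: dict[str, float] = {}  # host -> time of its last request

    def wait(self, url: str) -> None:
        """Block until at least min_delay seconds have passed since this host's last request."""
        host = urlparse(url).hostname or ""
        last = self.last_request.get(host)
        if last is not None:
            remaining = self.min_delay - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self.last_request[host] = self.clock()
```

For example, with `min_delay=2.0`, two back-to-back requests to `https://example.com/a` and `https://example.com/b` end up about two seconds apart. A request to a different host proceeds immediately.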

# How to run

Make a virtual environment:

```bash
python -m venv venv
```

Activate:

```bash
source venv/bin/activate
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Run the program, giving it a starting website:

```bash
python main.py jagrajaulakh.com
```

View the output graph:

```bash
cat graph.txt
```
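
To analyse the output instead of just printing it, a small loader can help. This sketch assumes that `graph.txt` holds one edge per line as two whitespace-separated node names, optionally with `#` comment lines as in the SNAP web-Google file. Check the actual output and adjust the parser if the format differs. `parse_edges` and `load_graph` are hypothetical helpers, not part of this project.

```python
from collections import defaultdict
from pathlib import Path
from typing import Iterable


def parse_edges(lines: Iterable[str]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from edge-list lines (format assumed)."""
    adjacency: defaultdict[str, set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and SNAP-style comment lines
        parts = line.split()
        if len(parts) < 2:
            continue  # not an edge; ignore it
        a, b = parts[0], parts[1]
        adjacency[a].add(b)
        adjacency[b].add(a)  # undirected: record the edge in both directions
    return dict(adjacency)


def load_graph(path: str = "graph.txt") -> dict[str, set[str]]:
    """Read the crawler's output file into an adjacency map."""
    with Path(path).open(encoding="utf-8") as f:
        return parse_edges(f)
```

For example, after `graph = load_graph()`, the expression `max(graph, key=lambda n: len(graph[n]))` gives the node with the most neighbours.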