Update README with setup instructions and add requirements.txt
38
README.md
@@ -1,2 +1,40 @@
# comp-4800-web-crawler
This program generates an undirected graph similar to web-Google. Given a starting
website, it parses the links on that page and recursively discovers more
websites by visiting the parsed links.

**NOTE: Be careful with this program: it sends GET requests to every website it parses. If you
send too many requests to the same website, it may block your IP address.**
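The crawl described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual contents of `main.py`: the names `LinkParser`, `extract_links`, `fetch`, and `crawl`, along with the `max_pages` and `delay_seconds` parameters, are assumptions made for the example. It also uses an explicit queue (breadth-first) instead of recursion to avoid deep call stacks, and it pauses between requests to lower the risk of being blocked.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs linked from html, with fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href).split("#")[0]
        if urlsplit(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def fetch(url):
    """Download a page with a GET request (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=20, delay_seconds=1.0, fetch_page=fetch):
    """Breadth-first crawl returning a set of undirected edges."""
    edges = set()
    visited = set()
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that cannot be fetched
        for link in extract_links(url, html):
            if link != url:
                # frozenset ignores direction, so A-B and B-A are one edge
                edges.add(frozenset((url, link)))
            if link not in visited:
                queue.append(link)
        time.sleep(delay_seconds)  # be polite: pause between GET requests
    return edges
```

Passing a custom `fetch_page` function makes it possible to try the crawl on canned HTML without sending any real requests.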
# How to run
Make a virtual environment:

```bash
python -m venv venv
```
Activate the virtual environment:

```bash
source venv/bin/activate
```
Install the dependencies:

```bash
pip install -r requirements.txt
```
Run the program, passing a starting website:

```bash
python main.py jagrajaulakh.com
```
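The command above passes a bare domain with no scheme. How `main.py` handles that argument is not shown here, so the following is only a sketch of one way a bare domain could be turned into a fetchable URL. The function name `normalize_start_url` and the default of `https` are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_start_url(raw, default_scheme="https"):
    """Turn a bare domain such as 'example.com' into an absolute URL.

    Hypothetical helper: the real program may treat its argument differently.
    """
    raw = raw.strip()
    if "://" not in raw:
        # Without a scheme, urlsplit would treat the domain as a path.
        raw = f"{default_scheme}://{raw}"
    parts = urlsplit(raw)
    if not parts.netloc:
        raise ValueError(f"not a usable website address: {raw!r}")
    path = parts.path or "/"
    # Lowercase the host and drop the fragment, which never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))
```

Normalizing the starting address once up front also helps keep the same page from appearing in the graph under several spellings.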
View the generated graph:

```bash
cat graph.txt
```
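To work with the graph beyond printing it, the file can be loaded into an adjacency structure. The format of `graph.txt` is not documented here, so this sketch assumes a web-Google style edge list: one edge per line as two whitespace-separated node labels, with optional `#` comment lines. The function name `parse_edge_list` is hypothetical.

```python
from collections import defaultdict


def parse_edge_list(lines):
    """Build an undirected adjacency map from edge-list lines.

    Assumes each data line holds two whitespace-separated node labels;
    blank lines and lines starting with '#' are ignored.
    """
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # skip malformed lines instead of failing
        source, target = fields[0], fields[1]
        # Undirected graph: record the edge in both directions.
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)
```

Under that assumption, `with open("graph.txt") as handle: graph = parse_edge_list(handle)` would load the file, and `len(graph)` would give the number of distinct websites found.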