Update README with setup instructions and add requirements.txt

2023-03-12 14:30:55 -04:00
parent 4040c52114
commit 4597d8c775
3 changed files with 44 additions and 0 deletions

1
.gitignore

@@ -160,3 +160,4 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
graph.txt


@@ -1,2 +1,40 @@
# comp-4800-web-crawler
This program generates an undirected graph similar to web-Google. Given any starting
website, the program parses the links on that website and recursively finds more
websites by visiting the parsed links.
**NOTE: Be careful with this program: it sends GET requests to the parsed websites. If you
send too many requests to the same website, it may block your IP address.**
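The crawl described above can be sketched roughly as follows. This is a minimal illustration, not the code in `main.py`: the names `LinkParser` and `crawl`, the `fetch` callback, and the `max_pages` and `delay` parameters are all hypothetical. It also swaps recursion for an iterative breadth-first queue, which visits the same links without hitting Python's recursion limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import time


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10, delay=0.0):
    """Breadth-first crawl from start_url.

    fetch is any callable that maps a URL to its HTML text (an assumption
    made so the sketch can run without network access). Returns a set of
    undirected edges, each a frozenset of two URLs.
    """
    edges = set()
    seen = {start_url}
    queue = deque([start_url])
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, and similar links
            if target != url:
                edges.add(frozenset((url, target)))
            if target not in seen:
                seen.add(target)
                queue.append(target)
        time.sleep(delay)  # pause between pages to avoid flooding a site
    return edges
```

With the `requests` package from `requirements.txt`, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`, and a non-zero `delay` helps keep the request rate low enough to avoid the IP block mentioned in the note.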
# How to run
Make a virtual environment:
```bash
python -m venv venv
```
Activate:
```bash
source venv/bin/activate
```
Install dependencies:
```bash
pip install -r requirements.txt
```
Run the program, giving it a starting website:
```bash
python main.py jagrajaulakh.com
```
View the output graph:
```bash
cat graph.txt
```
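To inspect the graph programmatically instead of with `cat`, something like the sketch below may help. The layout of `graph.txt` is not documented here, so this assumes the web-Google edge-list convention: one edge per line as two whitespace-separated node names, with `#` starting a comment line. The function name `load_edges` is hypothetical.

```python
from collections import defaultdict


def load_edges(lines):
    """Build an undirected adjacency map from edge-list lines.

    Assumes each data line holds exactly two whitespace-separated node
    names; blank lines, '#' comments, and malformed lines are skipped.
    """
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            continue
        a, b = parts
        adjacency[a].add(b)
        adjacency[b].add(a)  # undirected: record both directions
    return adjacency
```

A file can be passed directly, for example `with open("graph.txt") as f: adjacency = load_edges(f)`. If the real file uses a different layout, the parsing line needs to change accordingly.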

5
requirements.txt Normal file

@@ -0,0 +1,5 @@
certifi==2022.12.7
charset-normalizer==3.1.0
idna==3.4
requests==2.28.2
urllib3==1.26.15