Update README with setup instructions and add requirements.txt

.gitignore (1 line added)

@@ -160,3 +160,4 @@ cython_debug/
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
 
+graph.txt

README.md (38 lines added)

@@ -1,2 +1,40 @@

# comp-4800-web-crawler

This program generates an undirected graph similar to web-Google. Given a starting
website, it parses the links on that page and recursively finds more websites by
visiting the parsed links.

**NOTE: Be careful with this program: it sends GET requests to every website it parses.
If you send too many requests to the same website, that site may block your IP address.**

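The crawl described above can be sketched as a breadth-first search. This is a hypothetical illustration, not the code in `main.py`: it assumes each hostname is a node, stores every link between two different hosts as one undirected edge, caps the crawl with an assumed `max_sites` parameter, and waits `delay` seconds after each request to keep the request rate low. It uses the standard library's `urllib.request` instead of the pinned `requests` package so it runs without extra installs.

```python
# Hypothetical sketch of a breadth-first link crawler; not the repository's main.py.
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url, timeout=10):
    """Send one GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_sites=20, delay=1.0, fetch=fetch_page):
    """Breadth-first crawl that returns undirected (host, host) edges."""
    seen = {urlparse(start_url).netloc}  # hosts already queued
    queue = deque([start_url])
    edges = set()
    while queue:
        url = queue.popleft()
        src = urlparse(url).netloc
        try:
            html = fetch(url)
        except Exception:
            html = ""  # treat a site that fails to load as having no links
        time.sleep(delay)  # pause after every request so no site is flooded
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            link = urlparse(urljoin(url, href))  # resolve relative links
            dst = link.netloc
            if link.scheme not in ("http", "https") or not dst or dst == src:
                continue  # ignore mailto:, javascript:, and same-host links
            edges.add(tuple(sorted((src, dst))))  # store each pair once (undirected)
            if dst not in seen and len(seen) < max_sites:
                seen.add(dst)
                queue.append(f"{link.scheme}://{dst}/")
    return edges
```

For example, `crawl("https://jagrajaulakh.com/", max_sites=5)` would return a set of `(host, host)` pairs. Passing a stub as `fetch` lets the traversal be tested without any network access.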
# How to run

Make a virtual environment:

```bash
python -m venv venv
```

Activate it:

```bash
source venv/bin/activate
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Run the program, passing a starting website as the argument:

```bash
python main.py jagrajaulakh.com
```

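The run command passes a bare domain (`jagrajaulakh.com`) with no scheme, so the program presumably has to turn it into a full URL before sending a GET request. A hedged sketch of that step with `argparse`; `normalize_start_url` and `main` are hypothetical names, and defaulting to `https://` is an assumption:

```python
# Hypothetical sketch: turn the command-line argument into a full start URL.
import argparse
from urllib.parse import urlparse


def normalize_start_url(arg):
    """Prepend https:// to a bare domain and make sure the URL has a path."""
    if "://" not in arg:
        arg = "https://" + arg  # assumption: default to HTTPS
    parsed = urlparse(arg)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a web URL: {arg!r}")
    return parsed._replace(path=parsed.path or "/").geturl()


def main(argv=None):
    """Parse the starting website from argv and print the normalized URL."""
    parser = argparse.ArgumentParser(description="Crawl links starting from a website.")
    parser.add_argument("website", help="starting website, e.g. jagrajaulakh.com")
    args = parser.parse_args(argv)
    start_url = normalize_start_url(args.website)
    print(start_url)
    return start_url
```

With this sketch, `main(["jagrajaulakh.com"])` prints `https://jagrajaulakh.com/`, while a non-web scheme such as `ftp://` raises `ValueError`.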
View the output graph:

```bash
cat graph.txt
```

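If `graph.txt` follows a web-Google-style layout (an assumption; the format is not specified here), each non-comment line holds two node IDs separated by whitespace, and lines starting with `#` are comments. Under that assumption, a sketch for loading the file into Python as an undirected adjacency map; `load_edge_list` and `summarize` are hypothetical helpers:

```python
# Hypothetical sketch: load an edge list (assumed web-Google-style layout).
from collections import defaultdict


def load_edge_list(lines):
    """Build an undirected adjacency map from 'u v' lines, skipping '#' comments."""
    adjacency = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # blank, malformed, or comment line
        u, v = parts[0], parts[1]
        if u == v:
            continue  # ignore self-loops so edge counts stay simple
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected: record both directions
    return adjacency


def summarize(adjacency):
    """Return (number of nodes, number of undirected edges)."""
    edge_count = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return len(adjacency), edge_count
```

For example, `with open("graph.txt") as f: print(summarize(load_edge_list(f)))` prints the node and edge counts. Duplicate edges collapse automatically because each neighbour list is a set.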
requirements.txt (new file, 5 lines added)

@@ -0,0 +1,5 @@
certifi==2022.12.7
charset-normalizer==3.1.0
idna==3.4
requests==2.28.2
urllib3==1.26.15