Comments
I believe what you're referring to is "Google dorking", where you use search operators such as "filetype:pdf" to filter Google results? You might have to go into more detail if you want a quick and easy answer.
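For reference, a dork is just one or more of those operators added to a normal search. A few examples, using the placeholder domain that comes up later in this thread (the notes in parentheses are not part of the query):

```
site:mydomain.com                 (everything Google has indexed for the domain)
site:mydomain.com filetype:pdf    (only the PDFs it has indexed there)
site:mydomain.com inurl:admin     (indexed pages with "admin" in the URL)
```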
But it mostly works for content/blogging sites only.
@gunroot said:
Maybe he's talking about robots.txt?
> Maybe he's talking about robots.txt?
Search engines use sitemap.xml and its contents to index a site's pages, so it can sometimes help you find all the pages of a site.
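If a site publishes a sitemap, pulling those URLs out is only a few lines of Python. Here's a rough stdlib-only sketch, assuming the standard sitemaps.org format and using the placeholder domain from this thread (sitemap index files that point at other sitemaps aren't handled):

```python
# Rough sketch: pull the page URLs out of a site's sitemap.xml (standard
# sitemaps.org format) and write them to a file. "mydomain.com" is the
# placeholder domain from this thread; sitemap index files that list
# other sitemaps are not handled here.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://mydomain.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP_URL, timeout=30) as resp:
    root = ET.fromstring(resp.read())

# Every <url> entry carries the page address in a <loc> child.
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS) if loc.text]

with open("sitemap_urls.txt", "w") as f:
    f.write("\n".join(urls))

print(f"Saved {len(urls)} URLs to sitemap_urls.txt")
```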
Yes, basically Google dorking. How could I save all the URLs to a file? E.g. if I search for site:mydomain.com
I believe the term you are looking for is 'web scraping': the automated process of retrieving data from whatever website you point it at (in this case, Google).
Just know that Google doesn't like you doing that: it's in fact against their policies... which is ironic for a service that makes its living scraping other sites, but there you go.
Anyway, that doesn't mean you can't crawl Google results; it just means that if you fire off about 150 requests at a rate of about 2 per second, you will get temporarily banned from Google.
Their suggested way of doing it is paying for their API service.
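For completeness, that route looks roughly like this: a minimal sketch against Google's Custom Search JSON API. You need your own API key and Programmable Search Engine ID (the values below are placeholders), the free daily quota is small, and the query and output filename are just examples from this thread:

```python
# Minimal sketch of the official route: Google's Custom Search JSON API.
# API_KEY and CX are placeholders; you create both in the Google Cloud /
# Programmable Search Engine consoles. Uses the third-party 'requests'
# package (pip install requests).
import requests

API_KEY = "YOUR_API_KEY"           # placeholder
CX = "YOUR_SEARCH_ENGINE_ID"       # placeholder
QUERY = "site:mydomain.com"        # the example query from this thread

urls = []
# The API returns at most 10 results per call; 'start' pages through them.
for start in range(1, 100, 10):
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": API_KEY, "cx": CX, "q": QUERY, "start": start, "num": 10},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json().get("items", [])
    if not items:
        break
    urls.extend(item["link"] for item in items)

with open("results.txt", "w") as f:
    f.write("\n".join(urls))
```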
As for your original question and how to do it:
If you search for 'python web scraping', you'll find a bunch of good, easy-to-follow guides.
There are also a multitude of tools that can scrape sites, from automation programs like Automation Anywhere to dedicated web-crawling software. Google around; it's a pretty common task with plenty of solutions.
What tools to use:
It kinda depends on what you want to do with the results:
If you want full control over the results (with the drawback of it being more work): go with Python, or whatever other language suits you (a bare-bones sketch follows below).
If you want to get up and running fast and just want a file containing all the results, you're better off with ready-made software... it's simply a faster path to that goal.
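To make the Python route concrete, here's a bare-bones sketch that fetches a few Google result pages for the site: query from above and dumps the result URLs to a text file. It needs the third-party requests and beautifulsoup4 packages, and the link-extraction logic, headers, and page count are assumptions to check against whatever HTML Google actually serves you; as noted above, hammering the endpoint will get you temporarily banned or CAPTCHA'd.

```python
# Bare-bones sketch: fetch a few Google result pages for a "site:" query
# and dump the result URLs to a file. Google changes its markup and blocks
# automated clients, so treat the extraction logic below as an assumption
# to verify, not a finished tool.
import time
import urllib.parse
import requests
from bs4 import BeautifulSoup

QUERY = "site:mydomain.com"              # placeholder query from this thread
HEADERS = {"User-Agent": "Mozilla/5.0"}  # the default python UA gets blocked even faster

urls = []
for page in range(3):  # keep the request count low on purpose
    resp = requests.get(
        "https://www.google.com/search",
        params={"q": QUERY, "start": page * 10},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Assumption: result links are either absolute URLs or "/url?q=..." redirects.
    # Inspect the HTML you actually get back and adjust this extraction.
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.startswith("/url?"):
            qs = urllib.parse.parse_qs(urllib.parse.urlparse(href).query)
            href = qs.get("q", [""])[0]
        if href.startswith("http") and "google." not in href and href not in urls:
            urls.append(href)
    time.sleep(5)  # be gentle; hammering the endpoint is what gets you banned

with open("results.txt", "w") as f:
    f.write("\n".join(urls))

print(f"Saved {len(urls)} URLs to results.txt")
```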
Best of luck!