What is a tool to get all pages for a site from Google?

So pages which are currently indexed, i.e. not found with a spider and not with DirBuster.

TIA

I believe what you’re referring to is “Google dorking”, where you use search operators such as “filetype:pdf” to filter Google results? You might have to go into more detail if you want a quick and easy answer.
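As a small illustration of combining operators (the helper below is hypothetical, just a sketch for building a query string you’d then paste into Google):

```python
# Hypothetical helper: build a Google dork query string from
# a domain plus any extra search operators.
def dork(domain, **operators):
    parts = ["site:" + domain]
    parts += [f"{name}:{value}" for name, value in operators.items()]
    return " ".join(parts)

print(dork("mydomain.com", filetype="pdf"))
# → site:mydomain.com filetype:pdf
```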

What about sitemap.xml ?
But it mostly works for content/blogging sites only.

@gunroot said:

What about sitemap.xml ?
But it mostly works for content/blogging sites only.

Maybe he’s talking about robots.txt?

@PapyrusTheGuru

Maybe he’s talking about robots.txt?

Search engines use sitemap.xml and its contents to index a site’s pages. It may help in some cases to find all the pages of a site.
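If a site publishes one, pulling the page list out of sitemap.xml is straightforward with the Python standard library. A minimal sketch (the inline XML stands in for what you’d fetch from `https://mydomain.com/sitemap.xml`; `mydomain.com` is just a placeholder):

```python
import xml.etree.ElementTree as ET

# Sitemaps declare this namespace per the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_from_sitemap(xml_text):
    """Extract every <loc> URL from a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

# Inline example; in practice you'd download the real sitemap first.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mydomain.com/</loc></url>
  <url><loc>https://mydomain.com/blog/post-1</loc></url>
</urlset>"""
print(urls_from_sitemap(sample))
# → ['https://mydomain.com/', 'https://mydomain.com/blog/post-1']
```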

Yes, basically Google dorking. How could I save all the URLs to a file? E.g. if I search for site:mydomain.com

I believe the term you are looking for is ‘web scraping’: the automated process of retrieving data from a website (in this case Google).
Just know that Google doesn’t like you doing that: it’s in fact against their policies… which is ironic for a service that makes its living scraping other sites, but there you go.

Anyway, that doesn’t mean you can’t crawl Google results; it just means that if you fire about 150 requests at a rate of about 2 per second, you will get temporarily banned from Google.

Their suggested way of doing it is paying for their API service.

On your original question and how to do it:
If you search for ‘python web scraping’, you’ll find a bunch of good, easy-to-follow guides.
There is also a multitude of tools that can scrape sites, from automation programs like ‘Automation Anywhere’ to software dedicated to web crawling. Google around; it’s a pretty common task with many solutions for it.

What tools to use:
It kinda depends on what you want to do with the results:
If you want full control over the results (with the drawback of being a lot of work): go with Python, or whatever other language suits you.
If you want to get up and running fast and just have a file containing all the results, you’re better off with ready-made software… just a faster path to that goal.
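A minimal sketch of the Python route, assuming Google’s result pages still contain plain absolute links in anchor tags (they often wrap or obfuscate them, so treat this as a starting point rather than a finished tool; the function names and the 5-second delay are my own choices):

```python
import re
import time
import urllib.parse
import urllib.request

def extract_links(html):
    # Crude regex extraction of absolute links from anchor tags;
    # a real scraper should use a proper HTML parser instead.
    return re.findall(r'<a[^>]+href="(https?://[^"]+)"', html)

def save_site_results(query, pages=3, out_file="urls.txt"):
    # Fetch a few Google result pages for `query` (e.g. "site:mydomain.com")
    # and dump every link found into out_file, one URL per line.
    links = []
    for page in range(pages):
        url = ("https://www.google.com/search?q="
               + urllib.parse.quote(query)
               + "&start=" + str(page * 10))
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        html = urllib.request.urlopen(req).read().decode("utf-8", "replace")
        links.extend(extract_links(html))
        time.sleep(5)  # stay well below the rate that gets you temporarily banned
    with open(out_file, "w") as f:
        f.write("\n".join(links))
```

Swap in their paid API if you need this to run reliably at any volume.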

Best of luck!
