Ingredients
  • Notepad
  • Xampp
  • Sitemap
  • 4-5 brain cells
  • Ready in 10 min
  • 1 (domain) Serving
  • 220 cals
At some point in your QA life, you will have to hunt for old and forgotten pages with errors. If you have experience with this and you know your website really well, it’s going to be the most boring thing in your life. On the other hand, if you are new to this… you will not know where to start. When I faced that problem for the first time, I didn’t have the time to build a script, so… go figure…

Let’s jump in, straight away.

A sitemap is an XML file that lists the URLs search bots are going to visit. The sitemap’s location is (almost) always this: http(s)://www.mydomain.com/sitemap.xml

Because I don’t want to “abuse” any of the available sitemaps out there, we are going to create our own. So… open a “text editor”, a.k.a. Notepad, type in a small sitemap with a handful of URLs, each one inside a “loc” tag, and save it as sitemap.xml

Moving on!

HTTP statuses (basics of basics)
  • 200 – OK
  • 301 – Moved Permanently
  • 400 – Bad Request
  • 401 – Unauthorized
  • 403 – Forbidden
  • 404 – Not Found
  • 500 – Internal Server Error
  • 502 – Bad Gateway
  • 503 – Service Unavailable

You can read about the HTTP statuses here: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

In this tutorial I’m going to create a script that handles only the basic statuses. Obviously, a proper script should handle all the available statuses, but I’m not going to do it. YOU can do it! Be my hero!

Create a PHP file that loads sitemap.xml and prints every URL in it. Run it! And boom! All the URLs from the XML file are on your screen.

So, what did we do? We created a new DOM document and loaded the XML file into it. “DOMDocument” gives us some methods that make our life easier, for example “getElementsByTagName”, which finds all elements with a given tag name, in our case “loc” (location). Yes, we could use regular expressions, and yes, I like them more, but the problem is that you would have to write code that handles both regular and minified XML.

So let’s move on! We have the URLs, and now we have to check their status.
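The sitemap and the URL-listing step described above can be sketched in one go. This is a minimal sketch, not the original script: the helper name sitemapUrls() is made up for illustration, and the sample sitemap is an assumption built from a few of the URLs that show up in the output later in this post. The XML is inlined as a string so the snippet runs on its own.

```php
<?php
// Hypothetical sample sitemap (assumption: a plain sitemaps.org-style urlset).
$sampleSitemap = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.google.com/</loc></url>
  <url><loc>http://google.com/abc</loc></url>
  <url><loc>https://github.com/</loc></url>
</urlset>
XML;

/**
 * Returns the text of every <loc> element found in a sitemap XML string.
 * sitemapUrls() is a made-up helper name, not part of PHP.
 */
function sitemapUrls(string $xml): array
{
    $dom = new DOMDocument();
    // loadXML() rejects an empty string and returns false on malformed XML.
    if ($xml === '' || !@$dom->loadXML($xml)) {
        return [];
    }

    $urls = [];
    foreach ($dom->getElementsByTagName('loc') as $loc) {
        $urls[] = trim($loc->nodeValue);
    }
    return $urls;
}

// Print one URL per line.
foreach (sitemapUrls($sampleSitemap) as $url) {
    echo $url, PHP_EOL;
}
```

To read the file saved earlier instead of the inline string, something like sitemapUrls(file_get_contents('sitemap.xml')) should do the trick.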
To check the status of each URL, we call “get_headers” on it, keep the first cell [0], and print it on the screen. Why do we print the first [0] cell? Because “get_headers” returns an array with the header lines of the response, and the first one is the HTTP status line. So now if you run the script, you will get this:
https://www.google.com/ HTTP/1.0 302 Found
http://google.com/abc HTTP/1.0 404 Not Found
https://www.amazon.com HTTP/1.1 200 OK
https://www.etsy.com/ HTTP/1.0 200 OK
https://www.ebay.com/ HTTP/1.0 302 Moved Temporarily
https://github.com/ HTTP/1.1 200 OK
https://www.youtube.com/watch?v=aaaaaaaaaaa HTTP/1.0 301 Moved Permanently
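Output like the above could come from something along these lines. It is a minimal sketch under assumptions, not the original script: firstStatusLine() and statusCode() are helper names made up for illustration. Only get_headers() itself is a real PHP function; it returns the header lines as an array, or false when the request fails.

```php
<?php
/**
 * Fetches the first header line (the HTTP status line) for a URL.
 * Made-up helper; returns null when the request fails (DNS error, timeout, ...).
 */
function firstStatusLine(string $url): ?string
{
    $headers = @get_headers($url); // array of header lines, or false on failure
    return $headers === false ? null : $headers[0];
}

/**
 * Extracts the numeric code from a status line such as "HTTP/1.1 404 Not Found".
 * Made-up helper; returns null when the line does not look like a status line.
 */
function statusCode(string $statusLine): ?int
{
    return preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m) ? (int) $m[1] : null;
}

// Usage (makes real HTTP requests, so it is left commented out here):
// foreach (['https://github.com/', 'http://google.com/abc'] as $url) {
//     echo $url, ' ', firstStatusLine($url) ?? 'request failed', PHP_EOL;
// }
```

Note that get_headers() follows redirects by default, so cell [0] holds the status line of the first response, which is why a 301 or 302 shows up instead of the final page's status.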
So far so good. Not user-friendly, but it works. Developers don’t need to print/echo every result; they trust their code, and they can use counters. Let’s use counters, one per status.

So now you have the numbers and you have the statuses. It’s still not user-friendly, so all you have to do is add some CSS, or at least put them in an HTML table. Peace!
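P.S. The counters mentioned above could look something like this. Again, a minimal sketch and not the original code: countStatuses() is a made-up helper name, and the status lines fed into it are the ones from the output shown earlier, hard-coded so the snippet runs without touching the network.

```php
<?php
/**
 * Counts how many status lines fall under each status code.
 * countStatuses() is a made-up helper name, not part of PHP.
 */
function countStatuses(array $statusLines): array
{
    $counters = [];
    foreach ($statusLines as $line) {
        // Take the 3-digit code after the protocol, or file the line under "unknown".
        $code = preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m) ? $m[1] : 'unknown';
        $counters[$code] = ($counters[$code] ?? 0) + 1;
    }
    ksort($counters); // sort by status code for a stable report
    return $counters;
}

// Status lines copied from the output earlier in this post.
$lines = [
    'HTTP/1.0 302 Found',
    'HTTP/1.0 404 Not Found',
    'HTTP/1.1 200 OK',
    'HTTP/1.0 200 OK',
    'HTTP/1.0 302 Moved Temporarily',
    'HTTP/1.1 200 OK',
    'HTTP/1.0 301 Moved Permanently',
];

// Print one "code: count" pair per line.
foreach (countStatuses($lines) as $code => $count) {
    echo $code, ': ', $count, PHP_EOL;
}
```

From here, turning each code/count pair into a table row is only a couple of echo statements away.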
