wget -e robots=off
It is as simple as the above, and this is something to watch out for carefully when using wget, because you may think you have archived or downloaded content when you never actually did, due to a nofollow/robots.txt statement.........
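For example, a quick recursive grab that ignores robots.txt could look like this (example.com is just a placeholder host):

wget -e robots=off -r -p http://example.com/

The -r turns on recursion and -p pulls in page requisites like images and CSS, so the saved copy is actually usable offline.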
There are a few ways of doing this, and all of them basically involve using the reverse proxy ("ProxyPass") feature of Apache to accomplish it.
1.) Create a normal vhost and simply symlink the root directory of the site you want to mirror.
E.g. originalsite.com and newsite.com
/vhosts/originalsite.com/httpdocs
You would symlink like this:
ln -s /vhosts/originalsite.com/httpdocs /vhosts/originalsite.com/........
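Spelled out in full, the symlink would look something like this (assuming newsite.com's vhost directory is /vhosts/newsite.com, which the truncated line above doesn't show):

ln -s /vhosts/originalsite.com/httpdocs /vhosts/newsite.com/httpdocs

And for the ProxyPass approach itself, a minimal sketch of newsite.com's vhost might be (mod_proxy and mod_proxy_http need to be loaded; the hostnames are just the example ones from above):

<VirtualHost *:80>
    ServerName newsite.com
    # pass every request through to the original site, and rewrite
    # redirect/Location headers so they point back at newsite.com
    ProxyPass / http://originalsite.com/
    ProxyPassReverse / http://originalsite.com/
</VirtualHost>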
First of all, if you're getting this error it is the result of extreme database activity. If you aren't expecting it, or it doesn't make sense to you, then 99% of the time it is a database-driven script being exploited (common examples I see often are things like phpBB being hit by dozens, hundreds, or thousands of bots making constant DB write requests).
The easiest way to identify this is to restart MySQL and then run the third-party tool "mtop", and you'll see all........
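As a minimal sketch (assuming a Debian-style init script, and that mtop is already installed):

/etc/init.d/mysql restart
mtop

If mtop isn't available, the stock mysql client will show the same connection pile-up:

mysql -u root -p -e "SHOW FULL PROCESSLIST;"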
This is useful for developing a lot of applications; I'm putting it here to keep it handy for myself and hopefully others:
Choose Country
Canada
Japan
United States
United Kingdom
Afghanistan
........
This is very handy if you're too busy and don't have time to download each of the files you need by hand.
The -D option specifies the allowed domains; it is needed because I specified -H, which allows foreign hosts. If you don't restrict them, you'll end up crawling the whole internet via ads and other links, just like a search engine would.
-l 0 specifies unlimited recursion depth, so wget goes as many levels deep as exist.
-e robots=off is important because robots.txt often says you can't vie........
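Putting the flags above together, a full mirror command might look like this (example.com is a placeholder; -r for recursion and -p for page requisites are my additions, not shown in the excerpt above):

wget -r -p -l 0 -H -D example.com -e robots=off http://example.com/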