There are a few ways of doing this and all basically involve using the reverse proxy or "ProxyPass" feature of Apache to accomplish it.
1.) Create a normal vhost and simply symlink the root directory of the site you want to mirror.
Eg. originalsite.com and newsite.com
You would symlink like this:
ln -s /vhosts/originalsite.com/httpdocs vhosts/originalsite.com/........
This is very handy if you're too busy and don't have time to download whatever files you need.
The -D specifies the domains allowed, this is because I specified -H which means foreign hosts are allowed, if you don't restrict them you'll end up going to the whole internet via ads and other links just like a search Engine would follow.
-l 0 specifies to go deep, to as many levels as possible/as exist.
-e robots=off is important because robots.txt often says you can't vie........
This is a great way to use your ftp server space, for example on your web hosting account (althoughI believe many hosts don't allow storage like this), but if you have a VPS/Dedicated Server etc.., this would be perfect. Imagine how easy it is to work with an ftp account that you can just mount as a normal partition or directory in Linux, it would be great for backups etc..
curlftpfs - mount a ftp host as a local directory
I decided on using yum to help me decide even though I normaly use proftpd I decided to see what else I could find.
yum search ftp
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* rpmforge: ftp-stud.fht-esslingen.de
* base: mirrors.netdna.com
* updates: updates.interworx.info
* addons: yum.singlehop.com
* extras: mirrors.netdna.com
This was an interesting article and people should be wary of Google's power. Google's policies and actions against competitors borders on anti-trust, far worse than Microsoft.
They are doing the right thing for all of us, Google says selling advertising and links is bad for search engines, yet they don't penalize sites who use AdSense.