#1
Is there a web crawler (other than Google) that caches Myspace profiles?
#2
Myspace uses robots.txt to tell automated bots not to crawl its site:
http://www.myspace.com/robots.txt
However, search engines still cache some Myspace pages. Since Google probably has the biggest cache of Myspace pages, I'd only suggest using Google (or a similar search engine, such as MSN). Archive.org could work (I haven't checked), but as Myspace uses a robots.txt, it's highly unlikely to. Sorry I couldn't help much.
#3
The file Troll mentioned (robots.txt) is a way of telling bots not to crawl your website. You can disallow all crawlers, or only those with a specific user agent.
Myspace's robots.txt blocks ia_archiver. I just Googled it, and that's the user agent string of www.archive.org. I was going to suggest that as a place to check, but they're blocked from caching Myspace pages. Other than that, try Coral: http://www.coralcdn.org/ ... or just search Google for "search engine" and try the cached versions on all the search engines you can find: Yahoo, Live Search, et al.
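
If you want to check for yourself which crawlers a robots.txt turns away, here's a rough sketch using Python's standard urllib.robotparser. The sample rules, the example.com URL and the is_allowed helper are made up for illustration; this is not a copy of Myspace's actual file, just the same kind of rule (ia_archiver blocked everywhere, everyone else blocked from one directory).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# It is NOT Myspace's real file; it just mimics the kind of rule discussed above.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/some-profile"  # placeholder URL
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "allowed:", is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

To test a live site instead, RobotFileParser also has set_url() and read(), which download the file for you. Keep in mind robots.txt is only a request: well-behaved crawlers honour it, but nothing forces them to.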