  1. Join Date
    Apr 2007
    Posts
    1

    MySpace question

    Is there a web crawler (other than Google) that caches MySpace profiles?

  2. Join Date
    Nov 2006
    Location
    East of Happy Nonsense
    Posts
    178
    MySpace uses robots.txt to tell automated bots not to crawl its site.

    http://www.myspace.com/robots.txt

    However, search engines still cache some MySpace pages. Google probably has the biggest cache of MySpace pages, so I'd suggest sticking with Google (or a similar search engine, such as MSN).

    Archive.org could work (I haven't checked), but since MySpace uses a robots.txt, it's highly unlikely.

    Sorry I couldn't help much.

  3. Join Date
    Sep 2005
    Location
    UK
    Posts
    2,068
    The file Troll mentioned (robots.txt) is a way of telling bots not to crawl your website. It only works for crawlers that choose to honour it, but the big search engines do. You can disallow all crawlers, or only those with a specific user agent.

    MySpace's robots.txt blocks ia_archiver. I just Googled it, and it's the user agent of the crawler behind www.archive.org. I was going to suggest that as a place to check, but it's blocked from archiving MySpace pages.
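
    If you want to check this sort of thing yourself, here's a rough sketch using Python's standard urllib.robotparser module. The robots.txt text in it is a made-up sample written for illustration, not a copy of MySpace's actual file, and the helper name is_allowed and the example.com URLs are my own placeholders. It just shows how a rule aimed at one user agent (ia_archiver) differs from the catch-all rule for everyone else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (NOT MySpace's real file):
# ia_archiver is barred from the whole site, every other crawler is only
# barred from /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "Googlebot"):
        for path in ("/someprofile", "/private/page"):
            url = "http://www.example.com" + path
            print(agent, path, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

    To test a live site instead, RobotFileParser also has set_url() and read(), which download the file for you. Keep in mind a site can change its robots.txt at any time, so the answer may differ from what's described in this thread.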

    Other than that, try Coral:

    http://www.coralcdn.org/

    ... or just search Google for "search engine" and try the cached versions on every search engine you can find: Yahoo, Live Search, et al.
