I have used the Perl CPAN implementation of the link checker.
One way to check a lot of pages from a command prompt is with a script like this:
(
cd /home/webserver/pages
baseurl='http://mywebserver.com/pages/'
# check every page and append each report to a single output file
for page in *.htm *.html; do
    /usr/bin/linkcheck -s "${baseurl}${page}" >> outputfile
done
)
This may take a while to run: the checking process is very thorough and the reports are quite verbose.
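Because the run can take so long, it can be convenient to detach it from the terminal. A minimal sketch, assuming the loop above has been saved as a script (checkpages.sh is just an example name; the link reports still go to outputfile as before):

nohup ./checkpages.sh > checkpages.log 2>&1 &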
A common requirement is to check just for 404 (bad link) errors. To report only these, I filtered the output file through an awk script:
# linkfilter.awk
BEGIN { url = ""; }

# each page report starts with a "Processing <url>" line
/^Processing/ { url = $2; errorCount = 0; }

# remember the most recent link and the line numbers it appears on
/^http:/    { link = $1; }
/^ Lines: / { lines = $2 $3; }

# report only links that came back with a 404
/^ Code: 404 Not Found/ {
    if (!errorCount) printf "\n\nCompany page: %s\n", url;
    errorCount++;
    printf "link: %s lines %s; %s\n", link, lines, $0;
}
This lists only pages where 404 errors have occurred; OK pages and other 'errors' such as redirections or ignored links are not reported.
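To produce the filtered report, run the awk script over the output file written by the loop above. A minimal sketch (the report filename is just an example):

awk -f linkfilter.awk outputfile > broken-links.txt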