nHttpDownloader
Introduction
nHttpDownloader is a small .NET library that lets a programmer download multiple web pages on a configurable number of threads. nHttpDownloader is well suited to web-spidering applications, where you need to retrieve a large number of web pages while making maximum use of your bandwidth.
Documentation
I have, as yet, written no documentation. But the library is so easy to use that it hardly needs any. There's an example application below that shows some of the basic functionality. If you want to see a real use of all of its functionality, I suggest you look at the source code for the Page Scavenger application. Page Scavenger is an application for downloading images from a variety of free image-host web sites (like ImageShack, ImageVenue, etc.). Page Scavenger makes use of nearly all the features of nHttpDownloader.
I have another project on SourceForge that will soon be using nHttpDownloader and when that code is ready, I will add a link here. Between the example below and the source code to Page Scavenger, I suspect there won't be any problems. But feel free to ask questions on the board.
Example
nHttpDownloader is designed to be easy to use while still providing a lot of flexibility. To give you an idea of how simple it is to use, here's a quick example:
using System;
using System.Collections.Specialized;
using System.Diagnostics;
using System.IO;
using nHttpDownloader;

private Downloader _downloader;

public void InitDownloader()
{
    // 4 threads and 2-minute timeouts.
    _downloader = new Downloader(4, 120000);
}

public void ShutdownDownloader()
{
    // Disposing will stop the thread manager and
    // wait for any pending threads to complete.
    _downloader.Dispose();
}

public void DownloadPages(StringCollection pageUrlList)
{
    foreach (string url in pageUrlList)
    {
        // Queue a page.
        Job job = _downloader.QueueJob(url, JobPriority.Medium);
        job.JobEnded += new EventHandler(Job_JobEnded);
        job.JobError += new JobErrorHandler(Job_JobError);

        // Enable the job so it can begin.
        job.Enable();
    }
}

private void UnwireJobEvents(Job job)
{
    job.JobEnded -= new EventHandler(Job_JobEnded);
    job.JobError -= new JobErrorHandler(Job_JobError);
}

private void Job_JobEnded(object sender, EventArgs e)
{
    // Job completed. Save the data to a file named after
    // the last segment of the URL.
    Job job = sender as Job;
    UnwireJobEvents(job);

    string fileName = @"C:\MyHttpFiles\" +
        job.Url.Substring(job.Url.LastIndexOf("/") + 1);
    BinaryWriter bw = new BinaryWriter(
        File.Open(fileName, FileMode.CreateNew, FileAccess.Write));
    bw.Write(job.Data);
    bw.Close();
}

private void Job_JobError(object sender, JobErrorEventArgs e)
{
    // Error encountered. We'll just report it, but we
    // could resubmit it.
    Job job = sender as Job;
    UnwireJobEvents(job);

    Debug.WriteLine(string.Format(
        "Job {0} downloading URL:{1} failed with error ('{2}')",
        job.ToString(), job.Url, e.ErrorMessage));
}
Flexibility
Keep in mind that the above is a minimal example. It's functional, but it doesn't come close to using all the features...
In addition to the JobEnded and JobError events, there are JobStarted and JobProgress events: JobStarted lets you know when a URL actually starts downloading, and JobProgress lets you update any GUI progress display with the job's progress so far.
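As a rough sketch of wiring up JobStarted (JobEnded in the example above uses a plain EventHandler, so I assume JobStarted does too; check the library source for the real delegate types):

```csharp
// Assumption: JobStarted follows the same EventHandler pattern
// as JobEnded. Wire it up alongside the other job events:
job.JobStarted += new EventHandler(Job_JobStarted);

private void Job_JobStarted(object sender, EventArgs e)
{
    // The URL has actually begun downloading -- useful for
    // status displays in a GUI.
    Job job = sender as Job;
    Debug.WriteLine("Started downloading " + job.Url);
}
```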
nHttpDownloader supports the use of cookies, with a CookieContainer associated with each job.
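For instance, jobs that need to share login state could be given the same container. This is a sketch only: the CookieContainer property name comes from the description above, but the assignment style is an assumption.

```csharp
// One shared System.Net.CookieContainer means a session cookie
// set while downloading a login page is sent with every later job.
System.Net.CookieContainer cookies = new System.Net.CookieContainer();

Job job = _downloader.QueueJob(url, JobPriority.Medium);
job.CookieContainer = cookies;  // property assignment is assumed
job.Enable();
```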
Jobs can have any of 3 priorities: Low, Medium, and High. The job manager executes all high priority jobs in the order they were submitted, then all medium priority jobs in the order they were submitted, and finally all low priority jobs in the order they were submitted. New jobs can be added at any time, so if a medium priority job is executing and a high priority job is added, the high priority job will be the next to execute.
The Job class also has a Tag property (of type Object) that allows you to attach information to the job. You might, for example, put the filename in the job's Tag and, when the job ends, simply read the filename back out of it.
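A sketch of that filename pattern, using only the members that appear in the example above (File.WriteAllBytes is standard .NET):

```csharp
// Queue the page with its destination filename in the Tag...
Job job = _downloader.QueueJob(url, JobPriority.Medium);
job.Tag = @"C:\MyHttpFiles\page1.html";
job.JobEnded += new EventHandler(Job_JobEnded);
job.Enable();

// ...then read the filename back when the job completes.
private void Job_JobEnded(object sender, EventArgs e)
{
    Job job = sender as Job;
    File.WriteAllBytes((string)job.Tag, job.Data);
}
```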
You can override the Browser string used. You can query the number of currently executing jobs. You can pause the downloads and later resume them.
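The member names in this last sketch (Browser, ExecutingJobCount, Pause, Resume) are guesses based on the feature list above, not confirmed API; check the library source for the real names.

```csharp
// Hypothetical member names throughout -- verify against the source.
_downloader.Browser = "MyCrawler/1.0";    // override the Browser string

if (_downloader.ExecutingJobCount > 0)    // query executing jobs
{
    _downloader.Pause();                  // pause the downloads...
    // ...do something bandwidth-hungry here...
    _downloader.Resume();                 // ...and resume them later
}
```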