nHttpDownloader

Introduction

nHttpDownloader is a small .NET library that allows a programmer to download multiple web pages using a configurable number of threads. nHttpDownloader is perfect for web-spidering applications where you need to retrieve a large number of web pages and make maximum use of your bandwidth.

Documentation

I have, as yet, written no documentation. But the thing is so easy to use, it hardly needs it. There's an example below that shows some of the basic functionality. If you want to see a real use of all of its functionality, I suggest you look at the source code for the Page Scavenger application. Page Scavenger is an application for downloading images from a variety of free image host web sites (like ImageShack, ImageVenue, etc.) and makes use of nearly all the features of nHttpDownloader.

I have another project on SourceForge that will soon be using nHttpDownloader and when that code is ready, I will add a link here. Between the example below and the source code to Page Scavenger, I suspect there won't be any problems. But feel free to ask questions on the board.

Example

nHttpDownloader is designed to be easy to use while still providing a lot of flexibility. To give you an idea of how simple it is to use, here's a quick example:

using System;
using System.Collections.Specialized;
using System.Diagnostics;
using System.IO;
using nHttpDownloader;

private Downloader    _downloader;

public void InitDownloader()
{
    // 4 threads and 2 minute timeouts.
    _downloader = new Downloader(4, 120000);
}

public void ShutdownDownloader()
{
    // Disposing will stop the thread manager and
    // wait for any pending threads to complete.
    _downloader.Dispose();
}

public void DownloadPages(StringCollection pageUrlList)
{
    foreach(string url in pageUrlList)
    {
        // Queue a page
        Job job = _downloader.QueueJob(url, JobPriority.Medium);
        job.JobEnded += new EventHandler(Job_JobEnded);
        job.JobError += new JobErrorHandler(Job_JobError); 

        // Enable job so it can begin.
        job.Enable();
    }
}

private void UnwireJobEvents(Job job)
{
    job.JobEnded -= new EventHandler(Job_JobEnded);
    job.JobError -= new JobErrorHandler(Job_JobError);
}

private void Job_JobEnded(object sender, EventArgs e)
{
    // Job completed. Save to a file.
    Job job = sender as Job;   
    UnwireJobEvents(job);
    string fileName = @"C:\MyHttpFiles\" +
                      job.Url.Substring(job.Url.LastIndexOf("/") + 1);
    BinaryWriter bw = new BinaryWriter(File.Open(fileName,
                      FileMode.CreateNew,
                      FileAccess.Write));
    bw.Write(job.Data);
    bw.Close();
}

private void Job_JobError(object sender, JobErrorEventArgs e)
{
    // Error encountered. We'll just report it, but we could resubmit it.
    Job job = sender as Job;   
    UnwireJobEvents(job);
    Debug.WriteLine(string.Format("Job {0} downloading
URL:{1} failed with error ('{2}')", job.ToString(), job.Url, e.ErrorMessage);
}

Flexibility

Keep in mind that the above is a minimal example. It's functional, but it doesn't come close to using all the features...

In addition to the JobEnded and JobError events, there are JobStarted and JobProgress events. These let you know when a URL actually starts downloading and let you update any GUI progress display with the job's progress so far.
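
As a rough sketch of how those events might be wired up (the JobProgressHandler delegate name and the JobProgressEventArgs type are assumptions, since their exact signatures aren't documented here; the rest follows the pattern from the example above):

public void DownloadPagesWithProgress(StringCollection pageUrlList)
{
    foreach(string url in pageUrlList)
    {
        Job job = _downloader.QueueJob(url, JobPriority.Medium);
        job.JobStarted += new EventHandler(Job_JobStarted);

        // The delegate type for JobProgress is assumed here.
        job.JobProgress += new JobProgressHandler(Job_JobProgress);
        job.Enable();
    }
}

private void Job_JobStarted(object sender, EventArgs e)
{
    // The URL has actually begun downloading.
    Job job = sender as Job;
    Debug.WriteLine("Started downloading: " + job.Url);
}

private void Job_JobProgress(object sender, JobProgressEventArgs e)
{
    // Update a progress bar here with whatever progress data the event
    // actually supplies (the JobProgressEventArgs type is an assumption).
}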

nHttpDownloader supports the use of cookies with a CookieContainer associated with each job.
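
For instance (a sketch only; the name of the per-job cookie property isn't documented here, so Cookies below is an assumed member, and CookieContainer comes from System.Net):

public void DownloadWithSession(string url, CookieContainer sessionCookies)
{
    Job job = _downloader.QueueJob(url, JobPriority.Medium);

    // Attach the cookies so the request runs inside an existing session.
    // "Cookies" is an assumed property name for the job's CookieContainer.
    job.Cookies = sessionCookies;

    job.JobEnded += new EventHandler(Job_JobEnded);
    job.Enable();
}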

Jobs can have any of three priorities: Low, Medium, and High. The job manager executes all high priority jobs in the order they were submitted, then all medium priority jobs in the order they were submitted, and finally all low priority jobs in the order they were submitted. New jobs can be added at any time, so if a medium priority job is executing and a high priority job is added, the high priority job will be the next to execute.
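
For example, a spider might queue its index pages at high priority so that new links are discovered before the bulk of the content downloads. This uses only the QueueJob call shown in the example above; the URLs are placeholders:

// Index pages jump the queue; thumbnails wait until everything else is done.
Job indexJob = _downloader.QueueJob("http://example.com/index.html", JobPriority.High);
Job pageJob  = _downloader.QueueJob("http://example.com/page1.html", JobPriority.Medium);
Job thumbJob = _downloader.QueueJob("http://example.com/thumb1.jpg", JobPriority.Low);

indexJob.Enable();
pageJob.Enable();
thumbJob.Enable();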

The Job class also has a Tag property (of type Object) that allows you to attach arbitrary information to the job. You might, for example, put the filename in as the job tag and, when the job ends, simply get the filename back from the tag.
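
A quick sketch of that pattern, using only the Tag property and the calls already shown above:

public void QueuePage(string url, string saveAsFileName)
{
    Job job = _downloader.QueueJob(url, JobPriority.Medium);

    // Carry the target filename along with the job.
    job.Tag = saveAsFileName;

    job.JobEnded += new EventHandler(Job_TaggedJobEnded);
    job.Enable();
}

private void Job_TaggedJobEnded(object sender, EventArgs e)
{
    Job job = sender as Job;

    // Read the filename back out of the tag and save the downloaded data.
    string fileName = (string)job.Tag;
    BinaryWriter bw = new BinaryWriter(File.Open(fileName, FileMode.Create, FileAccess.Write));
    bw.Write(job.Data);
    bw.Close();
}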

You can override the Browser string used. You can query the number of currently executing jobs. You can pause the downloads and then resume them.
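
None of those member names are spelled out in this document, so the sketch below is purely illustrative; Browser, ExecutingJobCount, Pause, and Resume are all assumed names:

// All member names here are guesses at the API surface described above.
_downloader.Browser = "Mozilla/4.0 (compatible; MyCrawler/1.0)";

if (_downloader.ExecutingJobCount > 0)
{
    // Temporarily stop handing out work (for example, while the user
    // changes settings)...
    _downloader.Pause();

    // ...and later pick up where we left off.
    _downloader.Resume();
}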