RSSscraper -- scrape yer own RSS feed

RSS Screenscraper written in Ruby

What is this?

RSSscraper generates RSS feeds for websites that do not provide one themselves. This is done using a technique called screenscraping. New scrapers can easily be written in Ruby, shared with other RSSscraper users, and plugged in, ready to serve. The RSS feeds can be served with the built-in webserver, CGI or eRuby.

Where to get it

RSSscraper can be downloaded from http://rubyforge.org/frs/?group_id=188

Requirements

RSSscraper needs Ruby 1.8 (or higher) to work. Downloads can be found at http://ruby-lang.org/. For Windows, the Ruby Installer For Windows is recommended: http://rubyinstaller.sourceforge.net/

Get started quickly

There are multiple ways to provide access to the RSS feeds:

Built-in webserver

If you don't have a webserver set up already, you can use the built-in webserver. In a graphical environment where .rb files are associated with Ruby, simply double-click the "scrape_server.rb" file.

Now, to subscribe to the scraper named "RubyGardenOrg", point your RSS reader to the following URL:
http://localhost:4049/RubyGardenOrg

If you want to use the shell/command prompt instead, enter:

ruby scrape_server.rb

An RSSscraper webserver is now running on port 4049. To use a different port number, for example 2000, use:

ruby scrape_server.rb 2000
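
To check from Ruby that the built-in webserver is answering, something like the following sketch could be used. It assumes the server is running on localhost and that the "RubyGardenOrg" scraper is installed. The helper names feed_uri and fetch_feed are made up for this example and are not part of RSSscraper.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of RSSscraper): builds the feed URL
# served by the built-in webserver for a given scraper name.
def feed_uri(scraper, port = 4049, host = 'localhost')
  URI("http://#{host}:#{port}/#{scraper}")
end

# Hypothetical helper: returns the feed body, or raises if the server
# does not answer with a success status.
def fetch_feed(scraper, port = 4049)
  response = Net::HTTP.get_response(feed_uri(scraper, port))
  unless response.is_a?(Net::HTTPSuccess)
    raise "Unexpected response: #{response.code}"
  end
  response.body
end

# With the server running, the feed could be printed like this:
#   puts fetch_feed('RubyGardenOrg')        # default port 4049
#   puts fetch_feed('RubyGardenOrg', 2000)  # server started on port 2000
```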

CGI

Put the "RSSscraper" folder in a location where scrape.cgi can be accessed as a CGI script.

Now, to subscribe, point your RSS reader to [scrape.cgi url]?scraper=[scraper name], e.g. http://localhost/RSSscraper/scrape.cgi?scraper=RubyGardenOrg

eRuby

Put the "RSSscraper" folder in the document tree of your webserver.

Now, to subscribe, point your RSS reader to [scrape.rhtml url]?scraper=[scraper name], e.g. http://localhost/RSSscraper/scrape.rhtml?scraper=RubyGardenOrg
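
The CGI and eRuby subscription URLs follow the same pattern: the script URL plus a "scraper" query parameter. A minimal sketch of building such a URL is shown here. The helper subscription_url is hypothetical and not part of RSSscraper, and URI.encode_www_form needs a newer Ruby than 1.8.

```ruby
require 'uri'

# Hypothetical helper (not part of RSSscraper): appends the "scraper"
# query parameter to the URL of scrape.cgi or scrape.rhtml.
def subscription_url(script_url, scraper)
  "#{script_url}?#{URI.encode_www_form({ 'scraper' => scraper })}"
end

puts subscription_url('http://localhost/RSSscraper/scrape.cgi', 'RubyGardenOrg')
puts subscription_url('http://localhost/RSSscraper/scrape.rhtml', 'RubyGardenOrg')
```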

Additional scrapers

To get an RSS feed from RSSscraper you need a scraper made specifically for the website. If you have downloaded a scraper, just put the *.scraper.rb file in the "scrapers" directory and it is ready for use.

Scrapers can be found on the wiki. You can also post your own scrapers there.

If a scraper for the website does not exist yet and you know the Ruby language, you can make one yourself. At the time of writing, there is no guide for scraper authoring, but you can look at how the example scraper is made and go from there. It should be reasonably simple to understand. Otherwise, look for help in the forums on the project website.
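
For readers new to screenscraping, the general idea can be sketched in plain Ruby: pull links out of a page's HTML and print them as RSS items. This is only an illustration of the technique. It is NOT the RSSscraper plugin interface (see the example scraper for that), and all names, the regular expression and the sample HTML are assumptions made up for this example.

```ruby
# Illustrative only: this is not how *.scraper.rb files are structured.

# Escape the characters that are special in XML text.
def xml_escape(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Pull [url, title] pairs out of HTML with a simple regex. Real pages
# usually need a pattern tailored to their markup.
def extract_links(html)
  html.scan(/<a\s+href="([^"]+)"[^>]*>([^<]+)<\/a>/i)
end

# Render the scraped links as a minimal RSS 2.0 document.
def to_rss(title, link, items)
  lines = ['<?xml version="1.0"?>', '<rss version="2.0">', '  <channel>',
           "    <title>#{xml_escape(title)}</title>",
           "    <link>#{xml_escape(link)}</link>",
           "    <description>#{xml_escape(title)}</description>"]
  items.each do |url, text|
    lines << "    <item><title>#{xml_escape(text)}</title><link>#{xml_escape(url)}</link></item>"
  end
  lines << '  </channel>' << '</rss>'
  lines.join("\n")
end

# Made-up sample HTML standing in for a downloaded page.
html = '<ul><li><a href="http://example.org/1">First post</a></li>' \
       '<li><a href="http://example.org/2">Second post</a></li></ul>'
puts to_rss('Example site', 'http://example.org/', extract_links(html))
```

A real scraper would first download the page and would need a pattern that matches only the links of interest.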

Credits

Authored by: Lau Taarnskov

with help and assistance from: David Heinemeier Hansson

This is my first Ruby application besides "Hello World". (Well, actually I did: 3.times{puts "hello world!"}) After I created a working prototype of RSSscraper, David did some refactoring and made some suggestions in a quasi-pair-programming session using SubEthaEdit.

License

Copyright (c) 2004 Lau Taarnskov

This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.

  2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.

  3. This notice may not be removed or altered from any source distribution.