RSS Screenscraper written in Ruby
RSSscraper generates RSS feeds for websites that do not provide feeds themselves. This is done using a technique called screenscraping. New scrapers can easily be written in Ruby, shared with other RSSscraper users, and plugged in, ready to serve. The RSS feeds can be served by the built-in webserver, through CGI, or through eRuby.
RSSscraper can be downloaded from http://rubyforge.org/frs/?group_id=188
RSSscraper needs Ruby 1.8 (or higher) to work. Downloads can be found on http://ruby-lang.org/. For Windows, the Ruby Installer For Windows is recommended: http://rubyinstaller.sourceforge.net/
There are multiple ways to provide access to the RSS feeds:
If you don't have a webserver set up already, you can use the built-in webserver. In a graphical environment where .rb files are associated with Ruby, simply double-click the "scrape_server.rb" file.
Now, to subscribe to the scraper named "RubyGardenOrg", point your RSS reader to the following URL:
http://localhost:4049/RubyGardenOrg
If you want to use the shell/command prompt instead, enter:
ruby scrape_server.rb
An RSSscraper webserver is now running on port 4049. To use a different port number, for example 2000, use:
ruby scrape_server.rb 2000
To serve the feeds through CGI, put the "RSSscraper" folder in a location where scrape.cgi can be accessed as a CGI script.
Now, to subscribe, point your RSS reader to [scrape.cgi url]?scraper=[scraper name], e.g. http://localhost/RSSscraper/scrape.cgi?scraper=RubyGardenOrg
To serve the feeds through eRuby, put the "RSSscraper" folder in the document tree of your webserver.
Now, to subscribe, point your RSS reader to [scrape.rhtml url]?scraper=[scraper name], e.g. http://localhost/RSSscraper/scrape.rhtml?scraper=RubyGardenOrg
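The three URL forms described above can be built programmatically, which may be handy when generating subscription links for several scrapers. The following is only a sketch: feed_url is a hypothetical helper that is not part of RSSscraper, and the "/RSSscraper/" path is an assumption taken from the example URLs, so adjust it to wherever the folder actually lives on your webserver.

```ruby
require "uri"

# Hypothetical helper, not part of RSSscraper: builds the subscription URL
# for each of the three access methods described in this README.
def feed_url(scraper, via: :server, host: "localhost", port: 4049)
  case via
  when :server
    # Built-in webserver: the scraper name is the URL path.
    "http://#{host}:#{port}/#{scraper}"
  when :cgi, :eruby
    # CGI and eRuby: the scraper name is passed as the "scraper" parameter.
    # The /RSSscraper/ location is assumed from the README's example URLs.
    script = via == :cgi ? "scrape.cgi" : "scrape.rhtml"
    query = URI.encode_www_form(scraper: scraper)
    "http://#{host}/RSSscraper/#{script}?#{query}"
  else
    raise ArgumentError, "unknown access method: #{via.inspect}"
  end
end

puts feed_url("RubyGardenOrg")               # http://localhost:4049/RubyGardenOrg
puts feed_url("RubyGardenOrg", via: :cgi)    # http://localhost/RSSscraper/scrape.cgi?scraper=RubyGardenOrg
puts feed_url("RubyGardenOrg", via: :eruby)  # http://localhost/RSSscraper/scrape.rhtml?scraper=RubyGardenOrg
```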
To get an RSS feed from RSSscraper you need a scraper made specifically for the website. If you have downloaded a scraper, just put the *.scraper.rb file in the "scrapers" directory and it is ready for use.
Scrapers can be found on the wiki. You can also post your own scrapers there.
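To check which scrapers are installed, you can list the *.scraper.rb files in the "scrapers" directory. The sketch below assumes that the scraper name used in the feed URL is the file name without the ".scraper.rb" suffix; that mapping is an assumption, and installed_scrapers is a hypothetical helper, not something RSSscraper ships.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: returns the names of the scrapers found in +dir+,
# assuming the name is the file name minus the ".scraper.rb" suffix.
def installed_scrapers(dir)
  Dir.glob(File.join(dir, "*.scraper.rb"))
     .map { |path| File.basename(path, ".scraper.rb") }
     .sort
end

# Demonstration against a throwaway directory, so nothing real is touched.
Dir.mktmpdir do |root|
  scrapers = File.join(root, "scrapers")
  FileUtils.mkdir_p(scrapers)
  FileUtils.touch(File.join(scrapers, "RubyGardenOrg.scraper.rb"))
  FileUtils.touch(File.join(scrapers, "notes.txt")) # ignored: wrong suffix
  p installed_scrapers(scrapers)
end
```

In a real installation you would pass the path of the "scrapers" directory inside the "RSSscraper" folder instead of a temporary one.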
If the scraper you need does not exist already and you know the Ruby language, you can make one yourself. At the time of writing there is no guide for scraper authoring, but you can look at how the example scraper is made and go from there. It should be reasonably simple to understand. Otherwise, look for help in the forums on the project website.
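To give a rough idea of what screenscraping involves, here is a standalone sketch of the general technique: extract links and titles from a page's HTML with a regular expression, then wrap them in an RSS 2.0 document. It does not use RSSscraper's scraper interface (see the example scraper for that); scrape_items, to_rss, xml_escape, the link pattern and the example.org page are all made up for illustration.

```ruby
# Minimal XML escaping for text placed inside elements.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(text)
  text.gsub(/[&<>"]/, XML_ESCAPES)
end

# Screenscraping step: pull (link, title) pairs out of raw HTML.
# The pattern is an assumption about the page's markup; real pages usually
# need a pattern tailored to them. Scraped text that already contains HTML
# entities would need decoding first, which this sketch skips.
def scrape_items(html)
  html.scan(%r{<a\s+href="([^"]+)"[^>]*>([^<]+)</a>}i).map do |link, title|
    { link: link, title: title.strip }
  end
end

# Feed step: wrap the items in a minimal RSS 2.0 document.
def to_rss(title, link, items)
  body = items.map do |item|
    "    <item>\n" \
    "      <title>#{xml_escape(item[:title])}</title>\n" \
    "      <link>#{xml_escape(item[:link])}</link>\n" \
    "    </item>\n"
  end.join
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" \
  "<rss version=\"2.0\">\n" \
  "  <channel>\n" \
  "    <title>#{xml_escape(title)}</title>\n" \
  "    <link>#{xml_escape(link)}</link>\n" \
  "    <description>#{xml_escape(title)}</description>\n" \
  "#{body}" \
  "  </channel>\n" \
  "</rss>\n"
end

# A made-up page standing in for a downloaded website.
html = <<~HTML
  <ul>
    <li><a href="http://example.org/news/1">First post</a></li>
    <li><a href="http://example.org/news/2">Second post</a></li>
  </ul>
HTML

puts to_rss("Example site", "http://example.org/", scrape_items(html))
```

A real scraper would first download the page (for example with Net::HTTP from the standard library) and would typically also fill in descriptions and dates for each item.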
Authored by: Lau Taarnskov
with help and assistance from: David Heinemeier Hansson
This is my first Ruby application besides "Hello World". (Well, actually I did: 3.times{puts "hello world!"}) After I created a working prototype of RSSscraper, David did some refactoring and made some suggestions in a quasi-pair-programming session using SubEthaEdit.
Copyright (c) 2004 Lau Taarnskov
This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.
Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:
The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
This notice may not be removed or altered from any source distribution.