This Blog Has Moved!

My blog has moved. Check out my new blog at realfreemarket.org.




Monday, February 23, 2009

FSK Asks - Fetching Old Items on an RSS Feed?

The more I use Google Reader, the more I get disgusted with it. I started working on "FSK's RSS Reader". I got stuck at a weird point.

I thought that an RSS feed contained *ALL* the posts that had ever been published on that blog. It turns out that an RSS feed usually contains only the most recent posts, typically 20-30. Google Reader's behavior confused me!

In order to write a decent RSS reader, you have to *CACHE* the data locally. Otherwise you lose every item that has dropped off the end of the feed, and it takes forever to load and parse each feed on every page view. This means that it's going to use up a lot of disk space!
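A minimal sketch of that kind of local cache, written in Python with SQLite rather than in the PHP used for the actual reader. The table layout and the function names `open_cache` and `store_items` are made up for illustration. The idea is only that each item is stored once, keyed by feed URL and GUID, so the history keeps growing even after items drop off the live feed.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the local item cache. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS items (
               feed_url  TEXT NOT NULL,
               guid      TEXT NOT NULL,
               title     TEXT,
               link      TEXT,
               published TEXT,
               PRIMARY KEY (feed_url, guid)
           )"""
    )
    return db


def store_items(db, feed_url, items):
    """Store items that are not in the cache yet; return how many were new."""
    before = db.total_changes
    rows = [
        (feed_url, item["guid"], item.get("title"),
         item.get("link"), item.get("published"))
        for item in items
    ]
    # INSERT OR IGNORE skips rows whose (feed_url, guid) is already cached.
    db.executemany(
        "INSERT OR IGNORE INTO items "
        "(feed_url, guid, title, link, published) VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    db.commit()
    return db.total_changes - before
```

Passing a file name instead of the default `":memory:"` makes the cache persistent, which is where the disk space goes.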

When you subscribe in Google Reader, it shows you all the old posts, because Google Reader *ALREADY* has the archive of the full feed! That confused me immensely!

I also looked into using a third-party open source RSS parser. I quickly realized that it would be just as much effort for me to roll my own. "Write an RSS parser" is about as much work as "Write an XML parser", and "Get a third-party RSS parser to do what I want" is about as much work as "Write my own", especially considering that I have very specific desires. The one I looked at first (Magpie RSS) had a bug when parsing my own feed, which made me very rapidly say "**** this! I'll write my own XML/RSS parser."

I'm better off using the default XML parser that comes with PHP, rather than using a 3rd party tool.

I'm getting into the "Write RSS reader!" project. Basically, I'm just picking feeds from my Google Reader OPML file, running each one through PHP's default XML parser, and throwing away all the irrelevant tags. I figure "If my RSS Reader parses all the feeds I'm subscribed to, then that's good enough."
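As a rough illustration of the first step, here is a Python sketch (not the PHP from the actual project) that pulls the feed URLs out of an OPML export. It assumes the usual OPML layout, where each subscription is an `outline` element with an `xmlUrl` attribute and folders are `outline` elements without one. The function name `feed_urls_from_opml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Return the xmlUrl of every subscription in an OPML document."""
    root = ET.fromstring(opml_text)
    # iter() walks nested outlines too, so feeds inside folders are included.
    return [
        outline.attrib["xmlUrl"]
        for outline in root.iter("outline")
        if "xmlUrl" in outline.attrib
    ]
```

Folder outlines are skipped because they carry no `xmlUrl`.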

It's much easier to figure out "Where is the useful information in this feed?" than trying to read through the specification. Besides, if my RSS Reader works only for the feeds I'm subscribed to, what's the problem?
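The "keep the useful tags, ignore the rest" approach might look like the following Python sketch (again, not the author's PHP). It assumes only two layouts: RSS 2.0 `item` elements and Atom `entry` elements. Anything else, such as RSS 1.0/RDF, is simply not handled. The function name `extract_items` and the choice of fields are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def extract_items(feed_text):
    """Return a list of dicts with title, link, guid and published date."""
    root = ET.fromstring(feed_text)
    items = []

    # RSS 2.0: <rss><channel><item>...</item></channel></rss>
    for node in root.iter("item"):
        items.append({
            "title": node.findtext("title"),
            "link": node.findtext("link"),
            # Fall back to the link when the feed has no <guid>.
            "guid": node.findtext("guid") or node.findtext("link"),
            "published": node.findtext("pubDate"),
        })

    # Atom: <feed><entry>...</entry></feed>
    for node in root.iter(ATOM + "entry"):
        link = node.find(ATOM + "link[@rel='alternate']")
        if link is None:
            link = node.find(ATOM + "link")
        items.append({
            "title": node.findtext(ATOM + "title"),
            "link": link.get("href") if link is not None else None,
            "guid": node.findtext(ATOM + "id"),
            "published": node.findtext(ATOM + "published"),
        })

    return items
```

Every other tag in the feed is ignored, which matches the "good enough for my own subscriptions" goal.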

I'm stuck on "How do I get all older data, and not just the most recent posts?"

In Blogger, you can use "?start-index=300&max-results=10" to specify where you start and how many items to get. For example, "http://fskrealityguide.blogspot.com/feeds/posts/default?start-index=300&max-results=10".
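To walk a whole Blogger archive with those two parameters, the page URLs can be generated up front. The Python sketch below is illustrative only. `blogger_page_urls` and `total_posts` are hypothetical names, the total has to come from somewhere (the feed itself may report it, but check your own feed), and the sketch assumes `start-index` counts from 1.

```python
from urllib.parse import urlencode


def blogger_page_urls(feed_url, total_posts, page_size=25):
    """Build one feed URL per page, covering posts 1..total_posts."""
    urls = []
    # Assumption: start-index is 1-based, so pages start at 1, 1+size, ...
    for start in range(1, total_posts + 1, page_size):
        query = urlencode({"start-index": start, "max-results": page_size})
        urls.append(f"{feed_url}?{query}")
    return urls


if __name__ == "__main__":
    base = "http://fskrealityguide.blogspot.com/feeds/posts/default"
    for url in blogger_page_urls(base, total_posts=60):
        print(url)
```

Each URL would then be fetched and parsed like any other feed.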

In WordPress, you can use "?paged=10" to get older RSS items. For example, "http://www.nothirdsolution.com/feed/?paged=10".
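With WordPress there is no obvious way to know the page count in advance, so one approach is to keep asking for the next page until nothing comes back. This is a Python sketch under assumptions: the names `fetch_all_pages` and `http_fetch` are hypothetical, the feed is assumed to be RSS 2.0, and a page past the end is assumed to return either an HTTP error or a feed with no items.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def http_fetch(url):
    """Download a URL; return the body, or None on an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError:
        return None


def fetch_all_pages(feed_url, fetch=http_fetch, max_pages=50):
    """Collect item titles from ?paged=1, 2, ... until a page is empty."""
    titles = []
    for page in range(1, max_pages + 1):
        text = fetch(f"{feed_url}?paged={page}")
        if text is None:
            break  # ran off the end of the archive (or the request failed)
        page_titles = [
            item.findtext("title") for item in ET.fromstring(text).iter("item")
        ]
        if not page_titles:
            break  # an empty page also means there is nothing older
        titles.extend(page_titles)
    return titles
```

The `max_pages` cap is only a safety net so that a misbehaving site cannot keep the loop running forever. Passing in `fetch` makes it easy to try the loop against canned pages before pointing it at a real site.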

I appear to be SOL if the blogger has redirected his feed via FeedBurner, without continuing to publish the original.

There's also a "Google Reader API", which works with something like "http://www.google.com/reader/atom/feed/http://fskrealityguide.blogspot.com/feeds/posts/default?n=1000". However, I don't want to make my PHP application dependent on Google Reader. Further, you have to be logged into Google Reader to get this to work, which makes it hard to do programmatically.

I did a bunch of Googling and couldn't find the answer. Does anybody out there know the answer? My question is:


When fetching an RSS feed, how do I get *EVERYTHING* and not just the most recent posts? I figured out how to do it on Blogger and on WordPress, but I was wondering if there's a general solution.

Does anybody out there know? This is really technical, so I'm not expecting an answer. I'm expecting the answer is "**** it! Just make your Reader work starting with current posts."

After asking around elsewhere, the answer appears to be "RSS only supports fetching the most recent items. However, Blogger and WordPress offer a feature to fetch older items. Google has all the old feed information, because they've been storing it on their servers."

I could also use the Google Reader API as a quick-and-dirty fix. The problem with that method is that I have to be logged in to Google Reader to use it, which makes it hard to do programmatically. For reading older items, I can just manually load the XML file for the feeds I care about most.

In the worst-case scenario, I could just have my RSS feed fetcher get only the most recent items, with no older history.

2 comments:

fritz said...

Good luck....I consider myself lucky just to figure out how to operate my new vista windows..But I'm sure that if anyone can pull it off its you..

Fritz

Anonymous said...

http://feedwordpress.radgeek.com/wiki/how-do-i-go-back-get-older-articles-or-all-articles-sites-archives

Hope this article could help your questions. I am looking for a method to fetch the old feeds to wordpress.
