21 January 2013

Nerdfighter Blogs using Yahoo Pipes

EDIT (1/23/2013): I decided to use FeedBurner to handle the traffic. It will also be sensible about refreshing periodically and not causing false traffic stats on everyone's site. The reason for the no-Tumblr thing was that animated GIFs were dominating everything and I really want to focus on nerdfighters who use their words (and also their sentences). I can always use the Tumblr feed or search if I'm desperately craving My Little Pony GIFs......  :)

I have recently taken an interest in the Nerdfighter community started by Hank and John Green. It has been such a joy to discover so much positivity and hope in this little (but growing) island of awesome in the swirling turbulent muck that can be the internet. For whatever reason, their forums and community spaces are some of the most wonderful virtual spaces I have ever found.

I'm somewhat shy about new people so I decided to start with something comfortable like blogging. After a bit of searching around I settled into a thread of Nerdfighters who want to encourage each other to blog more and blog better. The thread is pretty simple: everyone just says hi and posts a link to their blog. Cool.

The Problem

Well, it's not so much a problem as a need for convenience. As I write this there are over 4 pages of posts with people saying "hey, my blog is at http://myawesomeblog.com". I really want to read all the new blogs if I can, but there are a lot of them.

I started by trying to visit each one and that got tedious fast. My second idea was to bookmark them all or put them all into my Google Reader, but what I really want is a single big list of new posts from all of these sites. Also, and this is the crux of it, I want anyone who joins the thread to get added to my blog list automatically so I don't need to maintain a list of bloggers somewhere that could get out of date.

It's 2013 so let's roll up our space-sleeves and crack open the RSS spec! Should be easy with the power of TECHNOLOGY, right?

Enter Yahoo Pipes

Yahoo Pipes is a revolutionary service that allows you to take a bunch of inputs like web pages, strip them down, squish them around and put them back together using pre-made modules or by writing your own. It does way more than that too, but I really hadn't done much with it until today, so what follows is just my experience.

This also touches on the nature of a free and open internet: a mashup like this is only possible because the forum thread and the blogs it links to are all publicly readable. Nothing I'm doing here is hacking or stealing in any way.

Overall it was a breeze to use and with a little logic and a few minutes of tinkering I was able to create a pipe (or series of tubes..... ) that output exactly what I needed. Now I can be sitting on the bus on the way to work and quickly skim over the day's blogs, tagging ones I want to read later.

Also I know this would have been easier to do using a PHP script or maybe a Google App Engine App or something but I wanted the easiest possible method without needing to pay for hosting.

How to use this feed:

If you've never used an RSS feed or don't know how, it really couldn't be easier.

The actual RSS URLs are here:

You can take this RSS URL and give it to Google Reader or any other news reader to get a quick list of new blogs.

Results and Troubleshooting

The results are not perfect but what we have is a quick'n'dirty feed list of all the blogs that anyone can pull into Google Reader or Flipboard (my favourite) and now I can browse at a glance through Nerdfighter blogs and pick and choose ones that interest me.

Some Considerations and errata:

  • This script will not find your blog if it is not a link. So if you want to be listed you need to edit your post and make the text into a link.
  • There's a possibility for spam because any post that gets added will be scraped but I'm counting on the awesomeness of the Nerdfighter community and also on the moderators of the forum to help keep things clean. If it gets obscene I'll take it down.

Questions for you, oh gentle reader:

  • Should I filter out tumblrs? I don't want to be exclusionary but they add a lot of noise to the system and most of them are just GIFs. I really want this to be about blogs and writing.
  • What's missing?
  • Could it be better?
  • Is there a better way to do this that you know of?


Let's get nerdy: How to create a pipe

NB: What follows is pretty techy and you definitely don't need to understand it to use it but I was so impressed with how easy this was that I wanted to share a bit.

The actual pipe looks something like this:

[Image: Yahoo Pipes] It looks more complicated than it is. I promise!

You can follow along and see the one I built here. Please feel completely free to clone it and modify it to fit your own personal tastes and needs or just pull it apart to see how it works.

Step 1. Find every page of the thread

First I pass in the base URL of the thread using the 'URL Builder'. The base is 'http://nerdfighteria.vanillaforums.com/' and the following URL parameters are discussion, 1033 and share-your-blog.

Then I pass this (by connecting a pipe) to the 'XPath Fetch Page' module which reads the page from the URL I just built. I'm diving into the actual data because I know this discussion has multiple pages and I want to scrape all of them.

So my XPath code is:
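For anyone who would rather see this step outside of Pipes, here is a minimal Python sketch of what the URL builder is doing. It assumes the three parameters are joined onto the base as path segments, in order; the function name `build_url` is made up for illustration and is not part of Pipes.

```python
from urllib.parse import quote, urljoin

# Values taken from the post; treating the parameters as path
# segments is an assumption about how the URL builder joins them.
BASE = "http://nerdfighteria.vanillaforums.com/"
PARTS = ["discussion", "1033", "share-your-blog"]


def build_url(base, parts):
    """Join path segments onto a base URL, escaping each segment."""
    path = "/".join(quote(part, safe="") for part in parts)
    return urljoin(base, path)


thread_url = build_url(BASE, PARTS)
print(thread_url)
```

Escaping each segment separately means a segment containing a slash or a space would not silently change the shape of the URL.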

//div[@id="PagerAfter"]/a

Which means "get all the <a> tags directly inside the <div> with an id of PagerAfter". This actually grabs the pager at the bottom of the page and gives us the links to all the individual pages.

Then I pass this through the 'unique' module because the pager looks like this "« 1 2 3 4 »" and I know that the "«" and the "»" can both be links. I definitely don't want duplicate data and in the source I didn't see any good way to separate out the arrow links, so the 'unique' module seemed like a good choice.
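The same pager-then-unique idea can be sketched in Python with only the standard library. The sample markup and the `/p1`-style page links below are invented for illustration, and `xml.etree.ElementTree` only copes with well-formed markup, so a real forum page would likely need a proper HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pager markup; the real forum HTML will differ.
SAMPLE_PAGE = """
<html><body>
  <div id="PagerAfter">
    <a href="/discussion/1033/share-your-blog/p1">«</a>
    <a href="/discussion/1033/share-your-blog/p1">1</a>
    <a href="/discussion/1033/share-your-blog/p2">2</a>
    <a href="/discussion/1033/share-your-blog/p3">3</a>
    <a href="/discussion/1033/share-your-blog/p3">»</a>
  </div>
</body></html>
""".strip()


def pager_links(page_source):
    """Return the pager hrefs in page order with duplicates removed."""
    root = ET.fromstring(page_source)
    # Same selection as the XPath in the post: <a> children of #PagerAfter.
    anchors = root.findall(".//div[@id='PagerAfter']/a")
    hrefs = [a.get("href") for a in anchors]
    # dict.fromkeys keeps the first occurrence of each href, in order.
    return list(dict.fromkeys(hrefs))


pages = pager_links(SAMPLE_PAGE)
print(pages)
```

Here the arrow links collapse into the numbered links they duplicate, which is the job the 'unique' module does in the pipe.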

Step 2. Get the links from each post

This step requires an assumption: Most of the links inside messages will be to blogs with rss feeds.

This assumption proves to be wrong because each time you use @user to tag a user it creates a link, but I filter those out and you'll see how in a moment.

So what we need now is just the links from all the messages and to ignore all the other stuff happening on the page. We do this by using the 'loop' module to iterate over all the page URLs from step 1 and inside this 'loop' module we plunk down another 'XPath Fetch Page' module with the following XPath code:

//div[@class="Message"]//a

Note: the // is important because we want to say "any <a> anywhere inside <div class="Message">"

This XPath code will find all links inside tags that have the class "Message". Whoa. That was easy.... except now we have a bunch of @user links back to vanillaforums too. We filter these out by using the 'filter' module that blocks all submissions containing a certain string and the string we use is "nerdfighteria.vanillaforums.com".
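As a rough Python equivalent of this step, the sketch below pulls every link out of the message blocks and then drops anything pointing back at the forum. The sample markup, the `/profile/someuser` link and the function name `blog_links` are all made up for illustration. The attribute test is also an exact match on class="Message", which a real page with extra class names would not satisfy.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified forum markup; the real page will differ.
SAMPLE_PAGE = """<html><body>
<div class="Message">Hi! My blog is <a href="http://myawesomeblog.com">here</a>,
thanks <a href="http://nerdfighteria.vanillaforums.com/profile/someuser">@someuser</a></div>
<div class="Message"><p>Mine: <a href="http://example.org/blog">blog</a></p></div>
<div class="Footer"><a href="http://example.com/ignored">not a message</a></div>
</body></html>"""

BLOCKED = "nerdfighteria.vanillaforums.com"


def blog_links(page_source, blocked=BLOCKED):
    """Return hrefs found inside message blocks, minus forum-internal links."""
    root = ET.fromstring(page_source)
    # The second // matches <a> at any depth inside the message div.
    anchors = root.findall(".//div[@class='Message']//a")
    hrefs = [a.get("href") for a in anchors if a.get("href")]
    # Mirror the 'filter' module: block anything containing the forum host.
    return [href for href in hrefs if blocked not in href]


links = blog_links(SAMPLE_PAGE)
print(links)
```

The link nested inside a <p> is still found, which is what the double slash buys us, and the footer link is ignored because it is not inside a message.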

Step 3. Get the feeds from all the blogs

So now we have a list of urls to people's blogs (and not a bunch of pr0n and spam hopefully). Time for another loop. This time we use the 'loop' module in conjunction with the 'find first site feed' module which goes to each of the blogs we've just found and searches them for an RSS feed. Then it returns the posts from each feed.
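I don't know how the Pipes module finds feeds internally, but a common convention is feed autodiscovery: the blog's HTML head contains a <link rel="alternate"> tag pointing at its RSS or Atom feed. Here is a minimal Python sketch under that assumption; the class and function names and the sample page are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect hrefs of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            self.feeds.append(href)


def first_feed_url(page_url, page_source):
    """Return the first advertised feed as an absolute URL, or None."""
    finder = FeedLinkFinder()
    finder.feed(page_source)
    if not finder.feeds:
        return None
    # Feed links are often relative, so resolve against the page URL.
    return urljoin(page_url, finder.feeds[0])


# Hypothetical blog page used only to exercise the function.
SAMPLE_BLOG = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
</head><body>Hello</body></html>"""

feed_url = first_feed_url("http://myawesomeblog.com/", SAMPLE_BLOG)
print(feed_url)
```

Blogs that don't advertise a feed this way would simply return None and drop out of the list.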

Finally we run it through a 'sort' module that orders things in descending date order and VOILA!
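The sort step is easy to picture in Python too. This sketch assumes each post carries an RSS-style pubDate string with a timezone offset; the sample posts are invented, and `newest_first` is just an illustrative name.

```python
from email.utils import parsedate_to_datetime

# Invented sample posts, as if merged from several blog feeds.
posts = [
    {"title": "First post", "pubDate": "Mon, 21 Jan 2013 09:00:00 +0000"},
    {"title": "Newer post", "pubDate": "Wed, 23 Jan 2013 18:30:00 +0000"},
    {"title": "Middle post", "pubDate": "Tue, 22 Jan 2013 12:00:00 -0500"},
]


def newest_first(items):
    """Sort posts by publication date, most recent first."""
    return sorted(
        items,
        # Parsing to timezone-aware datetimes makes offsets compare correctly.
        key=lambda item: parsedate_to_datetime(item["pubDate"]),
        reverse=True,
    )


ordered = newest_first(posts)
for post in ordered:
    print(post["pubDate"], post["title"])
```

Sorting on parsed dates rather than the raw strings matters because different blogs report their times in different timezones.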


Filed in: Wordpress   Nerd  
