Jul

18

Week 8 of the Summer of Code was nothing short of a roller coaster ride. At some points I wanted to tear my hair out, other times I was jumping for joy. Here is what went down this week:

Bad: Regular Expressions Are Not My Thing

While working towards media importing this week, I had to write a few regex statements. I’ll admit it, I completely suck at regular expressions. They bring me back to my not-so-fond days of Discrete Mathematics; days I would rather forget. Regardless, I managed to get through them and ended up with some working code.


Good: Faster Response Times From TypePad

While working on media importing, I noticed that TypePad’s servers have greatly increased in speed this week. I found that importing took roughly half the time it did in the previous week, with no code optimizations on my part. The speed increases stayed consistent through the week, so I’m hoping the changes are here to stay.

Bad: Media Importing is Near Impossible

Wednesday came to a grinding halt when I noticed there was no way to reliably import media by parsing images from posts. There are several reasons for this:

So, media importing is unfortunately looking very glum. I will try to come up with something since importing media is a crucial feature in my book, but I’m not sure what. More on this in the coming weeks.

Good: WordPress 2.6 Released

The new version of WordPress was released this week, and I was happy for several reasons. First off, I was able to update my blog and take advantage of the new features, but more importantly, trunk is going bleeding again. Hopefully in the next week or so the TypePad importer should land in trunk. Stay on the lookout!

Good/Bad: Headway Made on Movable Type Authentication Issue

I managed to get in touch with the developer over at Six Apart who originally wrote Movable Type’s Atom API implementation. The good news is he confirmed I’m not crazy and MT is authenticating differently than TypePad. The bad news is he couldn’t remember offhand what is different about the implementation.

From what it sounds like, MT is not following the RFC 5023 spec in regards to WSSE, simply because the specification was not standardized when the original code was written. I’m not sure where this leaves me, because I don’t really know what is different with authentication at the moment. Also, I’m unsure if Movable Type will correct the authentication difference in the near future.

At the moment, it appears I will be brushing up on my Perl skills and looking at the source code for Movable Type next week. With a little luck, this issue will finally be ironed out. Who knows, maybe I’ll even figure out something for media importing next week as well. Right now, I’m just hoping next week is the week of miracles.

Jul

12

The launch of Apple’s iPhone / iPod touch App Store appears to have been a great success. Apple managed to pull in several well known Mac, Palm, and game developers to contribute to the over 500 available applications at launch. Already there is an application to meet nearly every need, and with the majority of remaining developers being accepted into the developer program, I’m sure there will be more great applications in the coming days and weeks.

While the App Store and applications have been a huge hit, playing around with applications over the past two days has filled me with some worries. Worries that Apple will need to address if they want the iPhone to succeed as a platform.


Application Data

After trying out several applications yesterday, a major flaw in the way application data is stored became apparent. Application data (preferences, files, saved data such as games) is all stored directly linked with the application that created it. Therefore, if an application is uninstalled, everything that application ever created is cleaned up and thrown into the ether. Sounds like a great way to make sure the iPhone stays uncluttered, no? Well it is, but keeping the iPhone clutter free brings problems.

Due to the nature of the application / data relationship, if an application is removed from the iPhone unintentionally, everything that application ever stored is removed permanently. Let me give you an example.

While messing with iTunes’ settings yesterday, I changed my iPod touch’s Application syncing preference to selective applications only. I forgot to select a few applications, and they were removed from my iPod touch the second I clicked apply. No big deal, right? I reselected the applications, and they appeared on my iPod touch with one exception - they were reset to their default settings and no longer contained my saved data. Thankfully, I only lost my Facebook login settings, my Flickr login settings, and my level 8 save game of Enigmo, but the results could have been much worse.

Looking at the list of applications currently available in the App Store, I would say 95% of applications would be fine with their data reset. Users would only lose some display settings, maybe a login or two, and that would be all. However, as the platform matures, more applications (and their users) will become reliant on stored data. Imagine finding out a year’s worth of mileage logs disappeared during your last iPhone restore. That could be a disaster.

Syncing

Part of this problem is due to the application / data relationship, but the bigger issue is the lack of a standardized syncing method in the iPhone OS. I would have no problem losing data after a restore if that data could easily be added back, but at the moment there is no way to restore application data (yes, I’m aware iTunes currently stores a backup, but that is only of the most recent sync, and is no help if a single application loses its data).

Currently all of the native iPhone applications, with the exception of Notes and SMS, sync through an application that manages the iPhone’s stored data. Calendar items sync with iCal, addresses sync with Address Book, and so on. However, third-party applications are left to fend for themselves. Some application developers have cleverly worked around this by utilizing “the cloud” (great example is OmniFocus’ WebDAV sync), but applications without desktop counterparts are left stranded.

How to Trust the iPhone Platform

If Apple wants the iPhone platform to be trusted among businesses and consumers, they need to address these issues. Start backing up application data separately from the application itself, so that when the application is reinstalled, the data can be restored as well. With simple changes such as this, the iPhone will not only be the most innovative mobile platform today, it can become the most trusted platform as well.

Jul

11

In short, this past week of the Summer of Code has been a rebuilding and planning week. In addition to filling out midterm surveys and reports, I shifted gears toward media importing.

Why Import Media?

The first question I asked myself this week is why should TypePad’s media be imported into WordPress. This was a relatively easy question to answer. TypePad, being a paid service, will delete and remove all content when canceled. This includes images uploaded and stored in TypePad’s cloud. So, importing images and the like is important if a TypePad switcher does not want to see broken images all over the place.


Can TypePad Media Be Imported?

After determining media importing is essential, I explored my options for getting media into WordPress. Thankfully, TypePad has an Atom API for their web galleries. Unfortunately, from my testing this Atom API does not include single image uploads. So, the Atom API may be out of the question for media importing.

Since Atom API media importing appears out of the question, I started looking at their XML-RPC documentation. To no surprise, they do not have a method for retrieving media through XML-RPC. Therefore, I’m left with one option: finding media through URLs during the import process.

Sadly, this option is not ideal. For one, it will add time to the already slow import process. Also, this option will force media to be imported at the time of the initial import, since the content can only easily be traversed during the post import. So, I’m not too pleased with my options.
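To make the "find media through URLs" idea concrete, here is a minimal Python sketch (the importer itself is PHP, so this is an illustration only). It assumes images appear as ordinary `<img>` tags in the post body; the pattern and the `find_image_urls` name are hypothetical, and a regex like this will miss or mis-match unusual markup.

```python
import re

# Hypothetical pattern: captures the src attribute of <img> tags in post HTML.
# Caveat: regexes are fragile on HTML; odd quoting or attributes can defeat it.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def find_image_urls(post_html):
    """Return image URLs in document order, without duplicates."""
    seen = set()
    urls = []
    for url in IMG_SRC.findall(post_html):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Each URL found this way would still have to be downloaded during the post import, which is where the extra import time comes from.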

Next Week

Next week I plan on starting to code the media importing during the import process. I plan on making this optional, since it will most likely add significant time to the import process. However, before I do that, I will go through my options one more time. If anyone thinks of any alternatives from now until then, I’d love to hear them.

Jul

4

Time is flying. The conclusion of this week marks the midway point of the Summer of Code, and the supposed ready for core date I set back in the beginning of summer. So, how did I stack up?

This Week

This week saw the addition of Atom URI detection based on the straight blog URL. To detect the Atom URI, I have to parse the HTML page for the RSD page, then parse the RSD page for the Atom URI. It’s a multi-step process, but it is the most reliable approach should TypePad ever change the Atom URI on me. In fact, I’d like to mention that every single URI (comments, paging, etc) is automatically detected within the importer, so the importer should be URI future proof.
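The two-step detection can be sketched roughly as follows, in Python rather than the importer’s PHP. The sketch assumes the blog page advertises its RSD file with a `<link rel="EditURI">` tag and that the RSD file lists an `api` element named "Atom" with an `apiLink` attribute. The function names are hypothetical, and fetching the two documents is left out.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class RsdLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="EditURI"> tag seen."""

    def __init__(self):
        super().__init__()
        self.rsd_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_rsd_link = tag == "link" and (attrs.get("rel") or "").lower() == "edituri"
        if is_rsd_link and self.rsd_url is None:
            self.rsd_url = attrs.get("href")


def find_rsd_url(html):
    """Step 1: locate the RSD document from the blog's HTML page."""
    finder = RsdLinkFinder()
    finder.feed(html)
    return finder.rsd_url


def find_atom_uri(rsd_xml):
    """Step 2: read the Atom endpoint out of the RSD document."""
    root = ET.fromstring(rsd_xml)
    for element in root.iter():
        # Strip any XML namespace so namespaced and plain RSD files both work.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "api" and element.get("name", "").lower() == "atom":
            return element.get("apiLink")
    return None
```

Because nothing but the blog URL is hard-coded, a change to the Atom endpoint on the server side would be picked up on the next import.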

In addition to AtomPub URI detection, I added a progress bar this week. I thought this would be a huge process, but thankfully the coding was relatively simple. I learned some techniques to force PHP to dump the output buffer, essentially updating the page. Paired with some simple JavaScript, I was able to create a fairly responsive progress bar. I chose this method over a completely JavaScript method since the JavaScript method would require additional time to run (I can explain this technically if anyone wants to know why).
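The flush-as-you-go idea looks roughly like this in Python (a sketch of the general technique, not the importer’s PHP). After each post, a tiny inline script is written and flushed immediately so the browser can move the bar while the page is still loading. `updateProgress` is a hypothetical JavaScript function assumed to exist on the page, and `run_import` and `import_one` are illustrative names.

```python
import sys


def progress_chunk(done, total):
    """Build an inline script that sets the bar; updateProgress is a hypothetical JS helper."""
    percent = int(done * 100 / total)
    return "<script>updateProgress(%d);</script>\n" % percent


def run_import(posts, import_one, out=sys.stdout):
    """Import each post, emitting and flushing a progress update after every one."""
    total = len(posts)
    for index, post in enumerate(posts, start=1):
        import_one(post)
        out.write(progress_chunk(index, total))
        out.flush()  # push this chunk to the client now instead of at the end
```

The trade-off is that the whole import still runs in one long request; the page simply reports on it as it goes.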

Overall Status

Since we’re at the midway point, I thought now would be a great time to look back at the overall status. As I mentioned earlier in the post, I chose this date as the ready for core inclusion date. Since the release cycle for WordPress 2.6 has been pushed up, core inclusion will most likely be pushed back since the AtomPub importer will not be ready for 2.6. Regardless of the actual core inclusion, the deadlines have not changed in my mind.

According to my requirements for core inclusion, by now the importer should be converting AtomPub data into actual WordPress data. That’s occurring, so I’m definitely still on track.

The Second Half

So, where’s the importer going from here? Well, I plan to start working on the media importing portion of the importer next week. I’m figuring that may take at least two weeks. By then, WordPress 2.6 should be near release, so core inclusion can be considered once trunk goes bleeding again. After the importer is included in core, I can start getting real feedback from users. That should allow me to find and fix bugs, in addition to working on speed enhancements on a much larger scale.

The Importer In Action

I thought I would leave you this week with a look at the importer in action. A few notes for the video:

  1. The first item takes a while to display since the first 20 posts need to be requested, plus two requests for comments, in addition to the standard trackback and draft checks.
  2. I would like to add a throbber to the in progress page to make the wait for the first post less painful.
  3. You’ll notice the importer semi-stall after 20 posts. This is because it needs to request another group of 20 posts.
  4. By the time the average WordPress user sees this, I would really love to see the importer work even faster than this, and I will strive in the second half to make that happen.

Jun

27

Another week down, another step closer to a working AtomPub importer. Unfortunately, week 5 went anything but according to plan. Sure, I fixed the bugs found at the end of last week, but new issues came to light, requiring changes in the week’s plan.

New Issues Found

Additional testing early in the week by my mentor Lloyd brought forth some coding challenges. First off, Lloyd found a few error messages on import. Those were quickly resolved, but once Lloyd made it past the error messages, he found the performance of the importing to be subpar.

After adding some performance measurements to the importer, the source of the problem was revealed. The multiple requests for different feeds of data add up over time. Essentially, for every post the importer needs to ping the post URL to check for a 404 (draft status), request the comments feed, and request the trackbacks XML-RPC data. Each post was taking over a second, which quickly adds up across a whole blog.

Progress, Progress, Progress

Unfortunately, nothing can be done at this time to lower the request time; the feed requests are at the mercy of the internet. However, the notifications can be enhanced so a user is not wondering why the importer has not finished.

So, after discussing the issue with Lloyd, I think a progress bar is needed in this situation. Unfortunately, due to the nature of PHP applications, I can’t just add a progress bar out of nowhere. I will need to modularize the importer into a more AJAXy interface, so an AJAX progress bar can be updated with the import status. I will begin looking at solutions for this later in the upcoming week.

Even More Issues

The performance issue was not my only problem this week. Lloyd found that when importing from a blog with 3,000 entries, the importer ran out of memory. Surprisingly, it ran out of memory around 130MB, which would be crazy under a normal web server, given PHP is typically limited to 16MB of RAM.

Once this issue was pointed out to me, I quickly found the problem. I had been putting all entries in a massive array before looping through them to import. So, to correct this I limited the importer to batches of 20 posts at a time, freeing the memory between each set of posts. This appears to have corrected the problem.
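The batching fix follows a familiar pattern, sketched here in Python under stated assumptions: `fetch_page` is a hypothetical callable that returns up to `batch_size` entries starting at a given offset, and `import_entry` stands in for whatever writes one entry into WordPress. Only one batch is held in memory at a time.

```python
def import_in_batches(fetch_page, import_entry, batch_size=20):
    """Fetch and import entries one page at a time; return the number imported."""
    imported = 0
    offset = 0
    while True:
        # Hypothetical fetcher: returns a list of entries, empty when exhausted.
        batch = fetch_page(offset, batch_size)
        if not batch:
            break
        for entry in batch:
            import_entry(entry)
        imported += len(batch)
        offset += len(batch)
        # The batch is replaced on the next iteration, so the memory used
        # stays bounded by batch_size rather than by the size of the blog.
    return imported
```

With this shape, a 3,000-entry blog costs 150 fetches of 20 entries each instead of one array holding everything.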

In addition to the memory leak, I found out that the comments feed has the same 20-comments-at-a-time restriction that the main feed had. Already familiar with the issue for the main feed, I corrected that issue and all comments started to be imported.

Outlook Looks Good

Despite the massive amounts of issues discovered this week, I think the future of the importer is looking better than ever. Some major hurdles were overcome this week, and because of that, this week ends with a more memory efficient, error-free version of the importer.

With the new discoveries, obviously the plan has been changed a bit. Currently, I’m looking at finally (and yes, I mean finally) writing the code to automatically detect the Atom API feed at the beginning of next week. From there, I will begin working on updating the interface to be more AJAXy, providing notifications along the way.

Jun

20

Now completing the fourth week of coding, the AtomPub importer is finally starting to take shape. This week I retrieved trackbacks, started importing the previously retrieved data into WordPress, and added a user interface enhancement.

Trackbacks Are Go

If you remember last week, I had some difficulties getting trackbacks working. Well, thankfully that is no longer the case. Earlier this week Joseph Scott helped me figure out the code needed for accessing TypePad’s XML-RPC API. With this addition, all standard blog data is now being imported.


Importing is Go

After solving the trackback dilemma, I started working on actually importing the arrays retrieved from the AtomPub server. Since much of the import code is shared between other WordPress importers, this process went fairly quickly. I only needed to make a few minor adjustments to some code, and by Thursday arrays were becoming rows in WordPress’ database.

Notifications Are Go

After testing importing several times, a logical enhancement occurred to me. Occasionally the AtomPub server can take a while to respond and feed the data into WordPress. During this time, a user would be sitting at the initial page with no notification of activity other than the browser’s loading notification. This could have raised suspicions that the page was not loading, when in fact everything was working perfectly. So, I added a small throbber and message text while the importing occurs. The enhancement is small, but it should bring peace of mind to those getting antsy.

AtomPub Throbber

Bugs Are Go

What would a coding project be without bugs? Today, my mentor Lloyd notified me of several small bugs. It turns out the importer has been generating warning messages left and right. I’ve been oblivious to this since apparently MAMP had PHP error reporting disabled. I have since enabled error reporting, and will start fixing the small bugs early next week.

Unfortunately, a major bug was also discovered. Lloyd found that only 20 entries will import. Since I’ve been working with only a few entries at a time, I had not run into this problem yet. I suspect the issue is with the AtomPub server, and an additional request will be needed for each set of entries over 20. I’ll know more next week when I take a closer look at the issue.

Ronald is coding …and he has a plan.

I hope someone gets the above reference. Anyway, as you may have guessed, the early part of next week will focus solely on tackling the bugs found today. Hopefully the issues will not be problematic so I can begin the next task: allowing users to select an author to import all entries under. Once the author override code is committed, the next step on my coding agenda is to finally write the code to automatically detect the AtomPub URL based on the website’s blog address. Those three items should keep me busy next week, and as always, I’ll let you know next week how the coding went.

Jun

15

Alright, that is a sensationalist title, but I needed a strong title to show my hypocrisy. Today, I have made the switch from Safari 3 to Firefox 3. I have realized despite the numerous advantages Safari has with direct operating system integration, Firefox still wins out feature-wise. To help make my decision, I made several lists of the advantages and disadvantages that matter to me. Below are those lists.


Advantages of Safari

Disadvantages of Safari

Advantages of Firefox

Disadvantages of Firefox

Looking over the lists, Safari’s advantages are mostly in the interface, while the disadvantages can quickly become show stoppers. For Firefox, the advantages are in the features, while the disadvantages are only minor quibbles. When you enumerate the features, Firefox wins hands down - at least for me.

So, my final ruling is Firefox wins this browser round. If Safari 4 can manage to fix the Flash freezes and remember cookies, Safari has a good chance of winning round 4. Until then, Firefox will remain my browser of choice.

Jun

13

Week 3 of the Summer of Code has been by far the most productive week yet. The main focus of this week was to parse the AtomPub data into a PHP array, and I’m pleased to say this was a success.

The XML Parser

I started the week off by writing the custom XML parser I talked about last week. To do this, I researched several different methods for utilizing PHP’s xml_parse function. Since the parsing occurred on an established standard where the tag names will not change on me, I decided to parse the tags based on a tag name switch. This worked well until I started running into nested tags, but that problem was quickly resolved with the use of a few class variables.
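For illustration, here is what a tag-name switch with a little nesting state can look like. This is a Python sketch using the standard library’s expat bindings, not the importer’s PHP code, and the field names in the resulting dictionaries are my own. The instance variables `current` and `in_author` play the role of the class variables mentioned above: they let `<title>` and `<name>` mean different things depending on where they appear.

```python
import xml.parsers.expat


class AtomEntryParser:
    """Event-driven parser that switches on tag names and tracks nesting."""

    def __init__(self):
        self.entries = []
        self.current = None     # entry being built, or None outside <entry>
        self.in_author = False  # so <name> is only read inside <author>
        self.text = []

    def start(self, name, attrs):
        self.text = []
        if name == "entry":
            self.current = {"title": "", "content": "", "author": ""}
        elif name == "author":
            self.in_author = True

    def chars(self, data):
        self.text.append(data)

    def end(self, name):
        value = "".join(self.text).strip()
        self.text = []
        if name == "author":
            self.in_author = False
        elif self.current is None:
            pass  # feed-level tag, such as the blog's own <title>
        elif name == "title":
            self.current["title"] = value
        elif name == "content":
            self.current["content"] = value
        elif name == "name" and self.in_author:
            self.current["author"] = value
        elif name == "entry":
            self.entries.append(self.current)
            self.current = None


def parse_atom(xml_text):
    """Return a list of dicts, one per <entry> in the feed."""
    handler = AtomEntryParser()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = handler.start
    parser.EndElementHandler = handler.end
    parser.CharacterDataHandler = handler.chars
    parser.Parse(xml_text, True)
    return handler.entries
```

This sketch only handles text and escaped-HTML content; inline XHTML content with child elements would need more state.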


Missing Data

Once the XML was in a parsed array, I began looking over the array and envisioned how this data would import into WordPress. During this process, I realized the AtomPub feed was missing two bits of key data: the draft status of posts and a list of trackbacks. I immediately began looking into possible workarounds.

While I investigated solutions, my mentor Lloyd discussed the missing data with Six Apart. We were assured support for app:draft was on their todo list, but they did not commit to any date for availability. So, Lloyd gave me the go ahead for workarounds.

To solve the draft problem, I ended up creating a 404 checker. Assuming that drafts will not be published, the URL for the post should result in a 404. Knowing this, while posts are imported I loop through the URLs and check the HTTP status codes. The workaround certainly isn’t the best as it’s resource intensive, but for the time being it works.
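The 404 check itself is small. Here is a hedged Python sketch of the idea: `http_status` and `is_draft` are hypothetical names, and using a HEAD request is my own assumption, chosen to avoid downloading the page body. The status lookup is passed in as a parameter so the rule can be exercised without touching the network.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is what we want here.
        return error.code


def is_draft(url, status_of=http_status):
    """Treat a 404 on the post's public URL as a sign it was never published."""
    return status_of(url) == 404
```

The cost is one extra request per post, which is why the workaround is resource intensive.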

After fixing the draft problem, I looked into solutions for the missing trackbacks. I found this function on TypePad’s XML-RPC developer site, however, attempts to implement the function call have failed me. So, I continued to search for alternatives.

I found out today that Movable Type has a hidden RSS feed for trackbacks. I tested this and indeed it is true. Unfortunately, TypePad does not seem to have this feed. My guess is this is because of their premium pricing model, removing support for additional and custom templates in the lower tiers. If anyone happens to know the super-secret URL for a post’s trackbacks on TypePad, I would love to know, but I truly believe that feed does not exist.

Therefore, the search for a trackback solution continues. For the time being this is being put on the back-burner. When I get some free time over the next couple of weeks or during the second half of the Summer of Code I will revisit this problem, but at the moment trackback support is being forgone.

Next Week’s Plan

So, what’s up for next week? Early next week I plan on working on the actual importing of data into WordPress. All the arrays are prepped and the functions are ready, so the importing process should go fairly quickly. I’m actually anticipating finishing up the import code by the middle of next week, but if things don’t go to plan I have until the end of the following week. Should I finish early, I will revisit some of the priority two items accumulated over the past few weeks. With a little luck, next week will bring a functioning importer with some additional fixes.

Jun

6

While I didn’t blog about last week’s status, significant progress has been made in parsing TypePad and Movable Type AtomPub feeds (well, parsing TypePad feeds). This week started off by completing more research on the AtomPub spec. In order to parse an AtomPub feed, I had to learn about X-WSSE authentication. From there, I found a great X-WSSE class that I included in my test version of WordPress. Then, the fun began.
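For anyone unfamiliar with X-WSSE, the header is built from a username, a random nonce, a timestamp, and a digest of the three secrets combined. Below is a minimal Python sketch of the commonly described UsernameToken form, where the digest is Base64(SHA-1(nonce + created + password)). It is not the class I used, `wsse_header` is a hypothetical name, and servers can differ in the details (for example, in how the nonce is encoded before hashing).

```python
import base64
import hashlib
import os
from datetime import datetime, timezone


def wsse_header(username, password, nonce=None, created=None):
    """Build an X-WSSE UsernameToken header value (commonly described form)."""
    if nonce is None:
        nonce = os.urandom(16)  # fresh random bytes for every request
    if created is None:
        created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Assumption: the digest covers the raw nonce bytes, then timestamp, then password.
    digest = hashlib.sha1(
        nonce + created.encode("utf-8") + password.encode("utf-8")
    ).digest()
    return 'UsernameToken Username="%s", PasswordDigest="%s", Nonce="%s", Created="%s"' % (
        username,
        base64.b64encode(digest).decode("ascii"),
        base64.b64encode(nonce).decode("ascii"),
        created,
    )
```

The resulting string would be sent as the value of the `X-WSSE` request header; the password itself never travels over the wire.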

Movable Type Hates Me

Almost immediately, I was retrieving the RAW XML of TypePad’s AtomPub feed. Unfortunately, I could not say the same for Movable Type. Due to some server configuration issues on my end or possibly a bug in Movable Type, I am unable to retrieve Movable Type AtomPub feeds at the moment. I’ve tried various methods of parsing the feed and each method returns the same cryptographic error message. I’ve called in the experts (my mentor, Lloyd) to help me, but if anyone has any clue as to why Movable Type hates me, I would appreciate the feedback.


Parsing the Feed

Regardless of the error, I kept on trucking with the TypePad AtomPub feed, knowing that Movable Type will fall in line once I can figure out my retrieval problems. Working with TypePad’s feed, I started trying different methods of Atom parsing. I first tried WordPress’ built-in Magpie parser, but due to Magpie not supporting Atom 1.0, that was little help. I then tried some code snippets on php.net, but unfortunately none of the snippets parsed in the manner I required. So, I started writing my own basic XML parser.

That’s where I stand today. This coming week I will continue writing my custom XML parser, completing the parser by the end of next week. The goal is to have all AtomPub data in an array, so I certainly have my work cut out for me.

May

26

Google’s Summer of Code 2008 is officially underway! This year I am working on creating an AtomPub-based content importer for WordPress. The goal is to import entries and other content from Movable Type and TypePad into WordPress in as few clicks as possible.

Since the AtomPub spec (RFC 5023) is so new, this should prove to be an interesting summer. I will be one of the first to implement a real world use of AtomPub, and I suspect documentation will be scarce. Regardless, I am up for the challenge and can’t wait to see how the end product turns out.

If you’re interested in the progress of this project, just stay tuned to this blog. I will be blogging weekly updates on my progress, so you will always know where I stand.

