The nature of syndicated data on the web is such that quality and correctness are often (nearly invariably) uneven. The RSS specs are themselves rather murky, and even the best of sites will push out the occasional unescaped entity or improper encoding.
As seems to be its natural inclination, uPortal completely ignores this reality and barfs at any hint of irregularity. uPortal parses RSS via its XSLT channel using Xalan-J, where “error recovery” means throwing an exception, dying, and spewing an ugly error at the user.
By a wide margin, the most common errors are character encoding issues. The uPortal channel, expecting XML, defaults to UTF-8 when the encoding is left unspecified. If there are multi-byte characters, you’re screwed. My solution, which so far has fixed every feed we’re currently ingesting, is a two-parter: a Python first stage and a PHP second stage. In most cases you’d want to combine the two into one (probably the Python code), but we’re running the two-parter because the latter code came first and is used for other purposes.
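As a rough illustration of the encoding cleanup idea (a sketch only, not the actual two-stage code described above), a single-stage Python pass might look like the following. The function name `normalize_feed` and the `cp1252` fallback are assumptions made for the example. It also assumes the feed is in an ASCII-compatible encoding, so UTF-16 feeds would need extra handling.

```python
import re

# XML declaration at the very start of the raw bytes, e.g. <?xml version="1.0" ...?>
XML_DECL = re.compile(rb'^\s*<\?xml[^>]*\?>')
# The encoding="..." pseudo-attribute inside that declaration.
ENCODING_ATTR = re.compile(rb'encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def normalize_feed(raw, fallback="cp1252"):
    """Return the feed re-encoded as UTF-8 with a matching XML declaration.

    `fallback` is a guess used only when nothing else decodes cleanly.
    """
    declared = None
    decl = XML_DECL.match(raw)
    if decl:
        m = ENCODING_ATTR.search(decl.group(0))
        if m:
            declared = m.group(1).decode("ascii")

    # A clean strict UTF-8 decode is strong evidence the bytes really are
    # UTF-8, so try that first, then whatever the feed claims to be.
    text = None
    for codec in ("utf-8", declared):
        if codec is None:
            continue
        try:
            text = raw.decode(codec)
            break
        except (UnicodeDecodeError, LookupError):
            continue
    if text is None:
        # Last resort: never raise, substitute U+FFFD for undecodable bytes.
        text = raw.decode(fallback, errors="replace")

    # Drop any byte order mark and the old declaration, then emit a new
    # declaration that matches the bytes we are about to produce.
    text = text.lstrip("\ufeff")
    text = re.sub(r'^\s*<\?xml[^>]*\?>', '', text, count=1)
    header = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return (header + text.lstrip()).encode("utf-8")
```

The point of the design is that the parser downstream only ever sees bytes that are valid UTF-8 and a declaration that says so, so its UTF-8 default can no longer bite. It does nothing about unescaped entities; those would need a separate pass.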
(If you’re using uPortal: performance isn’t an issue because the channel gets cached by default for 20 minutes. Be sure, though, to check that your version of the XSLT channel has my caching patch applied. There was a three-year-old caching bug that caused the channel not to cache for guest layouts and to cache inefficiently for logged-in users.)
And that’s that. Ta-da! The Aristocrats!