Ultra-liberal RSS parser

Visit http://diveintomark.org/projects/rss_parser/ for the latest version.

Handles RSS 0.9x and RSS 1.0 feeds.

RSS 0.9x elements:
- title, link, description, webMaster, managingEditor, language, copyright, lastBuildDate, pubDate

RSS 1.0 elements:
- dc:rights, dc:language, dc:creator, dc:date, dc:subject, content:encoded

Things it handles that choke other RSS parsers:
- bastard combinations of RSS 0.9x and RSS 1.0 (most Movable Type feeds)
- illegal XML characters (most Radio feeds)
- naked and/or invalid HTML in description (The Register)
- content:encoded in item element (Aaron Swartz)
- guid in item element (Scripting News)
- fullitem in item element (Jon Udell)
- non-standard namespaces (BitWorking)

Requires Python 2.2 or later.
Version:
2.3.1
Author:
Mark Pilgrim (mark@diveintomark.org)
Copyright:
Copyright 2003, Mark Pilgrim
License:
GPL
Classes:
RSSParser
Variables:
__contributors__
__history__
USER_AGENT
short_weekdays
long_weekdays
months
TEST_SUITE

URI, filename, or string --> stream

This function lets you define parsers that take any input source (URL, pathname to local or network file, or actual data as a string) and deal with it in a uniform manner. The returned object is guaranteed to have all the basic stdio read methods (read, readline, readlines). Just .close() the object when you're done with it.

If the etag argument is supplied, it will be used as the value of an If-None-Match request header.

If the modified argument is supplied, it must be a tuple of 9 integers as returned by gmtime() in the standard Python time module. This MUST be in GMT (Greenwich Mean Time). The formatted date/time will be used as the value of an If-Modified-Since request header.

If the agent argument is supplied, it will be used as the value of a User-Agent request header.

If the referrer argument is supplied, it will be used as the value of a Referer[sic] request header.

The optional arguments are only used if the source argument is an HTTP URL and the urllib2 module is importable (i.e., you must be using Python version 2.0 or higher).
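
The behaviour described above can be sketched roughly as follows. This is not the module's own code: it is a minimal illustration written for Python 3 (urllib.request instead of urllib2), and the names build_conditional_request and open_source_sketch are hypothetical. It assumes that a source starting with http:// or https:// is a URL, that an existing path is a file, and that anything else is literal data.

```python
import calendar
import io
import os
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, etag=None, modified=None,
                              agent=None, referrer=None):
    """Build (but do not send) a request carrying the optional headers."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if modified:
        # modified is a 9-tuple in GMT, as returned by time.gmtime().
        # formatdate() uses fixed English names, so it is locale-safe.
        headers["If-Modified-Since"] = formatdate(
            calendar.timegm(modified), usegmt=True)
    if agent:
        headers["User-Agent"] = agent
    if referrer:
        headers["Referer"] = referrer
    return urllib.request.Request(url, headers=headers)


def open_source_sketch(source, **conditional):
    """Return a readable binary stream for a URL, a file path, or raw data."""
    if source.startswith(("http://", "https://")):
        # The optional arguments only matter for HTTP sources.
        request = build_conditional_request(source, **conditional)
        return urllib.request.urlopen(request)
    if os.path.isfile(source):
        return open(source, "rb")
    # Fall back to treating the argument as the data itself.
    return io.BytesIO(source.encode("utf-8"))
```

Every branch returns an object with read, readline, readlines, and close, so a caller can treat the three kinds of source identically. Returning bytes in all three cases is a choice made for this sketch.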

Get the ETag associated with a response returned from a call to open_resource(). If the resource was not returned from an HTTP server or the server did not specify an ETag for the resource, this will return None.

Get the Last-Modified timestamp for a response returned from a call to open_resource(). If the resource was not returned from an HTTP server or the server did not specify a Last-Modified timestamp, this function will return None. Otherwise, it returns a tuple of 9 integers as returned by gmtime() in the standard Python time module.
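
A minimal sketch of the two lookups described above, assuming the response object exposes its headers through a headers attribute with a case-insensitive get() (as Python 3's urllib responses do). The function names are hypothetical, and email.utils.parsedate stands in for the module's own date parser.

```python
from email.utils import parsedate


def get_etag_sketch(resource):
    """Return the ETag header value, or None if there is none."""
    headers = getattr(resource, "headers", None)
    if headers is None:
        # Not an HTTP response (for example a local file or a string stream).
        return None
    return headers.get("ETag")


def get_modified_sketch(resource):
    """Return Last-Modified as a 9-tuple, or None if it is unavailable."""
    headers = getattr(resource, "headers", None)
    value = headers.get("Last-Modified") if headers is not None else None
    if not value:
        return None
    # parsedate() returns a 9-tuple; only the first six fields
    # (year through second) are meaningful, the rest are placeholders.
    return parsedate(value)
```

Storing both values after a fetch and passing them back on the next request is what lets the server answer with 304 Not Modified instead of resending the feed.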

Formats a tuple of 9 integers into an RFC 1123-compliant timestamp as required in RFC 2616. We don't use time.strftime() since the %a and %b directives can be affected by the current locale (HTTP dates have to be in English). The date MUST be in GMT (Greenwich Mean Time).
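
The locale-independent formatting described above might look like the sketch below. The lookup tables are hypothetical stand-ins for the module's short_weekdays and months variables, whose actual contents are not shown on this page, and format_http_date_sketch is an illustrative name.

```python
# Fixed English names, so the output never depends on the process locale.
SHORT_WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_http_date_sketch(gmt_tuple):
    """Format a time.gmtime()-style 9-tuple as an RFC 1123 timestamp."""
    year, month, day, hour, minute, second, weekday = gmt_tuple[:7]
    # tm_wday counts from Monday = 0; tm_mon counts from January = 1.
    return "%s, %02d %s %04d %02d:%02d:%02d GMT" % (
        SHORT_WEEKDAYS[weekday], day, MONTHS[month - 1],
        year, hour, minute, second)
```

For example, format_http_date_sketch((1994, 11, 6, 8, 49, 37, 6, 310, 0)) returns "Sun, 06 Nov 1994 08:49:37 GMT", the sample date used in RFC 2616.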

match(string[, pos[, endpos]]) --> match object or None. Matches zero or more characters at the beginning of the string.

match(string[, pos[, endpos]]) --> match object or None. Matches zero or more characters at the beginning of the string.

match(string[, pos[, endpos]]) --> match object or None. Matches zero or more characters at the beginning of the string.

Parses any of the three HTTP date formats into a tuple of 9 integers as returned by time.gmtime(). This should not use time.strptime() since that function is not available on all platforms and could also be affected by the current locale.
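
One way to parse the three formats without time.strptime() is a set of regular expressions, one per format, tried in turn. The sketch below is an illustration and not the module's code: the patterns, the name parse_http_date_sketch, and the two-digit-year pivot (values below 70 are read as 20xx) are all assumptions.

```python
import calendar
import re
import time

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
_MON = "(?P<mon>" + "|".join(MONTHS) + ")"
_TIME = r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})"

PATTERNS = [
    # RFC 1123: Sun, 06 Nov 1994 08:49:37 GMT
    re.compile(r"[A-Za-z]{3}, (?P<day>\d{2}) " + _MON
               + r" (?P<year>\d{4}) " + _TIME + r" GMT$"),
    # RFC 850: Sunday, 06-Nov-94 08:49:37 GMT
    re.compile(r"[A-Za-z]+, (?P<day>\d{2})-" + _MON
               + r"-(?P<year>\d{2}) " + _TIME + r" GMT$"),
    # asctime(): Sun Nov  6 08:49:37 1994
    re.compile(r"[A-Za-z]{3} " + _MON + r" (?P<day>[ \d]\d) "
               + _TIME + r" (?P<year>\d{4})$"),
]


def parse_http_date_sketch(text):
    """Return a 9-tuple in GMT, or None if no format matches."""
    text = text.strip()
    for pattern in PATTERNS:
        found = pattern.match(text)
        if found is None:
            continue
        year = int(found.group("year"))
        if year < 100:
            # Assumed pivot for RFC 850 two-digit years.
            year += 2000 if year < 70 else 1900
        fields = (year, MONTHS.index(found.group("mon")) + 1,
                  int(found.group("day")), int(found.group("h")),
                  int(found.group("m")), int(found.group("s")), 0, 0, 0)
        # Round-trip through the epoch to fill in weekday and day of year.
        return tuple(time.gmtime(calendar.timegm(fields)))
    return None
```

Month names are matched against a fixed English list and converted by index, so the result does not depend on the locale.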

Generated by Epydoc 3.0beta1 on Tue Jul 1 22:03:40 2008 (http://epydoc.sourceforge.net)