What is RSS?
by Mark Pilgrim
December 18, 2002
Original document: http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
--------------------------------------------------------
RSS is a format for syndicating news and the content of news-like sites, including major news sites like Wired, news-oriented community sites like Slashdot, and personal weblogs. But it's not just for news. Pretty much anything that can be broken down into discrete items can be syndicated via RSS: the 'recent changes' page of a wiki, a changelog of CVS checkins, even the revision history of a book. Once information about each item is in RSS format, an RSS-aware program can check the feed for changes and react to the changes in an appropriate way.
-----------------------------
- syndicate
to distribute (articles, etc.) to many newspapers and magazines at once
- via
by way of, through
- feed
fodder, food for animals
a mechanical feed (device), supplied material; a supply
- aware
conscious, cognizant ((of, that)); knowing
- appropriate
suitable, proper, fitting ((to, for))
-----------------------------
RSS-aware programs called news aggregators are popular in the weblogging community. Many weblogs make content available in RSS. A news aggregator can help you keep up with all your favorite weblogs by checking their RSS feeds and displaying new items from each of them.
-----------------------------
- aggregate
[formal] a collection; a whole formed by gathering separate parts
-----------------------------
A brief history
But coders beware. The name 'RSS' is an umbrella term for a format that spans several different versions of at least two different (but parallel) formats. The original RSS, version 0.90, was designed by Netscape as a format for building portals of headlines to mainstream news sites. It was deemed overly complex for its goals; a simpler version, 0.91, was proposed and subsequently dropped when Netscape lost interest in the portal-making business. But 0.91 was picked up by another vendor, UserLand Software, which intended to use it as the basis of its weblogging products and other web-based writing software.
-----------------------------
- beware
to be careful, to be on one's guard
- deem
to think of (as), to regard (consider)
- intended
meant, planned, deliberate
-----------------------------
In the meantime, a third, non-commercial group split off and designed a new format based on what they perceived as the original guiding principles of RSS 0.90 (before it got simplified into 0.91). This format, which is based on RDF, is called RSS 1.0. But UserLand was not involved in designing this new format, and, as an advocate of simplifying 0.90, it was not happy when RSS 1.0 was announced. Instead of accepting RSS 1.0, UserLand continued to evolve the 0.9x branch, through versions 0.92, 0.93, 0.94, and finally 2.0.
-----------------------------
- meantime
the intervening time; "in the meantime" means meanwhile
- perceive
to understand; to realize, to grasp (a meaning, the truth, etc.)
- split
(of a party, etc.) to break up ((up)); to separate from ((away, off)); to fall out; to divorce, to part
- involve
to bring into connection, to implicate ((in, with))
- advocate
a supporter, a proponent
- instead
in place of that; rather
- mess
1 confusion, a jumble, complete disorder
-----------------------------
What a mess.
So which one do I use?
That's 7 -- count 'em, 7! -- different formats, all called 'RSS'. As a coder of RSS-aware programs, you'll need to be liberal enough to handle all the variations. But as a content producer who wants to make your content available via syndication, which format should you choose?
RSS versions and recommendations
Version | Owner | Pros | Status | Recommendation
0.90 | Netscape | - | Obsoleted by 1.0 | Don't use
0.91 | UserLand | Drop dead simple | Officially obsoleted by 2.0, but still quite popular | Use for basic syndication. Easy migration path to 2.0 if you need more flexibility
0.92, 0.93, 0.94 | UserLand | Allows richer metadata than 0.91 | Obsoleted by 2.0 | Use 2.0 instead
1.0 | RSS-DEV Working Group | RDF-based, extensibility via modules, not controlled by a single vendor | Stable core, active module development | Use for RDF-based applications or if you need advanced RDF-specific modules
2.0 | UserLand | Extensibility via modules, easy migration path from 0.9x branch | Stable core, active module development | Use for general-purpose, metadata-rich syndication
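---------------------------------
- pros
advantages, points in favor (as in "pros and cons")
- obsolete
no longer useful, out of use
- instead
in its place; rather
- migration
a move, a relocation
- flexibility
adaptability, versatility, elasticity
- extensibility
capacity to be stretched or extended; expandability
- stable
1 steady; firm, unshaken, fixed
2 lasting, permanent
---------------------------------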
What does RSS look like?
Imagine you want to write a program that reads RSS feeds, so that you can publish headlines on your site, build your own portal or homegrown news aggregator, or whatever. What does an RSS feed look like? That depends on which version of RSS you're talking about. Here's a sample RSS 0.91 feed (adapted from XML.com's RSS feed):
---------------------
- variations
changes, fluctuations
- liberal
of liberalism
tolerant, broad-minded, open, unprejudiced
---------------------
[Sample RSS 0.91 feed; the XML markup did not survive in this copy. Link values that remain:]
http://www.xml.com/
http://www.xml.com/pub/a/2002/12/04/normalizing.html
http://www.xml.com/pub/a/2002/12/04/som.html
http://www.xml.com/pub/a/2002/12/04/svg.html
Simple, right? A feed comprises a channel, which has a title, link, description, and (optional) language, followed by a series of items, each of which has a title, link, and description.
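As a rough illustration of that structure, the sketch below embeds a small RSS 0.91 document in a Python string and walks it with minidom. The item titles and links come from the sample output later in this article; the channel title, the placeholder descriptions, and the names SAMPLE_RSS_091 and child_text are assumptions made for illustration, not the article's own sample file.

```python
from xml.dom import minidom

# Assumed reconstruction of an RSS 0.91 feed. Item titles and links follow the
# article's sample output; the channel title and all descriptions are placeholders.
SAMPLE_RSS_091 = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <description>Placeholder channel description</description>
    <language>en-us</language>
    <item>
      <title>Normalizing XML, Part 2</title>
      <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
      <description>Placeholder item description</description>
    </item>
    <item>
      <title>The .NET Schema Object Model</title>
      <link>http://www.xml.com/pub/a/2002/12/04/som.html</link>
      <description>Placeholder item description</description>
    </item>
    <item>
      <title>SVG's Past and Promising Future</title>
      <link>http://www.xml.com/pub/a/2002/12/04/svg.html</link>
      <description>Placeholder item description</description>
    </item>
  </channel>
</rss>
"""


def child_text(parent, tag_name):
    """Return the text of the first direct child element named tag_name, or ''."""
    for child in parent.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag_name:
            return ''.join(n.data for n in child.childNodes
                           if n.nodeType == n.TEXT_NODE)
    return ''


document = minidom.parseString(SAMPLE_RSS_091)
channel = document.getElementsByTagName('channel')[0]
items = channel.getElementsByTagName('item')

# The channel carries its own title and link, then a series of items
print('channel:', child_text(channel, 'title'), child_text(channel, 'link'))
for item in items:
    print('item:', child_text(item, 'title'), '->', child_text(item, 'link'))
```

Looking only at direct children in child_text keeps the channel's own title separate from the titles of the items nested inside it.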
------------------------------------
- comprise
to include; to mean; (of a whole) to be made up of parts, to consist of
------------------------------------
Now look at the RSS 1.0 version of the same information:
[Sample RSS 1.0 feed; most of the XML markup did not survive in this copy. Namespace declarations and link values that remain:]
xmlns='http://purl.org/rss/1.0/'
xmlns:dc='http://purl.org/dc/elements/1.1/'>
http://www.xml.com/
http://www.xml.com/pub/a/2002/12/04/normalizing.html
http://www.xml.com/pub/a/2002/12/04/som.html
http://www.xml.com/pub/a/2002/12/04/svg.html
Quite a bit more verbose. People familiar with RDF will recognize this as an XML serialization of an RDF document; the rest of the world will at least recognize that we're syndicating essentially the same information. In fact, we're including a bit more information: item-level authors and publishing dates, which RSS 0.91 does not support.
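To make the shape of an RSS 1.0 document concrete, here is a hedged sketch that parses a small RDF/XML feed from a Python string. The three namespace URIs are the ones named in this article, and the authors and dates follow the sample output shown later; the rdf:about values, the placeholder channel description, and the names SAMPLE_RSS_10 and text_ns are assumptions for illustration only.

```python
from xml.dom import minidom

RSS_NS = 'http://purl.org/rss/1.0/'
DC_NS = 'http://purl.org/dc/elements/1.1/'

# Assumed reconstruction of an RSS 1.0 feed with two items; the rdf:about
# values and the channel description are placeholders.
SAMPLE_RSS_10 = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://www.xml.com/">
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <description>Placeholder channel description</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/normalizing.html"/>
        <rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/som.html"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.xml.com/pub/a/2002/12/04/normalizing.html">
    <title>Normalizing XML, Part 2</title>
    <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
    <dc:creator>Will Provost</dc:creator>
    <dc:date>2002-12-04</dc:date>
  </item>
  <item rdf:about="http://www.xml.com/pub/a/2002/12/04/som.html">
    <title>The .NET Schema Object Model</title>
    <link>http://www.xml.com/pub/a/2002/12/04/som.html</link>
    <dc:creator>Priya Lakshminarayanan</dc:creator>
    <dc:date>2002-12-04</dc:date>
  </item>
</rdf:RDF>
"""


def text_ns(node, namespace, local_name):
    """Text of the first descendant with this namespace and local name, or ''."""
    found = node.getElementsByTagNameNS(namespace, local_name)
    if not found:
        return ''
    return ''.join(n.data for n in found[0].childNodes
                   if n.nodeType == n.TEXT_NODE)


document = minidom.parseString(SAMPLE_RSS_10)
root = document.documentElement
channel = root.getElementsByTagNameNS(RSS_NS, 'channel')[0]
items = root.getElementsByTagNameNS(RSS_NS, 'item')

print('root element:', root.tagName)
print('items inside channel:', len(channel.getElementsByTagNameNS(RSS_NS, 'item')))
print('items under root:', len(items))
for item in items:
    print(text_ns(item, RSS_NS, 'title'), '|',
          text_ns(item, DC_NS, 'creator'), '|',
          text_ns(item, DC_NS, 'date'))
```

The channel holds only an items list of references; the item elements themselves sit beside the channel, directly under the root.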
------------------------
- verbose
talkative; wordy, long-winded
- familiar
well acquainted, well versed ((with))
- serialization
writing a data structure out as a linear sequence (here, an RDF graph written as XML); in publishing, issuing a work in installments
- essentially
in essence, by nature; fundamentally
- recognize
to acknowledge, to perceive, to identify
------------------------
---------------------------------------------------------------------------
Original document: http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html?page=2
Despite being RDF/XML, RSS 1.0 is structurally similar to previous versions of RSS -- similar enough that we can simply treat it as XML and write a single function to extract information out of either an RSS 0.91 or RSS 1.0 feed. However, there are some significant differences that our code will need to be aware of:
- The root element is rdf:RDF instead of rss. We'll either need to handle both explicitly or just ignore the name of the root element altogether and blindly look for useful information inside it.
- RSS 1.0 uses namespaces extensively. The RSS 1.0 namespace is http://purl.org/rss/1.0/, and it's defined as the default namespace. The feed also uses http://www.w3.org/1999/02/22-rdf-syntax-ns# for the RDF-specific elements (which we'll simply be ignoring for our purposes) and http://purl.org/dc/elements/1.1/ (Dublin Core) for the additional metadata of article authors and publishing dates.
  We can go in one of two ways here: if we don't have a namespace-aware XML parser, we can blindly assume that the feed uses the standard prefixes and default namespace and look for item elements and dc:creator elements within them. This will actually work in a large number of real-world cases; most RSS feeds use the default namespace and the same prefixes for common modules like Dublin Core. This is a horrible hack, though. There's no guarantee that a feed won't use a different prefix for a namespace (which would be perfectly valid XML and RDF). If or when it does, we'll miss it.
  If we have a namespace-aware XML parser at our disposal, we can construct a more elegant solution that handles both RSS 0.91 and 1.0 feeds. We can look for items in no namespace; if that fails, we can look for items in the RSS 1.0 namespace. (Not shown: RSS 0.90 feeds also use a namespace, though not the same one as RSS 1.0. So what we really need is a list of namespaces to search.)
- Less obvious but still important, the item elements are outside the channel element. (In RSS 0.91, the item elements were inside the channel. In RSS 0.90, they were outside; in RSS 2.0, they're inside. Whee.) So we can't be picky about where we look for items.
- Finally, you'll notice there is an extra items element within the channel. It's only useful to RDF parsers, and we're going to ignore it and assume that the order of the items within the RSS feed is given by the order of the item elements.
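The difference between the prefix hack and a namespace-aware lookup can be sketched in a few lines. The two item fragments below are hypothetical: they carry the same Dublin Core data, but the second binds the namespace to a made-up prefix, meta, instead of the usual dc. The function names are likewise invented for this illustration.

```python
from xml.dom import minidom

DUBLIN_CORE = 'http://purl.org/dc/elements/1.1/'

# Hypothetical fragments: same namespace URI, different prefixes
STANDARD_PREFIX = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Will Provost</dc:creator>
</item>"""
UNUSUAL_PREFIX = """<item xmlns:meta="http://purl.org/dc/elements/1.1/">
  <meta:creator>Will Provost</meta:creator>
</item>"""


def creators_by_prefix(xml_text):
    """The hack: match the literal qualified name 'dc:creator'."""
    root = minidom.parseString(xml_text).documentElement
    return [node.firstChild.data
            for node in root.getElementsByTagName('dc:creator')]


def creators_by_namespace(xml_text):
    """Namespace-aware: match on the namespace URI plus the local name."""
    root = minidom.parseString(xml_text).documentElement
    return [node.firstChild.data
            for node in root.getElementsByTagNameNS(DUBLIN_CORE, 'creator')]


for label, xml_text in (('standard', STANDARD_PREFIX), ('unusual', UNUSUAL_PREFIX)):
    print(label, 'by prefix:', creators_by_prefix(xml_text))
    print(label, 'by namespace:', creators_by_namespace(xml_text))
```

Both lookups find the author in the first fragment; only the namespace-aware one finds it in the second, which is exactly the case the hack misses.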
But what about RSS 2.0? Luckily, once we've written code to handle RSS 0.91 and 1.0, RSS 2.0 is a piece of cake. Here's the RSS 2.0 version of the same feed:
[Sample RSS 2.0 feed; the XML markup did not survive in this copy. Link values that remain:]
http://www.xml.com/
http://www.xml.com/pub/a/2002/12/04/normalizing.html
http://www.xml.com/pub/a/2002/12/04/som.html
http://www.xml.com/pub/a/2002/12/04/svg.html
As this example shows, RSS 2.0 uses namespaces like RSS 1.0, but it's not RDF. Like RSS 0.91, there is no default namespace and items are back inside the channel. If our code is liberal enough to handle the differences between RSS 0.91 and 1.0, RSS 2.0 should not present any additional wrinkles.
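Those three properties of RSS 2.0 can be checked directly with minidom. This is a minimal sketch on a one-item feed: the item data follows the sample output later in this article, while the channel description and the name SAMPLE_RSS_20 are placeholders assumed for illustration.

```python
from xml.dom import minidom

DC_NS = 'http://purl.org/dc/elements/1.1/'

# Assumed one-item RSS 2.0 feed: no default namespace, Dublin Core under dc:
SAMPLE_RSS_20 = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <description>Placeholder channel description</description>
    <item>
      <title>SVG's Past and Promising Future</title>
      <link>http://www.xml.com/pub/a/2002/12/04/svg.html</link>
      <dc:creator>Antoine Quint</dc:creator>
      <dc:date>2002-12-04</dc:date>
    </item>
  </channel>
</rss>
"""

document = minidom.parseString(SAMPLE_RSS_20)
# The core RSS elements are in no namespace, so we search with None
item = document.getElementsByTagNameNS(None, 'item')[0]
creator = item.getElementsByTagNameNS(DC_NS, 'creator')[0]

print('item namespace:', item.namespaceURI)
print('item parent:', item.parentNode.tagName)
print('creator namespace:', creator.namespaceURI)
print('creator:', creator.firstChild.data)
```

The core elements report no namespace at all, the item's parent is the channel, and only the extension elements carry a namespace URI.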
-------------------------------------
- wrinkle
a crease, a fine line, a fold (in cloth, etc.); figuratively, a minor complication
--------------------------------------
How can I read RSS?
Now let's get down to actually reading these sample RSS feeds from Python. The first thing we'll need to do is download some RSS feeds. This is simple in Python; most distributions come with both a URL retrieval library and an XML parser. (Note to Mac OS X 10.2 users: your copy of Python does not come with an XML parser; you will need to install PyXML first.)
Python code
-----------------------------------------
from xml.dom import minidom
from urllib.request import urlopen  # urlopen lives in urllib.request on Python 3

def load(rssURL):
    # Download the feed and parse it into a DOM document
    return minidom.parse(urlopen(rssURL))
--------------------------------------------
This takes the URL of an RSS feed and returns a parsed representation of the DOM, as native Python objects.
The next bit is the tricky part. To compensate for the differences in RSS formats, we'll need a function that searches for specific elements in any number of namespaces. Python's XML library includes a getElementsByTagNameNS which takes a namespace and a tag name, so we'll use that to make our code general enough to handle RSS 0.9x/2.0 (which has no default namespace), RSS 1.0 and even RSS 0.90. This function will find all elements with a given name, anywhere within a node. That's a good thing; it means that we can search for item elements within the root node and always find them, whether they are inside or outside the channel element.
Python code
-----------------------------------------
# Namespaces to try, in order, when looking for RSS elements
DEFAULT_NAMESPACES = (
    None,                                      # RSS 0.91, 0.92, 0.93, 0.94, 2.0
    'http://purl.org/rss/1.0/',                # RSS 1.0
    'http://my.netscape.com/rdf/simple/0.9/',  # RSS 0.90
)

def getElementsByTagName(node, tagName, possibleNamespaces=DEFAULT_NAMESPACES):
    # Return the matches from the first namespace that has any
    for namespace in possibleNamespaces:
        children = node.getElementsByTagNameNS(namespace, tagName)
        if len(children):
            return children
    return []
-----------------------------------------
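A quick way to see why the namespace list matters is to run the same lookup over two differently shaped feeds. The sketch below repeats the article's function so that it runs on its own; the two tiny feeds, the example.com URLs, and the helper item_titles are hypothetical and exist only for this demonstration.

```python
from xml.dom import minidom

DEFAULT_NAMESPACES = (
    None,                                      # RSS 0.91-0.94 and 2.0
    'http://purl.org/rss/1.0/',                # RSS 1.0
    'http://my.netscape.com/rdf/simple/0.9/',  # RSS 0.90
)


def getElementsByTagName(node, tagName, possibleNamespaces=DEFAULT_NAMESPACES):
    # Same logic as the article's function: first namespace with matches wins
    for namespace in possibleNamespaces:
        children = node.getElementsByTagNameNS(namespace, tagName)
        if len(children):
            return children
    return []


# Hypothetical minimal feeds: items inside the channel (0.91 style) and
# items beside the channel in a default namespace (1.0 style)
FEED_091 = ('<rss version="0.91"><channel>'
            '<item><title>inside channel</title></item>'
            '</channel></rss>')
FEED_10 = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
           'xmlns="http://purl.org/rss/1.0/">'
           '<channel rdf:about="http://www.example.com/"/>'
           '<item rdf:about="http://www.example.com/1">'
           '<title>outside channel</title></item>'
           '</rdf:RDF>')


def item_titles(xml_text):
    """Collect the title text of every item, whatever the feed's layout."""
    document = minidom.parseString(xml_text)
    titles = []
    for item in getElementsByTagName(document, 'item'):
        title = getElementsByTagName(item, 'title')[0]
        titles.append(title.firstChild.data)
    return titles


print(item_titles(FEED_091))
print(item_titles(FEED_10))
```

Because the search starts at the document and descends through every level, the position of the item elements relative to the channel makes no difference.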
Finally, we need two utility functions to make our lives easier. First, our getElementsByTagName function will return a list of elements, but most of the time we know there's only going to be one. An item only has one title, one link, one description, and so on. We'll define a first function that returns the first element of a given name (again, searching across several different namespaces). Second, Python's XML libraries are great at parsing an XML document into nodes, but not that helpful at putting the data back together again. We'll define a textOf function that returns the entire text of a particular XML element.
Python code
-----------------------------------------
def first(node, tagName, possibleNamespaces=DEFAULT_NAMESPACES):
    # First matching element in any of the namespaces, or None
    children = getElementsByTagName(node, tagName, possibleNamespaces)
    return len(children) and children[0] or None

def textOf(node):
    # All character data directly inside the element, or '' if node is None
    return node and ''.join([child.data for child in node.childNodes]) or ''
-----------------------------------------
That's it. The actual parsing is easy. We'll take a URL on the command line, download it, parse it, get the list of items, and then get some useful information from each item:
Python code
-----------------------------------------
DUBLIN_CORE = ('http://purl.org/dc/elements/1.1/',)

if __name__ == '__main__':
    import sys
    rssDocument = load(sys.argv[1])
    for item in getElementsByTagName(rssDocument, 'item'):
        print('title:', textOf(first(item, 'title')))
        print('link:', textOf(first(item, 'link')))
        print('description:', textOf(first(item, 'description')))
        print('date:', textOf(first(item, 'date', DUBLIN_CORE)))
        print('author:', textOf(first(item, 'creator', DUBLIN_CORE)))
        print()
-----------------------------------------
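One caveat about textOf as written: it reads .data from every child node, so a comment inside the element would end up in the text, and a nested element (which has no .data attribute) would raise an AttributeError. Below is a minimal sketch of a stricter variant, under the assumption that only text and CDATA children should count. The name text_of and the sample title markup are hypothetical.

```python
from xml.dom import minidom


def text_of(node):
    """Join only the text and CDATA children of node; return '' for None."""
    if node is None:
        return ''
    wanted = (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    return ''.join(child.data for child in node.childNodes
                   if child.nodeType in wanted)


# Hypothetical title mixing text, a CDATA section, a comment, and an element
element = minidom.parseString(
    '<title>Normalizing <![CDATA[XML]]><!-- editor note -->, Part 2<br/></title>'
).documentElement

print(repr(text_of(element)))
print(repr(text_of(None)))
```

This variant skips nested elements rather than descending into them, so text wrapped in inline markup would be dropped; whether that is acceptable depends on the feeds being read.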
Running it with our sample RSS 0.91 feed prints only title, link, and description (since the feed didn't include any other information on dates or authors):
Output
-----------------------------------------
$ python rss1.py http://www.xml.com/2002/12/18/examples/rss091.xml.txt
title: Normalizing XML, Part 2
link: http://www.xml.com/pub/a/2002/12/04/normalizing.html
description: In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.
date:
author:
title: The .NET Schema Object Model
link: http://www.xml.com/pub/a/2002/12/04/som.html
description: Priya Lakshminarayanan describes in detail the use of the .NET Schema Object Model for programmatic manipulation of W3C XML Schemas.
date:
author:
title: SVG's Past and Promising Future
link: http://www.xml.com/pub/a/2002/12/04/svg.html
description: In this month's SVG column, Antoine Quint looks back at SVG's journey through 2002 and looks forward to 2003.
date:
author:
-----------------------------------------
For both the sample RSS 1.0 feed and sample RSS 2.0 feed, we also get dates and authors for each item. We reuse our custom getElementsByTagName function, but pass in the Dublin Core namespace and appropriate tag name. We could reuse this same function to extract information from any of the basic RSS modules. (There are a few advanced modules specific to RSS 1.0 that would require a full RDF parser, but they are not widely deployed in public RSS feeds.)
Here's the output against our sample RSS 1.0 feed:
Output
-----------------------------------------
$ python rss1.py http://www.xml.com/2002/12/18/examples/rss10.xml.txt
title: Normalizing XML, Part 2
link: http://www.xml.com/pub/a/2002/12/04/normalizing.html
description: In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.
date: 2002-12-04
author: Will Provost
title: The .NET Schema Object Model
link: http://www.xml.com/pub/a/2002/12/04/som.html
description: Priya Lakshminarayanan describes in detail the use of the .NET Schema Object Model for programmatic manipulation of W3C XML Schemas.
date: 2002-12-04
author: Priya Lakshminarayanan
title: SVG's Past and Promising Future
link: http://www.xml.com/pub/a/2002/12/04/svg.html
description: In this month's SVG column, Antoine Quint looks back at SVG's journey through 2002 and looks forward to 2003.
date: 2002-12-04
author: Antoine Quint
-----------------------------------------
Running against our sample RSS 2.0 feed produces the same results.
This technique will handle about 90% of the RSS feeds out there; the rest are ill-formed in a variety of interesting ways, mostly caused by non-XML-aware publishing tools building feeds out of templates and not respecting basic XML well-formedness rules. Next month we'll tackle the thorny problem of how to handle RSS feeds that are almost, but not quite, well-formed XML.
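Detecting such a feed, as opposed to repairing it, is straightforward: minidom raises an ExpatError when the input is not well-formed. The sketch below assumes an unescaped ampersand as the typical template mistake; the helper try_parse and both sample strings are hypothetical, and recovery is deliberately left out.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def try_parse(xml_text):
    """Return (document, None) on success, or (None, message) if ill-formed."""
    try:
        return minidom.parseString(xml_text), None
    except ExpatError as error:
        return None, str(error)


# The second feed contains a bare '&', which XML does not allow in text
WELL_FORMED = '<rss version="0.91"><channel><title>Q&amp;A</title></channel></rss>'
ILL_FORMED = '<rss version="0.91"><channel><title>Q&A</title></channel></rss>'

for label, xml_text in (('well-formed', WELL_FORMED), ('ill-formed', ILL_FORMED)):
    document, problem = try_parse(xml_text)
    if problem is None:
        print(label, '-> parsed, root element:', document.documentElement.tagName)
    else:
        print(label, '-> rejected:', problem)
```

A reader that wants to survive such feeds needs a fallback path at the point where the error is caught.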
----------------------------------
Related resources
Sample RSS feeds: RSS 0.91, RSS 1.0, RSS 2.0.
rss1.py
Specifications: RSS 0.90, RSS 0.91, RSS 1.0, RSS 2.0.
Syndic8.com, a directory of 10,000 publicly available RSS feeds.
News Readers in the Open Directory, a variety of client-side and server-side programs for reading RSS feeds.
-----------------------------------