Metadata-Version: 1.0
Name: topia.termextract
Version: 1.1.0
Summary: Content Term Extraction using POS Tagging
Home-page: http://pypi.python.org/pypi/topia.termextract
Author: Stephan Richter, Russ Ferriday and the Zope Community
Author-email: zope3-dev@zope.org
License: ZPL 2.1
Description: This package determines important terms within a given piece of content. It
        uses linguistic tools such as Parts-Of-Speech (POS) and some simple
        statistical analysis to determine the terms and their strength.
        
        
        Detailed Documentation
        **********************
        
        ===============
        Term Extraction
        ===============
        
        This package implements text term extraction by making use of a simple
        Parts-Of-Speech (POS) tagging algorithm.
        
        http://bioie.ldc.upenn.edu/wiki/index.php/Part-of-Speech
        
        
        The POS Tagger
        --------------
        
        POS Taggers use a lexicon to mark words with a tag. A list of available tags
        can be found at:
        
        http://bioie.ldc.upenn.edu/wiki/index.php/POS_tags
        
        Since words can have multiple tags, the determination of the correct tag is
        not always simple. This implementation, however, does not try to infer
        linguistic use and simply chooses the first tag in the lexicon.
        
        >>> from topia.termextract import tag
        >>> tagger = tag.Tagger()
        >>> tagger
        <Tagger for english>
        
        To get the tagger ready for its work, we need to initialize it. In this
        implementation the lexicon is loaded.
        
        >>> tagger.initialize()
        
        Now we are ready to rock and roll.
        
        Tokenizing
        ~~~~~~~~~~
        
        The first step of tagging is to tokenize the text into terms.
        
        >>> tagger.tokenize('This is a simple example.')
        ['This', 'is', 'a', 'simple', 'example', '.']
        
        While most tokenizers ignore punctuation, it is important for us to keep it,
        since we need it later for the term extraction. Let's now look at some more
        complex cases:
        
        - Quoted Text
        
        >>> tagger.tokenize('This is a "simple" example.')
        ['This', 'is', 'a', '"', 'simple', '"', 'example', '.']
        
        >>> tagger.tokenize('"This is a simple example."')
        ['"', 'This', 'is', 'a', 'simple', 'example', '."']
        
        - Non-letters within words.
        
        >>> tagger.tokenize('Parts-Of-Speech')
        ['Parts-Of-Speech']
        
        >>> tagger.tokenize('amazon.com')
        ['amazon.com']
        
        >>> tagger.tokenize('Go to amazon.com.')
        ['Go', 'to', 'amazon.com', '.']
        
        - Various punctuation.
        
        >>> tagger.tokenize('Quick, go to amazon.com.')
        ['Quick', ',', 'go', 'to', 'amazon.com', '.']
        
        >>> tagger.tokenize('Live free; or die?')
        ['Live', 'free', ';', 'or', 'die', '?']
        
        - Tolerance to incorrect punctuation.
        
        >>> tagger.tokenize('Hi , I am here.')
        ['Hi', ',', 'I', 'am', 'here', '.']
        
        - Possessive structures.
        
        >>> tagger.tokenize("my parents' car")
        ['my', 'parents', "'", 'car']
        >>> tagger.tokenize("my father's car")
        ['my', 'father', "'s", 'car']
        
        - Numbers.
        
        >>> tagger.tokenize("12.4")
        ['12.4']
        >>> tagger.tokenize("-12.4")
        ['-12.4']
        >>> tagger.tokenize("$12.40")
        ['$12.40']
        
        - Dates.
        
        >>> tagger.tokenize("10/3/2009")
        ['10/3/2009']
        >>> tagger.tokenize("3.10.2009")
        ['3.10.2009']
        
        Okay, that's it.
        
        
        Tagging
        -------
        
        The next step is tagging. Tagging is done in two phases. During the first
        phase terms are assigned a tag by looking at the lexicon and the normalized
        form is set to the term itself. In the second phase, a set of rules is applied
        to each tagged term and the tagging and normalization is tweaked.
        
        >>> tagger('This is a simple example.')
        [['This', 'DT', 'This'],
        ['is', 'VBZ', 'is'],
        ['a', 'DT', 'a'],
        ['simple', 'JJ', 'simple'],
        ['example', 'NN', 'example'],
        ['.', '.', '.']]
        
        So wow, this determination was dead on. Let's try a plural form noun and see
        what happens:
        
        >>> tagger('These are simple examples.')
        [['These', 'DT', 'These'],
        ['are', 'VBP', 'are'],
        ['simple', 'JJ', 'simple'],
        ['examples', 'NNS', 'example'],
        ['.', '.', '.']]
        
        So far so good. Let's test a few more cases:
        
        >>> tagger("The fox's tail is red.")
        [['The', 'DT', 'The'],
        ['fox', 'NN', 'fox'],
        ["'s", 'POS', "'s"],
        ['tail', 'NN', 'tail'],
        ['is', 'VBZ', 'is'],
        ['red', 'JJ', 'red'],
        ['.', '.', '.']]
        
        >>> tagger("The fox can't really jump over the fox's tail.")
        [['The', 'DT', 'The'],
        ['fox', 'NN', 'fox'],
        ['can', 'MD', 'can'],
        ["'t", 'RB', "'t"],
        ['really', 'RB', 'really'],
        ['jump', 'VB', 'jump'],
        ['over', 'IN', 'over'],
        ['the', 'DT', 'the'],
        ['fox', 'NN', 'fox'],
        ["'s", 'POS', "'s"],
        ['tail', 'NN', 'tail'],
        ['.', '.', '.']]
        
        Rules
        ~~~~~
        
        - Correct Default Noun Tag
        
        >>> tagger('Ikea')
        [['Ikea', 'NN', 'Ikea']]
        >>> tagger('Ikeas')
        [['Ikeas', 'NNS', 'Ikea']]
        
        - Verify proper nouns at beginning of sentence.
        
        >>> tagger('. Police')
        [['.', '.', '.'], ['police', 'NN', 'police']]
        >>> tagger('Police')
        [['police', 'NN', 'police']]
        >>> tagger('. Stephan')
        [['.', '.', '.'], ['Stephan', 'NNP', 'Stephan']]
        
        - Determine Verb after Modal Verb
        
        >>> tagger('The fox can jump')
        [['The', 'DT', 'The'],
        ['fox', 'NN', 'fox'],
        ['can', 'MD', 'can'],
        ['jump', 'VB', 'jump']]
        >>> tagger("The fox can't jump")
        [['The', 'DT', 'The'],
        ['fox', 'NN', 'fox'],
        ['can', 'MD', 'can'],
        ["'t", 'RB', "'t"],
        ['jump', 'VB', 'jump']]
        >>> tagger('The fox can really jump')
        [['The', 'DT', 'The'],
        ['fox', 'NN', 'fox'],
        ['can', 'MD', 'can'],
        ['really', 'RB', 'really'],
        ['jump', 'VB', 'jump']]
        
        - Normalize Plural Forms
        
        >>> tagger('examples')
        [['examples', 'NNS', 'example']]
        >>> tagger('stresses')
        [['stresses', 'NNS', 'stress']]
        >>> tagger('cherries')
        [['cherries', 'NNS', 'cherry']]
        
        Some cases that do not work:
        
        >>> tagger('men')
        [['men', 'NNS', 'men']]
        >>> tagger('feet')
        [['feet', 'NNS', 'feet']]
        
        
        Term Extraction
        ---------------
        
        Now that we can tag a text, let's have a look at the term extractions.
        
        >>> from topia.termextract import extract
        >>> extractor = extract.TermExtractor()
        >>> extractor
        <TermExtractor using <Tagger for english>>
        
        As you can see, the extractor maintains a tagger:
        
        >>> extractor.tagger
        <Tagger for english>
        
        When creating an extractor, you can also pass in a tagger to avoid frequent
        tagger initialization:
        
        >>> extractor = extract.TermExtractor(tagger)
        >>> extractor.tagger is tagger
        True
        
        Let's get the terms for a simple text.
        
        >>> extractor("The fox can't jump over the fox's tail.")
        []
        
        We got no terms. That's because by default at least 3 occurences of a
        term must be detected, if the term consists of a single word.
        
        The extractor maintains a filter component. Let's register the trivial
        permissive filter, which simply return everything that the extractor suggests:
        
        >>> extractor.filter = extract.permissiveFilter
        >>> extractor("The fox can't jump over the fox's tail.")
        [('tail', 1, 1), ('fox', 2, 1)]
        
        But let's look at the default filter again, since it allows tweaking its
        parameters:
        
        >>> extractor.filter = extract.DefaultFilter(singleStrengthMinOccur=2)
        >>> extractor("The fox can't jump over the fox's tail.")
        [('fox', 2, 1)]
        
        Let's now have a look at multi-word terms. Oftentimes multi-word nouns and
        proper names occur only once or twice in a text. But they are often great
        terms! To handle this scenario, the concept of "strength" was
        introduced. Currently the strength is simply the amount of words in the
        term. By default, all terms with a strength larger than 1 are selected
        regardless of the number of occurances.
        
        >>> extractor('The German consul of Boston resides in Newton.')
        [('German consul', 1, 2)]
        
        
        
        ===========================
        An Exmaple - A News Article
        ===========================
        
        This document provides a simple example of extracting the terms of a BBC
        article from May 29, 2009. We will use several term extraction tools to
        compare the outcome.
        
        >>> text ='''
        ... Police shut Palestinian theatre in Jerusalem.
        ...
        ... Israeli police have shut down a Palestinian theatre in East Jerusalem.
        ...
        ... The action, on Thursday, prevented the closing event of an international
        ... literature festival from taking place.
        ...
        ... Police said they were acting on a court order, issued after intelligence
        ... indicated that the Palestinian Authority was involved in the event.
        ...
        ... Israel has occupied East Jerusalem since 1967 and has annexed the
        ... area. This is not recognised by the international community.
        ...
        ... The British consul-general in Jerusalem , Richard Makepeace, was
        ... attending the event.
        ...
        ... "I think all lovers of literature would regard this as a very
        ... regrettable moment and regrettable decision," he added.
        ...
        ... Mr Makepeace said the festival's closing event would be reorganised to
        ... take place at the British Council in Jerusalem.
        ...
        ... The Israeli authorities often take action against events in East
        ... Jerusalem they see as connected to the Palestinian Authority.
        ...
        ... Saturday's opening event at the same theatre was also shut down.
        ...
        ... A police notice said the closure was on the orders of Israel's internal
        ... security minister on the grounds of a breach of interim peace accords
        ... from the 1990s.
        ...
        ... These laid the framework for talks on establishing a Palestinian state
        ... alongside Israel, but left the status of Jerusalem to be determined by
        ... further negotiation.
        ...
        ... Israel has annexed East Jerusalem and declares it part of its eternal
        ... capital.
        ...
        ... Palestinians hope to establish their capital in the area.
        ... '''
        
        
        Yahoo Keyword Extractor
        -----------------------
        
        Yahoo provides a service that extracts terms from a piece of content using
        its immense search database.
        
        http://developer.yahoo.com/search/content/V1/termExtraction.html
        
        As you can see, the result is excellent::
        
        <ResultSet>
        <Result>british consul general</Result>
        <Result>east jerusalem</Result>
        <Result>literature festival</Result>
        <Result>richard makepeace</Result>
        <Result>international literature</Result>
        <Result>israeli authorities</Result>
        <Result>eternal capital</Result>
        <Result>peace accords</Result>
        <Result>security minister</Result>
        <Result>israeli police</Result>
        <Result>internal security</Result>
        <Result>palestinian state</Result>
        <Result>palestinian authority</Result>
        <Result>british council</Result>
        <Result>palestinians</Result>
        <Result>negotiation</Result>
        <Result>breach</Result>
        <Result>1990s</Result>
        <Result>closure</Result>
        <Result>israel</Result>
        </ResultSet>
        
        Unfortunately, the service allows only 5000 requests per 24 hours. Also, there
        is no strength indicator on the terms.
        
        
        TreeTagger
        ----------
        
        A POS tagger that uses some linguistics to tag a text. Here is its output::
        
        Police          NNS       Police
        shut            VVD       shut
        Palestinian     JJ        Palestinian
        theatre         NN        theatre
        in              IN        in
        Jerusalem       NP        Jerusalem
        .               SENT      .
        Israeli         JJ        Israeli
        police          NNS       police
        have            VHP       have
        shut            VVN       shut
        down            RP        down
        a               DT        a
        Palestinian     JJ        Palestinian
        theatre         NN        theatre
        in              IN        in
        East            NP        East
        Jerusalem       NP        Jerusalem
        .               SENT      .
        The             DT        the
        action          NN        action
        ,               ,         ,
        on              IN        on
        Thursday        NP        Thursday
        ,               ,         ,
        prevented       VVD       prevent
        the             DT        the
        closing         NN        closing
        event           NN        event
        of              IN        of
        an              DT        an
        international   JJ        international
        literature      NN        literature
        festival        NN        festival
        from            IN        from
        taking          VVG       take
        place           NN        place
        .               SENT      .
        Police          NNS       Police
        said            VVD       say
        they            PP        they
        were            VBD       be
        acting          VVG       act
        on              IN        on
        a               DT        a
        court           NN        court
        order           NN        order
        ,               ,         ,
        issued          VVN       issue
        after           IN        after
        intelligence    NN        intelligence
        indicated       VVN       indicate
        that            IN        that
        the             DT        the
        Palestinian     NP        Palestinian
        Authority       NP        Authority
        was             VBD       be
        involved        VVN       involve
        in              IN        in
        the             DT        the
        event           NN        event
        .               SENT      .
        Israel          NP        Israel
        has             VHZ       have
        occupied        VVN       occupy
        East            NP        East
        Jerusalem       NP        Jerusalem
        since           IN        since
        1967            CD        @card@
        and             CC        and
        has             VHZ       have
        annexed         VVN       annex
        the             DT        the
        area            NN        area
        .               SENT      .
        This            DT        this
        is              VBZ       be
        not             RB        not
        recognised      VVN       recognise
        by              IN        by
        the             DT        the
        international   JJ        international
        community       NN        community
        .               SENT      .
        The             DT        the
        British         JJ        British
        consul-general  NN        <unknown>
        in              IN        in
        Jerusalem       NP        Jerusalem
        ,               ,         ,
        Richard         NP        Richard
        Makepeace       NP        Makepeace
        ,               ,         ,
        was             VBD       be
        attending       VVG       attend
        the             DT        the
        event           NN        event
        .               SENT      .
        "               ``        "
        I               PP        I
        think           VVP       think
        all             DT        all
        lovers          NNS       lover
        of              IN        of
        literature      NN        literature
        would           MD        would
        regard          VV        regard
        this            DT        this
        as              IN        as
        a               DT        a
        very            RB        very
        regrettable     JJ        regrettable
        moment          NN        moment
        and             CC        and
        regrettable     JJ        regrettable
        decision        NN        decision
        ,               ,         ,
        "               ''        "
        he              PP        he
        added           VVD       add
        .               SENT      .
        Mr              NP        Mr
        Makepeace       NP        Makepeace
        said            VVD       say
        the             DT        the
        festival        NN        festival
        's              POS       's
        closing         NN        closing
        event           NN        event
        would           MD        would
        be              VB        be
        reorganised     VVN       <unknown>
        to              TO        to
        take            VV        take
        place           NN        place
        at              IN        at
        the             DT        the
        British         NP        British
        Council         NP        Council
        in              IN        in
        Jerusalem       NP        Jerusalem
        .               SENT      .
        The             DT        the
        Israeli         JJ        Israeli
        authorities     NNS       authority
        often           RB        often
        take            VVP       take
        action          NN        action
        against         IN        against
        events          NNS       event
        in              IN        in
        East            NP        East
        Jerusalem       NP        Jerusalem
        they            PP        they
        see             VVP       see
        as              RB        as
        connected       VVN       connect
        to              TO        to
        the             DT        the
        Palestinian     JJ        Palestinian
        Authority       NP        Authority
        .               SENT      .
        Saturday        NP        Saturday
        's              POS       's
        opening         NN        opening
        event           NN        event
        at              IN        at
        the             DT        the
        same            JJ        same
        theatre         NN        theatre
        was             VBD       be
        also            RB        also
        shut            VVN       shut
        down            RP        down
        .               SENT      .
        A               DT        a
        police          NN        police
        notice          NN        notice
        said            VVD       say
        the             DT        the
        closure         NN        closure
        was             VBD       be
        on              IN        on
        the             DT        the
        orders          NNS       order
        of              IN        of
        Israel          NP        Israel
        's              POS       's
        internal        JJ        internal
        security        NN        security
        minister        NN        minister
        on              IN        on
        the             DT        the
        grounds         NNS       ground
        of              IN        of
        a               DT        a
        breach          NN        breach
        of              IN        of
        interim         JJ        interim
        peace           NN        peace
        accords         NNS       accord
        from            IN        from
        the             DT        the
        1990s           NNS       1990s
        .               SENT      .
        These           DT        these
        laid            VVD       lay
        the             DT        the
        framework       NN        framework
        for             IN        for
        talks           NNS       talk
        on              IN        on
        establishing    VVG       establish
        a               DT        a
        Palestinian     JJ        Palestinian
        state NN        state
        alongside       IN        alongside
        Israel          NP        Israel
        ,               ,         ,
        but             CC        but
        left            VVD       leave
        the             DT        the
        status          NN        status
        of              IN        of
        Jerusalem       NP        Jerusalem
        to              TO        to
        be              VB        be
        determined      VVN       determine
        by              IN        by
        further         JJR       further
        negotiation     NN        negotiation
        .               SENT      .
        Israel          NP        Israel
        has             VHZ       have
        annexed         VVN       annex
        East            NP        East
        Jerusalem       NP        Jerusalem
        and             CC        and
        declares        VVZ       declare
        it              PP        it
        part            NN        part
        of              IN        of
        its             PP$       its
        eternal         JJ        eternal
        capital         NN        capital
        .               SENT      .
        Palestinians    NPS       Palestinians
        hope            VVP       hope
        to              TO        to
        establish       VV        establish
        their           PP$       their
        capital         NN        capital
        in              IN        in
        the             DT        the
        area            NN        area
        .               SENT      .
        
        As you can see, the identification of TreeTagger is pretty good, but the
        output would need some analysis to produce a useful set of terms. Furthermore,
        TreeTagger is not free for commercial use.
        
        Topia's Term Extractor
        ----------------------
        
        Topia's Term Extractor tries to produce results somewhere between a POS
        tagger like TreeTagger and Yahoo Keyword Extraction.
        
        Since we are only interested in nouns, a very simple POS tagging algorithm can
        be deployed, which will provide good results most of the time. We then use
        some simple statistics and linguistics to produce a narrow but strong list of
        terms for the content.
        
        >>> from topia.termextract import extract
        >>> extractor = extract.TermExtractor()
        
        Let's look at the result of the tagger first:
        
        >>> printTaggedTerms(extractor.tagger(text)) #doctest: +REPORT_NDIFF
        police          NN    police
        shut            VBN   shut
        Palestinian     JJ    Palestinian
        theatre         NN    theatre
        in              IN    in
        Jerusalem       NNP   Jerusalem
        .               .     .
        Israeli         JJ    Israeli
        police          NN    police
        have            VBP   have
        shut            VBN   shut
        down            RB    down
        a               DT    a
        Palestinian     JJ    Palestinian
        theatre         NN    theatre
        in              IN    in
        East            NNP   East
        Jerusalem       NNP   Jerusalem
        .               .     .
        The             DT    The
        action          NN    action
        ,               ,     ,
        on              IN    on
        Thursday        NNP   Thursday
        ,               ,     ,
        prevented       VBN   prevented
        the             DT    the
        closing         VBG   closing
        event           NN    event
        of              IN    of
        an              DT    an
        international   JJ    international
        literature      NN    literature
        festival        NN    festival
        from            IN    from
        taking          VBG   taking
        place           NN    place
        .               .     .
        police          NN    police
        said            VBD   said
        they            PRP   they
        were            VBD   were
        acting          VBG   acting
        on              IN    on
        a               DT    a
        court           NN    court
        order           NN    order
        ,               ,     ,
        issued          VBN   issued
        after           IN    after
        intelligence    NN    intelligence
        indicated       VBD   indicated
        that            IN    that
        the             DT    the
        Palestinian     JJ    Palestinian
        Authority       NNP   Authority
        was             VBD   was
        involved        VBN   involved
        in              IN    in
        the             DT    the
        event           NN    event
        .               .     .
        Israel          NNP   Israel
        has             VBZ   has
        occupied        VBN   occupied
        East            NNP   East
        Jerusalem       NNP   Jerusalem
        since           IN    since
        1967            NN    1967
        and             CC    and
        has             VBZ   has
        annexed         VBD   annexed
        the             DT    the
        area            NN    area
        .               .     .
        This            DT    This
        is              VBZ   is
        not             RB    not
        recognised      VBD   recognised
        by              IN    by
        the             DT    the
        international   JJ    international
        community       NN    community
        .               .     .
        The             DT    The
        British         JJ    British
        consul-general  NN    consul-general
        in              IN    in
        Jerusalem       NNP   Jerusalem
        ,               ,     ,
        Richard         NNP   Richard
        Makepeace       NNP   Makepeace
        ,               ,     ,
        was             VBD   was
        attending       VBG   attending
        the             DT    the
        event           NN    event
        .               .     .
        "               "     "
        I               PRP   I
        think           VBP   think
        all             DT    all
        lovers          NNS   lover
        of              IN    of
        literature      NN    literature
        would           MD    would
        regard          VB    regard
        this            DT    this
        as              IN    as
        a               DT    a
        very            RB    very
        regrettable     JJ    regrettable
        moment          NN    moment
        and             CC    and
        regrettable     JJ    regrettable
        decision        NN    decision
        ,"              ,     ,"
        he              PRP   he
        added           VBD   added
        .               .     .
        Mr              NNP   Mr
        Makepeace       NNP   Makepeace
        said            VBD   said
        the             DT    the
        festival        NN    festival
        's              POS   's
        closing         VBG   closing
        event           NN    event
        would           MD    would
        be              VB    be
        reorganised     NN    reorganised
        to              TO    to
        take            VB    take
        place           NN    place
        at              IN    at
        the             DT    the
        British         JJ    British
        Council         NNP   Council
        in              IN    in
        Jerusalem       NNP   Jerusalem
        .               .     .
        The             DT    The
        Israeli         JJ    Israeli
        authorities     NNS   authority
        often           RB    often
        take            VB    take
        action          NN    action
        against         IN    against
        events          NNS   event
        in              IN    in
        East            NNP   East
        Jerusalem       NNP   Jerusalem
        they            PRP   they
        see             VB    see
        as              IN    as
        connected       VBN   connected
        to              TO    to
        the             DT    the
        Palestinian     JJ    Palestinian
        Authority       NNP   Authority
        .               .     .
        Saturday        NNP   Saturday
        's              POS   's
        opening         NN    opening
        event           NN    event
        at              IN    at
        the             DT    the
        same            JJ    same
        theatre         NN    theatre
        was             VBD   was
        also            RB    also
        shut            VBN   shut
        down            RB    down
        .               .     .
        A               DT    A
        police          NN    police
        notice          NN    notice
        said            VBD   said
        the             DT    the
        closure         NN    closure
        was             VBD   was
        on              IN    on
        the             DT    the
        orders          NNS   order
        of              IN    of
        Israel          NNP   Israel
        's              POS   's
        internal        JJ    internal
        security        NN    security
        minister        NN    minister
        on              IN    on
        the             DT    the
        grounds         NNS   ground
        of              IN    of
        a               DT    a
        breach          NN    breach
        of              IN    of
        interim         JJ    interim
        peace           NN    peace
        accords         NNS   accord
        from            IN    from
        the             DT    the
        1990            NN    1990
        s               PRP   s
        .               .     .
        These           DT    These
        laid            VBN   laid
        the             DT    the
        framework       NN    framework
        for             IN    for
        talks           NNS   talk
        on              IN    on
        establishing    VBG   establishing
        a               DT    a
        Palestinian     JJ    Palestinian
        state           NN    state
        alongside       IN    alongside
        Israel          NNP   Israel
        ,               ,     ,
        but             CC    but
        left            VBN   left
        the             DT    the
        status          NN    status
        of              IN    of
        Jerusalem       NNP   Jerusalem
        to              TO    to
        be              VB    be
        determined      VBN   determined
        by              IN    by
        further         JJ    further
        negotiation     NN    negotiation
        .               .     .
        Israel          NNP   Israel
        has             VBZ   has
        annexed         VBD   annexed
        East            NNP   East
        Jerusalem       NNP   Jerusalem
        and             CC    and
        declares        VBZ   declares
        it              PRP   it
        part            NN    part
        of              IN    of
        its             PRP$  its
        eternal         JJ    eternal
        capital         NN    capital
        .               .     .
        Palestinians    NNPS  Palestinian
        hope            NN    hope
        to              TO    to
        establish       VB    establish
        their           PRP$  their
        capital         NN    capital
        in              IN    in
        the             DT    the
        area            NN    area
        .               .     .
        
        Let's now apply the extractor.
        
        >>> sorted(extractor(text))
        [('British Council', 1, 2),
        ('British consul-general', 1, 2),
        ('East', 4, 1),
        ('East Jerusalem', 4, 2),
        ('Israel', 4, 1),
        ('Israeli authorities', 1, 2),
        ('Israeli police', 1, 2),
        ('Jerusalem', 8, 1),
        ('Mr Makepeace', 1, 2),
        ('Palestinian', 6, 1),
        ('Palestinian Authority', 2, 2),
        ('Palestinian state', 1, 2),
        ('Palestinian theatre', 2, 2),
        ('Palestinians hope', 1, 2),
        ('Richard Makepeace', 1, 2),
        ('court order', 1, 2),
        ('event', 6, 1),
        ('literature festival', 1, 2),
        ('opening event', 1, 2),
        ('peace accords', 1, 2),
        ('police', 4, 1),
        ('police notice', 1, 2),
        ('security minister', 1, 2),
        ('theatre', 3, 1)]
        
        
        =======
        CHANGES
        =======
        
        1.1.0 (2009-06-29)
        ------------------
        
        - Improved the dictionary a little bit to improve real scenarios.
        
        
        1.0.0 (2009-05-30)
        ------------------
        
        - Initial Release
        
        * Part-Of-Speech Text Tagging using existing lexicon ans very simplisitc
        linguistic rules.
        
        * Term Extraction based on occurances and term strength.
        
Keywords: content term extract pos tagger linguistics
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Zope Public License
Classifier: Programming Language :: Python
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
