"Classification" means assigning one or more categories to the text of a news document. Rules based classifiers use a set of Boolean rules, rather than machine-learning or statistical techniques, to determine which categories to apply.
EXTRA is the EXTraction Rules Apparatus, a multilingual open-source platform for rules-based classification of news content. IPTC was awarded a grant of €50,000 from the first round of Google’s Digital News Initiative (DNI) Innovation Fund https://www.digitalnewsinitiative.com/ to build and freely distribute the initial version of EXTRA. The grant covers the whole project: about €35,000 for the development of the software, about €14,000 for the development of the categorization rules for two languages, and about €1,000 for PR activities to communicate the progress and success of the project. https://iptc.org/news/iptcs-extra-project-update/
We are working with news providers to supply sets of news documents and with linguists to write rules to classify the documents. IPTC is looking for qualified developers to create a rules engine that uses these rules to categorize the documents accurately and efficiently. IPTC has drawn up a set of requirements for the engine and the rules language: https://docs.google.com/document/d/1O8pmFlohcGXThzyrWil_OFbDyqJk1Hcjpml_RRXuw6U/pub
DELIVERABLES
The EXTRA developers will work closely with the IPTC EXTRA team to:
* Evaluate and recommend open-source libraries and technologies to be incorporated within the EXTRA system
* Devise and deliver a high-performance rules-based document classification engine
* Design and implement the EXTRA REST APIs
MANDATORY REQUIREMENTS
* Fluency in one or more of Python, Java or C/C++
* Proficiency in written and spoken English; as an international organization, IPTC will mainly work with the EXTRA developers remotely, so this is crucial
* Proficient understanding of Git/GitHub for code versioning
* Experience working with data in different formats, such as HTML, XML and JSON
* Availability to work on the project during the projected timeline of October 2016 to June 2017
PREFERRED REQUIREMENTS
* A demonstrated ability to develop natural-language processing or document analytics applications is ideal.
* Familiarity with news, media or entertainment industries is helpful.
* Experience with REST APIs is helpful.
There is some flexibility regarding the above requirements, depending on each applicant’s relevant experience. Some travel may be required.
INTEREST
Let us know if you are interested in developing EXTRA. First preference will be given to applications received by 21st October 2016, and review will continue until the position is filled.