Towards README-EVAL: Interpreting README File Instructions
James Paul White <jimwhite@uw.edu>, Department of Linguistics, University of Washington
Watch this GitHub repository for updates coming soon: https://github.com/jimwhite/README-EVAL
Humans learn natural language in rich perceptual contexts: sensory input is processed into streams of utterance percepts and meaning percepts, and language is learned from the correlations between them. Some recent NLP research has employed grounding-inspired methods such as response-based and reinforcement learning, but the domains have either not been very rich (as in games or database queries) or been ones in which computers have poor perceptual capabilities (as in vision or robotics). To apply the concept of grounded natural language learning by machines most effectively, the most appropriate domain is that of computers themselves. The task proposed here is learning to build software packages using the instructions present in README files.
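To make the task concrete, here is a minimal sketch (the README text, function name, and expected script are hypothetical illustrations, not taken from the poster or any real package): given natural-language build instructions, the system must produce an executable build script.

```python
# Hypothetical README excerpt; the task is to ground this text in shell actions.
readme_excerpt = """\
Building from source:
  1. Run ./configure
  2. Run make
  3. Run make install
"""

def interpret_readme(text: str) -> str:
    """Placeholder for a learned model that maps README text to a build script."""
    raise NotImplementedError

# A successful system would map readme_excerpt to something like:
expected_script = "./configure\nmake\nmake install\n"
```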
S. R. K. Branavan, Nate Kushman, Tao Lei, and Regina Barzilay. 2012. Learning High-Level Planning from Text. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Volume 1, pages 126-135.
Dan Goldwasser and Dan Roth. 2013. Learning from Natural Instructions. Machine Learning, 94(2):205-232.
Related Work: Linux Plan Corpus
Neal Lesh, Charles Rich, and Candace L. Sidner. 1999. Using Plan Recognition in Human-Computer Collaboration. Courses and Lectures - International Centre for Mechanical Sciences, pages 23-32.
Nate Blaylock and James F. Allen. 2004. Statistical Goal Parameter Recognition. In Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS).
Example from the corpus, showing a stated goal and one subject's recorded actions:
Goal: know-filespace-usage-file(rl.exe)
(find(,rl.exe,,,**dot_0**))
(du(mail,rl.exe))
The Linux Plan Corpus consists of 457 interactive shell sessions, averaging 6.1 actions each, captured from human experimental subjects attempting to satisfy one of 19 different goals, each stated as an English sentence. Although the corpus has been used successfully by these and other researchers, the natural variation in human behavior means that a corpus of this relatively small size is quite noisy. As a result, researchers have had to rely on artificially generated data such as the Monroe Plan Corpus in order to obtain results that can be compared more easily across system evaluations.
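As I understand the corpus format, each session pairs an English-stated goal with an ordered sequence of shell actions. A minimal sketch of a record type for it (the class and field names are mine, not the corpus's):

```python
from dataclasses import dataclass

@dataclass
class PlanSession:
    """One Linux Plan Corpus session: a goal statement plus the
    shell commands the subject issued while pursuing it."""
    goal: str           # e.g. "know-filespace-usage-file(rl.exe)"
    actions: list[str]  # recorded shell commands, in order

example = PlanSession(
    goal="know-filespace-usage-file(rl.exe)",
    actions=["find(,rl.exe,,,**dot_0**)", "du(mail,rl.exe)"],
)
```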
Grounded Language Learning
Analogy (& Metaphor) in Language
That man runs fast.
That computer runs fast.
run = repeating_process(action, actor, time = short)
<actor> does <action> in <time>
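A rough sketch of how that frame could be instantiated for both sentences (the template comes from above; the dictionary representation and slot values are my own illustration, not a formal semantics):

```python
# The same "run" frame covers both the literal and the metaphorical
# sentence; only the actor binding differs.
template = "<actor> does <action> in <time>"

literal = {"actor": "that man", "action": "running", "time": "a short time"}
metaphorical = {"actor": "that computer", "action": "executing instructions",
                "time": "a short time"}

for binding in (literal, metaphorical):
    sentence = template
    for slot, value in binding.items():
        sentence = sentence.replace(f"<{slot}>", value)
    print(sentence)
# -> that man does running in a short time
# -> that computer does executing instructions in a short time
```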
Douglas Hofstadter. 1995. Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought.
George Lakoff and Rafael Núñez. 2000. Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being.
Grounding machine learning of natural language in the computer domain enables learning of non-computer domains by reversing the direction of metaphorical projection.
[Figure: four example packages (bsf, ant, java, gcc), each shown with x: Source Files and y: RPM Spec.]
V (validation): Does the dependent package still build using its "gold" maintainer-written script?
T (test): Generate labels (build scripts) for one or more of these packages.
Dependent (Source) → Dependency (Target)
From Dependencies To Validation
The package dependency DAG becomes training, test, and evaluation data: dependency targets are chosen for test (the build scripts output by the system are used to build them), and the dependency sources (the dependent packages) are used for validation (their maintainer-written build scripts are run as-is to observe whether the system-built dependencies are likely to be good).
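A minimal sketch of that split, assuming edges are (dependent, dependency) pairs (function and variable names are my assumptions; the poster describes the scheme only in prose):

```python
import random

def split_for_eval(edges: list[tuple[str, str]], test_fraction: float = 0.2,
                   seed: int = 0) -> tuple[set[str], set[str]]:
    """Pick dependency targets for test; their dependents become validation."""
    targets = sorted({tgt for _, tgt in edges})
    rng = random.Random(seed)
    test_targets = set(rng.sample(targets, max(1, int(test_fraction * len(targets)))))
    # Validation: dependents whose maintainer-written ("gold") build scripts
    # are run unchanged against the system-built versions of the test targets.
    validation_sources = {src for src, tgt in edges if tgt in test_targets}
    return test_targets, validation_sources
```

Splitting on target packages rather than on individual edges means each test package is built once by the system, while every package that depends on it contributes an independent validation check.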
Shared Public Evaluation Platform
Fedora Core 17 has 1,673 package nodes that have both a build script (averaging 6.9 lines) and some declared dependency relationship. Of these, 1,009 are leaves, and the 664 internal nodes are each the target of an average of 7 dependencies (roughly 4,600 dependency edges in total).
README-EVAL Score
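The poster does not spell out the scoring formula here; a natural definition consistent with the validation scheme above (my assumption, not the poster's stated metric) is the fraction of validation builds that succeed:

```python
def readme_eval_score(validation_results: dict[str, bool]) -> float:
    """Hypothetical score: the fraction of validation packages whose gold
    build scripts succeed against the system-built test targets."""
    return sum(validation_results.values()) / max(1, len(validation_results))
```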
I would like to make the README-EVAL evaluation system available as an easy-to-use, publicly shared service.