Domain name options

It might be nice to use the same name for both the wiki and the community.

name / advantages / disadvantages

nalm
  Advantages: Hart used the acronym “NALM”.
  Disadvantages: .com (domain squatter), .net (New Adult Learning Movement), .org (National Association for Lay Ministry) and .info are all taken!

nilm
  Disadvantages: .com (domain squatter), .net (NILM Education Center) and .org (National Institute of Leadership and Management) are all taken.

nialm
  Advantages: All suffixes except .com are available.
  Disadvantages: nialm.com is taken (by a person called Nial Murray). “NIALM” is a bit of a mouthful and may be hard for non-experts to remember.

disaggregation
  Advantages: All suffixes are available (I already own .com). “disaggregation” is a normal word so should be easy for non-experts to remember. No need to explain what a “hyphen” is (!).
  Disadvantages: “disaggregation” is a general term and is not specific to electricity disaggregation.

electricity-disaggregation
  Advantages: All suffixes are available. The name is self-explanatory, cleanly describes the scope of the wiki and should be easy to remember.
  Disadvantages: A bit long to type! And what if, in a year or two, we want to broaden the scope to, for example, water-usage disaggregation?

nilm-wiki / nalm-wiki / nialm-wiki
  Advantages: All domain suffixes available!
  Disadvantages: What if the site wanted to extend beyond just a wiki (e.g. hosting data repositories / discussion forums / blogs)?

energy-disaggregation <- I HAVE JUST REGISTERED THIS
  Advantages: All suffixes available. The name is self-explanatory and should be easy to remember. Captures gas and electricity.
  Disadvantages: Still doesn’t capture water disaggregation.

nilmtk
  Advantages: All suffixes are available.
  Disadvantages: The site might be more general than just nilmtk (e.g. researchers are likely to find useful information on the wiki even if they don’t use nilmtk).

utility-disaggregation
  Advantages: All suffixes are available.
  Disadvantages: Not an obvious search term. I don’t think I’ve ever searched for “utility disaggregation”.

Votes for favorites

Done:

TODO:

Deciding whether to put the NILM metadata schema and defaults on the wiki or in the nilm_metadata GitHub repository

Metadata use cases (wiki versus JSON Schema):

1) Describing a dataset (e.g. UKPD) including appliances, what pre-processing has been applied, wiring etc

We can’t put the full description on the wiki.  Some dataset owners (e.g. for HES) will probably just forbid this.  And it’s just not practical.

2) For disaggregation: Describing priors (e.g. usage patterns) and defaults

3) Categories

4) Human-readable reports on appliances (e.g. plots from different data sets), datasets (e.g. gotchas, which appliances are sampled) etc

5) Appliance models

Advantages of the wiki:

Disadvantage of wiki:

Advantages of JSON on github:

Disadvantages of JSON:

Conclusions:

Thinking again of using wiki for defining the human-readable schema

Appliance

Tags:

KEY                  | TYPE             | VALUES                                                         | UNIT    | DESCRIPTION
minimum_off_duration | number           | >0                                                             | seconds |
control              | array of strings | timer, manual, motion sensor, light sensor, temperature sensor |         | Give a list of all the control methods which apply. For example, a video recorder would be both ‘manual’ and ‘timer’.
energy_rating        | string           |                                                                |         |
subtype_of           | string           |                                                                |         |

Fridge

Defaults:

{
  "components": [
    {"name": "compressor", "control": "temperature sensor"},
    {"name": "light", "control": "manual"}
  ],
  "subtype_of": "appliance"
}

Model Blah Fridge

{
  "subtype_of": "fridge",
  "manufacturer": "blah",
  "model": "foo"
}
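The defaults above can be resolved by walking the `subtype_of` chain. A minimal sketch of this idea, assuming child keys simply override inherited ones (the merge rule is an assumption, not decided behaviour):

```python
# Sketch: resolving appliance metadata through the `subtype_of` chain.
# The entries mirror the draft JSON above; the merge rules (child keys
# override parent keys, lists replace rather than append) are assumptions.

WIKI = {
    "appliance": {},
    "fridge": {
        "subtype_of": "appliance",
        "components": [
            {"name": "compressor", "control": "temperature sensor"},
            {"name": "light", "control": "manual"},
        ],
    },
    "model blah fridge": {
        "subtype_of": "fridge",
        "manufacturer": "blah",
        "model": "foo",
    },
}

def resolve(name, wiki=WIKI):
    """Merge defaults from the root of the subtype chain down to `name`."""
    entry = wiki[name]
    parent = entry.get("subtype_of")
    merged = resolve(parent, wiki) if parent else {}
    merged.update(entry)           # child keys override inherited ones
    merged.pop("subtype_of", None)
    return merged

print(resolve("model blah fridge"))
```

So “Model Blah Fridge” inherits the fridge’s components while adding its manufacturer and model fields.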

Ideas for wiki content

  1. Appliances database. Communally-defined controlled vocabulary / thesaurus / hierarchy. Basically like DBpedia (e.g. http://live.dbpedia.org/page/Refrigerator) but with much more detail on appliances.
     - General ideas:
       - The wiki is the authoritative source for everything, including defaults, but we also download a flat-file version of appliance defaults to nilmtk for ease of access.
       - Probably use "tags" as used in Project Haystack.
       - Need a set of defined types that each field can accept? e.g.
         - distribution:
           - discrete (bin) distribution
           - fitted parameters for the most suitable probability distribution (e.g. GMM)
       - Categories of attributes? e.g.
         - power
         - usage
         - categorisation
         - TODO: enumerate all metadata fields and try putting them into categories
       - Many attributes are shared. Use inheritance?
       - There are probably three or four levels of abstraction:
         - the schema (which fields exist; what values those fields can take, etc.)
         - each general appliance class (fridge, kettle etc.). Maybe this level can be split into two:
           - the classifications etc. for the appliance (e.g. that a fridge is in the 'cold' category)
           - the priors for the appliance (that fridges use about 120 W etc.)
         - specific appliances (the fridge in REDD house 1 etc.)
       - It feels like there must be a clean, tidy way to represent this structure. Maybe we need a naming convention like "Template:Fridge" (the schema), "Appliance:Fridge" (the broad class), "AppliancePriors:Fridge" and "Dataset:REDD:Building1:Fridge".
     - Schema (defining the types of values etc.):
       - For all appliances:
         - "name": {"oneOf": ["fridge", "tv", …]}
         - "instance": integer, >= 1
         - "category": {"oneOf": ["ICT", "cold", …]} (use data from HES appliance_type_codes.csv)
         - "subtypeOf": {"oneOf": ["appliance", "fridge", …]}
         - "example images": list of URLs
         - links to other wikis (e.g. the Wikipedia page on fridges)
         - "room": $room
         - "control": $control
         - "power": $distribution
         - "components": list (or dict) of $components
         - "model parameters": list (or dict?) of $models. (I think Model Parameters should be its own class: an association class between Appliance and Model.)
         - "appears in datasets": list of dataset names (links in wiki)
         - "appliance correlations": a list of (<other appliance>, <minimum probability>, <mean probability>, <max probability>), plus "data used to generate appliance correlations": a list of (<dataset>, <building>, <date range>). If this appliance is on, what is the probability of <other appliance> also being on? We express a range of probabilities across all measured datasets.
         - machine-downloadable data describing prior probabilities of usage / appliance behaviour. Perhaps this could be specified as a Bayes net, or perhaps just a set of (conditional) probability tables. Information to represent:
           - dependencies: country, weekday / weekend, season, external temperature, external sunshine, house occupied?
           - output 'behaviours' expressed as probability distributions (GMMs or tables of discrete values? Maybe tables, because GMMs can always be estimated from tables):
             - average usage pattern per day
             - on-duration
             - off-duration
             - power
           - 'data used to generate usage patterns': list of (<dataset>, <building>, <date range>)
       - Properties per specific appliance:
         - nominal rated power (range?); see http://energy.gov/energysaver/articles/estimating-appliance-and-home-electronic-energy-use
         - link to photo of label
         - link to photo of appliance
         - link to manufacturer's spec page
       - Additional metadata per appliance type, e.g.:
         - domestic chiller (abstract class):
           - "type": oneOf "upright", "chest", "american" etc.
           - subtypes:
             - fridge
             - freezer
             - fridge freezer:
               - "freezer is separate thermal zone": bool
       - Type definitions:
         - components:
           - "name": oneOf ["motor", "heater", …]
           - "power": $distribution
           - "control": $control
         - signatures: (<dataset>, <building>, <channel>, <download URL>)
         - model:
           - "name": oneOf ["FHMM", "CO", …]
           - "date prepared": $date
           - "software name": oneOf ["NILMTK", …]
           - "software version": float
           - "training data": e.g. REDD: 'all'; UKPD: house 1 from 2012-2013, all of house 2
           - "parameters": dict
         - control: a list where each item is oneOf [manual, timer, sensor, always on]
         - rooms:
           - "name": oneOf ["living", "kitchen", …]
           - "instance": integer >= 1
         - distribution:
           - "name": oneOf ["min max", "normal", …]
           - "parameters": dict
     - Priors per appliance (usual values to fill in the schema), e.g. fridge:
       - subtypeOf: domestic chiller (abstract class)
       - category: cold
       - room: 95% kitchen
       - components: [compressor, light]
       - control: {'compressor': sensor, 'light': manual}
       - human-readable information:
         - plots of the appliance waveform (from different datasets)
         - histograms of appliance usage over an average day etc.
         - highlight differences between datasets (e.g. US washing machines look nothing like UK washing machines)
     - Each article may have some combination of:
       - Machine-readable fields (most can be overridden by subtypes or by specific instances):
         - 'subtypes': list of articles describing subtypes of this appliance (TODO: check how best to structure hierarchies in SMW)
         - use DBpedia as the URI?
         - alternative names (perhaps per dataset)
         - 'control': list. Describes whether this appliance is controlled manually, or through a timer, or a sensor. The list should be some combination of {'manual', 'timer', 'sensor', 'motion timer', 'always on'}. Could also express the likelihood of each type. For example, a TV would have 'control': ['manual'] whilst a home theatre PC would have 'control': ['timer', 'manual'] because it can be turned on either way.
         - 'components': dict. Keys are tuples (<name>, <index>), e.g. ('motor', 1). Each value is a dict describing summary statistics for the power usage of that component, e.g. {'distribution name': 'normal', 'mean': 100, 'stdev': 25}. For example, a fridge's 'components' might be: {('compressor', 1): {'distribution name': 'min max', 'min': 100, 'max': 200}, ('light', 1): {'distribution name': 'normal', 'mean': 20, 'stdev': 5}}
         - appliance type (cold, ICT etc.; use data from HES appliance_type_codes.csv)
         - parameters for various models of this appliance (ideally these parameters should be easy to download into the relevant Disaggregator subclass in NILMTK):
           - 'power state graph': networkx.DiGraph. Each node is a dict with fields 'power' (a dict describing summary statistics for the power of the state) and 'components' (a list of component tuples (<name>, <index>)). Each edge describes a valid state transition. Edges may be annotated with a 'duration' attribute: a dict describing summary statistics for the duration of the source power state.
         - 'example signatures': list of (<dataset>, <building>, <channel>, <download URL>) from which signatures of this appliance can be downloaded. This should be automatically generated when we specify which appliances each dataset contains.
         - example photos
         - links to other wikis (e.g. the Wikipedia page on fridges)
         - 'appliance correlations': a list of (<other appliance>, <minimum probability>, <mean probability>, <max probability>). If this appliance is on, what is the probability of <other appliance> also being on? We express a range of probabilities across all measured datasets.
         - 'data used to generate appliance correlations': list of (<dataset>, <building>, <date range>)
         - machine-downloadable data describing prior probabilities of usage / appliance behaviour. Perhaps this could be specified as a Bayes net, or perhaps just a set of (conditional) probability tables. Information to represent:
           - dependencies: country, weekday / weekend, season, external temperature, external sunshine, house occupied?
           - output 'behaviours' expressed as probability distributions (GMMs or tables of discrete values? Maybe tables, because GMMs can always be estimated from tables):
             - average usage pattern per day
             - on-duration
             - off-duration
             - power
           - 'data used to generate usage patterns': list of (<dataset>, <building>, <date range>)
       - Properties per specific appliance:
         - nominal rated power (range?); see http://energy.gov/energysaver/articles/estimating-appliance-and-home-electronic-energy-use
         - link to photo of label
         - link to photo of appliance
         - link to manufacturer's spec page
       - Human-readable information:
         - plots of the appliance waveform (from different datasets)
         - histograms of appliance usage over an average day etc.
         - highlight differences between datasets (e.g. US washing machines look nothing like UK washing machines)
  2. Articles on NALM details per country (e.g. mains supply (split phase? single phase? three phase?), mains wiring, common high-consuming appliances (AC? electric resistive heating? heat pumps for heating? EVs?)).
  3. List of conferences, workshops, summer schools and journals.
  4. List of data collection hardware.
  5. Software resources (NALM code, lists of HMM toolkits etc.).
  6. List of energy recommendation projects.
  7. Proposed metadata ontology for energy data.
  8. List of NILM blogs.
  9. Table of research. Each row would be a "NALM technique" rather than strictly a 1-to-1 mapping from rows to papers, so that we can handle commercial projects and academic projects which are described in multiple papers (e.g. Hart's technique). Item titles would be things like "Hart et al" or "Hart et al B" or "Onzo". (For ideas for a semi-automated system for doing this, see the "automated citation extraction" section below.) For each item, have fields for:
     - title: <primary author>_<design>, e.g. "technique\Hart_et_al", "technique\Hart_et_al_b" or "technique\Onzo". When dealing with primary authors who have multiple distinct techniques, it's probably best to use an arbitrary label for each technique (like "Hart et al B") rather than dates (like "Hart 1990-1995") because dates might need to be changed. Although it would be worth checking whether it is problematic to change article titles in Semantic MediaWiki, because names like "Hart 1990-95" make a lot more sense.
     - author(s) (e.g. "George Hart") (clickable links to pages describing authors)
     - institution(s) (e.g. "MIT", "EPRI" etc.)
     - papers / patents (DOIs as clickable links) (e.g. all of Hart's papers)
     - commercial / academic / both (e.g. Hart's started as academic and was then implemented commercially)
     - measured parameters (voltage, reactive power, active power, apparent power, 4-quadrant) (list) (indicate if some are essential and some are optional)
     - pre-processing: 'none' | 'normalised' | FFT | custom etc.
     - sample rate (single number, multiple numbers, range)
     - features extracted (steady states, transients etc.) (list)
     - training data: aggregate data only / IAM data / user diaries / appliance manuals
     - latent variables learnt: inter-appliance relationships / time-of-day patterns of use / weather
     - supervised / unsupervised / semi-supervised
     - smallest appliance detectable (10 W? 100 W? 500 W?)
     - quantity of data used for training / validation / testing
     - classifier (HMMs, NNs, SVMs, KNN, DTs etc.)
     - explicitly models multi-state appliances?
     - performance (specify which metrics are used)
     - dataset(s) used (REDD etc.)
     - code available? Commercial implementations?
     - data available? (Is this where we'd link to REDD etc.?)
     - computational complexity (if mentioned)
     - list of other papers describing the same approach (authors will often publish multiple papers on the same basic NALM approach)
  10. Entries for researchers, institutions and companies (basic stub entries should come "for free" as a side effect of entering data for each research entry): institution web page, blog, Twitter, Google Plus, current country, current institution, past institutions, associated papers, photo, list of disaggregation papers etc.
  11. Table of publicly available datasets. For each dataset, have fields for:
      - link to the dataset
      - DOI of paper(s) describing the dataset
      - country
      - duration
      - number of houses
      - for both mains and individual appliances:
        - sample rate
        - measured parameters (voltage, reactive power, active power, apparent power, 4-quadrant) (list)
      - Also have an unlimited-length wiki article for each dataset discussing things like:
        - "gotchas" to be aware of (e.g. "stuck" readings, unsorted rows etc.)
        - recommended preprocessing steps (sort; filter out insanely large values etc.)
        - links to open-source code libraries which can load this dataset
      - some specific datasets to list:
        - http://ukedc.rl.ac.uk/data.html
  12. Companies doing NILM. Fields could include: HQ country, date they started doing NILM, NILM researchers, official website, Twitter,
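The 'power state graph' idea above can be sketched concretely. The draft proposes a networkx.DiGraph; the dependency-free sketch below uses a plain adjacency dict to show the same node/edge structure. The state names, power statistics and durations are illustrative guesses, not measured values:

```python
# Sketch of the proposed 'power state graph' for a fridge-like appliance.
# Each node carries 'power' summary statistics and a list of component
# tuples; each outgoing edge is a valid transition, annotated with a
# 'duration' dict for the source state.  All numbers are made up.

POWER_STATE_GRAPH = {
    "off": {
        "power": {"distribution name": "normal", "mean": 0, "stdev": 1},
        "components": [],
        "transitions": {"on": {"duration": {"mean": 3000, "stdev": 1200}}},
    },
    "on": {
        "power": {"distribution name": "min max", "min": 100, "max": 200},
        "components": [("compressor", 1)],
        "transitions": {"off": {"duration": {"mean": 600, "stdev": 200}}},
    },
}

def is_valid_transition(graph, source, target):
    """True if the graph contains an edge from `source` to `target`."""
    return target in graph.get(source, {}).get("transitions", {})

assert is_valid_transition(POWER_STATE_GRAPH, "off", "on")
assert not is_valid_transition(POWER_STATE_GRAPH, "on", "on")
```

The same structure maps directly onto a networkx.DiGraph (nodes with 'power'/'components' attributes, edges with 'duration' attributes) if that is what the wiki ends up exporting.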

Specific info for the wiki (kept here in lieu of having the wiki ready…)

Cleaned HES dataset:

On Linux, `unzip` complains `skipping: CAR_HEUS_dataGlossary.doc  unsupported compression method 99`. According to this blog post, this error occurs because WinZip 9.0 or newer uses a new encryption method, AES, which `unzip` can’t handle. The `7z` tool can handle these files. It can be installed on Ubuntu with `sudo apt-get install p7zip-full`; then extract with `7z e <filename.zip>`.

Bash script for extracting files:

#!/bin/sh
PASSWORD='<password>'   # replace with the real archive password

for file in *.zip
do
        echo "Processing $file"
        7z e -p"$PASSWORD" "$file"
done

Another dataset: The DEBS dataset: http://www.cse.iitb.ac.in/debs2014/?page_id=42 (although I think that the data isn’t really freely available)

Another dataset: Stanford’s PowerNet - measures >130 laptops, servers, etc

Interesting meter: Watto, a completely open platform for wireless metering built by a small company: http://watto.prestilab.it. The meter is based on the AD7953 IC and can measure active and reactive power, voltage, current and power factor. The wireless part is implemented by a Flyport openPicus board, an open-source WiFi board that sends data (with a configurable sampling frequency) to a server. All the source code is provided free and can be customised to request just the measurements of interest, change the sampling frequency and configure the data server. https://mail.google.com/mail/u/0/#inbox/146f67227d21b30c

Related work

Wiki / CMS software options

Broad requirements

Automated citation extraction

The basic idea is something like this: it would be nice to be able to just enter a list of, say, DOIs and then have a system which automatically resolves those DOIs and also comes up with standard names for authors and publication venues so we can then unambiguously link to wiki entries for each of these objects; especially if we can then add additional fields (e.g. sample rate) to each paper.
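One plausible building block for this: CrossRef's public REST API returns metadata for a DOI (a GET on https://api.crossref.org/works/<doi>). The sketch below reduces a CrossRef-style record to the fields a wiki entry needs; the payload is a trimmed, hypothetical response kept inline so the sketch is self-contained, and the normalisation rules (given + family name, first container title as the venue) are assumptions, not a decided convention.

```python
import json

# Sketch: turning DOI metadata into wiki-ready fields.  A real pipeline
# would fetch https://api.crossref.org/works/<doi>; here we parse a
# trimmed, hypothetical response (the DOI below is a placeholder).

SAMPLE_RESPONSE = json.loads("""
{
  "message": {
    "DOI": "10.1109/example.doi",
    "title": ["Nonintrusive appliance load monitoring"],
    "author": [{"given": "George", "family": "Hart"}],
    "container-title": ["Proceedings of the IEEE"]
  }
}
""")

def extract_fields(response):
    """Reduce a CrossRef-style record to the fields a wiki entry needs."""
    msg = response["message"]
    return {
        "doi": msg["DOI"],
        "title": msg["title"][0],
        "authors": [f"{a['given']} {a['family']}" for a in msg["author"]],
        "venue": msg["container-title"][0],
    }

fields = extract_fields(SAMPLE_RESPONSE)
print(fields["authors"])   # ['George Hart']
```

The normalised author and venue strings could then serve as the unambiguous keys for author and venue wiki pages, with extra fields (e.g. sample rate) attached per paper afterwards.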

Some notes:

Doing this with Drupal:

Doing this wiki Semantic Media Wiki (SMW)

List of candidate wiki / CMS software options

MediaWiki 

Relevant extensions

Advantages:

Disadvantages:

Wikidata

Info:

Disadvantages:

Wikia:

Disadvantages:

Drupal

Relevant extensions

Blue-sky thinking for what could be hosted on the website

Community platform options

Votes

Things we could discuss on the community

Publicising the wiki and community once they are set up and have some initial content