1. Getting started with voice coding
2. Voice Recognition and Development Set up
4. Opening and switching to files - high value add
5.1. Moving the cursor left right, up and down
5.2. Moving the cursor left and right across word boundaries
5.3. For moving to the beginning and end of the line
5.4. Going to a specific line: high value add
5.5. Finding and jumping to a particular symbol or character, forwards or backwards high value add
5.6. Finding and jumping to a particular word, forward or backwards high value add
5.7. Other ideas -(more advanced, editor specific) high value add
5.7.1. Goto line & symbol in one utterance
5.7.3. Command Sequences aka Continuous Command Recognition
5.7.5. Goto line Mod 100 - high value add
5.7.6. Regular expression find for selection and movement
5.7.6.1. Jumping in, out, back, last
5.7.6.2. Jumping to function name, arguments, first non-whitespace
6. Writing code and modifying code
6.2. selection,copying,cutting, deleting
6.3. selection,copying,cutting, deleting lines (Mod) - high value add
Template driven programming (Mod) - high value add
There is an abundance of information on voice coding including numerous open source projects.
This article will list the main things needed to become a productive voice coding developer, with an emphasis on high value add items. I will discuss my specific implementations, and in a future article I will discuss alternatives for further research along with pros and cons.
My goal is that after reading this article you should:
∙ have a pretty good understanding of what sorts of commands you will need to find or implement to start voice coding effectively
∙ have at least a few sample utterances
∙ have some basic sample code for each utterance that you can apply to your system.
This is a pretty big topic, so my goal is that this article should get you 70% to 80% of the way there.
I use Dragon NaturallySpeaking 12 or 13, along with Vocola. Vocola is an easy-to-use, easy-to-learn, very terse language for adding commands to Dragon NaturallySpeaking (or Windows Speech Recognition). One note: Dragon 13 currently has some compatibility issues, so it may not be the best choice until those are fixed.
Vocola installation instructions.
For text editing I use Sublime Text 3. Many other editors exist, and pretty much everything below applies to them as well.
In order to be a productive developer you need the ability to at least:
I use a global voice command (in the _vocola.vcl file) that automatically opens Sublime Text with a predefined project folder. The Vocola code looks like this (lines starting with # are comments):
    # project folder definitions
    <project_folder> := ( my project = "X:\dev\myproject"
                        | framework = "x:\dev\framework" );

    # vocola command for the utterance "subber" followed
    # by a project folder. This starts the sublime 3 exe
    # with the folder as a parameter
    subber <project_folder> = ShellExecute("C:\Program Files\Sublime Text 3\sublime_text.exe "$1);
Therefore I can say:
subber framework
I can say this at any time, and it will bring up Sublime Text opened at the framework directory.
Here I make use of the "go to anything" functionality available in sublime text.
Sublime Text lets you press Ctrl+P and then type parts of a filename, which it fuzzy-matches against every file in the current project. This is immensely useful for voice programming. Other editors have similar functionality, with varying levels of filename matching.
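To see why a few spoken fragments are enough to pick out a file, here is a minimal Python sketch of subsequence-style fuzzy matching. It is not Sublime Text's actual matching or ranking algorithm; the function names `is_fuzzy_match` and `fuzzy_filter` and the sample file list are made up for illustration.

```python
def is_fuzzy_match(query, filename):
    """True if the query's characters appear, in order, in filename.

    Spaces in the query are ignored, since dictation inserts them
    between words. Matching is case-insensitive.
    """
    chars = iter(filename.lower())
    wanted = query.lower().replace(" ", "")
    # `c in chars` consumes the iterator, so order is enforced.
    return all(c in chars for c in wanted)


def fuzzy_filter(query, filenames):
    """Return the filenames that match, keeping their original order."""
    return [name for name in filenames if is_fuzzy_match(query, name)]


# Hypothetical project files, for illustration only.
FILES = ["static/css/main.css", "app.conf", "manage.py", "core/apple_core.py"]
```

With this sketch, a dictated query such as "main CSS" narrows the hypothetical list down to `static/css/main.css`, even though the words are neither adjacent nor complete in the path.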
Here are some sample vocola commands:
    Go to = {Ctrl+p};
This lets me say "go to", to bring up the fuzzy search edit field and drop-down. (At that point I can say the words that I want to search for, and they will be entered into that edit box)
    # list of commonly used files
    <go_to_shortcuts> := ( (conf | app | app .conf) = app.conf
                         | manage = manage.py );

    # go to command to those shortcuts immediately
    Go to <go_to_shortcuts> = {Ctrl+p} Wait(600) SendKeys($1) Wait(200) {enter};
For files I go to a lot, I predefined a shortcuts list so I can say "go to app" or "go to manage", and it will immediately open up that file. (Notice the enter command)
    # <_anything> is a special variable which accepts any
    # form of dictation.
    go to <_anything> = {Ctrl+p} Wait(600) SendKeys($1);
I can say "go to main CSS", "go to Apple core", or just about anything, to bring up the list of matched files. I would then have to choose a file using up-and-down commands and then send an enter key.
    gone <_anything> = {Ctrl+p} Wait(600) SendKeys($1) Wait(200) {enter};
This is pretty much the same as "go to" except it presses the enter right away. So if I say "gone main CSS" it will go to the 1st fuzzy file match.
Eclipse has “open resource”, vim has “CtrlP”, and emacs has “projectile”, which are analogous to this.
There are many ways to get around code using a mouse and keyboard. But when you have to do so by voice, things get a little bit trickier since you can't just click or press buttons to page up or move the cursor.
As a reference take a look at the following file which Mark (the current maintainer of Vocola) put together, since there are many great suggestions and sample commands there.
http://vocola.net/unofficial/commands_for_Win32Pad.html
Here are a number of ways you can get around code, which you can simulate with voice :
To move the cursor left and right or up-and-down, I have some pretty simple commands. For some reason the default commands in Dragon are very verbose. They want you to say "move left four", so I have some commands to just make things more terse.
    <navs> := ( Down | Up | soar = Up | left | right );
    <navs> = {$1};
    <navs> <nn> = {$1_$2};
This way I can quickly say
"left twelve",
"down",
"soar twenty"
Before I continue, one word about code organization. Vocola allows you to have header files where you can define commonly used constructs. I use this mechanism extensively for text editing. That way I can keep common text editing commands in one header file and use them across many different applications. There are two categories of commands: those that only make sense as single commands (like going to the end of a line) and those that can be used as single or repeating commands (like going right once or 4 times).
For moving across word boundaries for example:
    # textedit.vch
    <text_navigation_variable> := ( Law = {Ctrl+left} | raw = {Ctrl+right} );

    #=================================
    # start TextEdit stuff
    include textedit.vch;
    <nn> := 1..99;
    <text_navigation> = $1;
    <text_navigation_variable> <nn> = Repeat($2, $1);
    <text_navigation_variable> = $1;
    #================================
My contractions law and raw stand for "left a word" and "right a word", so I can say things like:
law
law six
raw
raw 4
I find that the words home and end often get recognized incorrectly, so I define my own words for them in a non-variable section for text navigation:
    # textedit.vch
    <text_navigation> := ( meg = SendKeys({Home}) | Mel = SendKeys({end}) );

    #=================================
    # start TextEdit stuff
    include textedit.vch;
    <text_navigation> = $1;
And as you can see above I do not include a <nn>, so these can only be said by themselves.
meg stands for "move to beginning" and mel stands for "move to end of line".
Going to a specific line is a pretty big win in terms of getting around, so I strongly suggest using a coding editor that shows the line numbers and lets you go to them. Sublime Text lets you do that using the shortcut key Ctrl+G.
I pretty much copy what the Vocola language tutorial has here, so I will just reproduce the text:
http://vocola.net/v2/AlternativeWords.asp
Vocola:
    Line 1..100 = {Ctrl+g} $1 {Enter};
    Line 1..99 Oh 1..9 = {Ctrl+g} $1 0 $2 {Enter};
    Line 1..99 10..99 = {Ctrl+g} $1 $2 {Enter};
Say: Line Sixty One Oh Nine Sent: {Ctrl+g}6109{Enter}
Say: Line Eight Forty Three Sent: {Ctrl+g}843{Enter}
This lets you go to fairly large line numbers without causing recognition problems.
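The way the three patterns assemble a line number from spoken parts can be sketched in Python. This is only an illustration of the digit concatenation; the function name `line_keystrokes` and its parameters are hypothetical and are not part of Vocola.

```python
def line_keystrokes(first, second=None, oh=False):
    """Build the keystrokes for 'Line <first> [Oh] [<second>]'.

    first:  the first spoken number
    second: 1..9 after 'Oh' (a zero is inserted before it),
            or 10..99 when spoken directly after the first number
    """
    digits = str(first)
    if second is not None:
        if oh:
            if not 1 <= second <= 9:
                raise ValueError("after 'Oh' expect 1..9")
            digits += "0" + str(second)
        else:
            if not 10 <= second <= 99:
                raise ValueError("expect 10..99 as the second number")
            digits += str(second)
    return "{Ctrl+g}" + digits + "{Enter}"
```

Calling `line_keystrokes(61, 9, oh=True)` mirrors "Line Sixty One Oh Nine", and `line_keystrokes(8, 43)` mirrors "Line Eight Forty Three". Each spoken piece stays under 100, which is what keeps recognition reliable.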
Jumping to a symbol or character is also a pretty big win in terms of moving around without a keyboard or mouse. First you will need to define your list of symbols and/or alphabet characters. There are many examples of abbreviations for symbols and alphabets around the web, and I have borrowed freely. If you look at the Win32Pad example given by Mark and search for the word “printable” you will find his list.
http://vocola.net/unofficial/commands_for_Win32Pad.html
For a side by side comparison, take a look at this “Rosetta stone spreadsheet”
Personally I have symbol definitions like the following: (some are borrowed from short talk)
    <symbols> := ( hash = "#"
                 | lip = "("
                 | rip = ")"
                 | (vert | vertie | bar) = "|"
                 | semi = ";"
                 | equal = '='
                 # The following are all shorttalk
                 | lace = "{"
                 | race = "}"
                 | lack = "["
                 | rack = "]"
                 ...
                 );
I have code that does something similar to what Mark has for Win32Pad. For example, I have a leap (go to the next symbol) and a retreat (go to the previous symbol) utterance:
    leap <symbols> = _Leap($1, 1);
    retreat <symbols> = _Retreat($1, 1);
That way I can say something like
leap rip
retreat race
To quickly go to the next ) or previous }
I have also implemented Mark’s suggested count mechanism so that I can jump as far as the 4th matching symbol.
    <count> := ( first = 1 | second = 2 | third = 3 | fourth = 4 );
    leap <count> <symbols> = _Leap($2, $1);
    retreat <count> <symbols> = _Retreat($2, $1);
This lets me say things like:
leap third lip
retreat fourth semi
To jump to the 3rd (, or go back to the 4th ;
Note that the way this is implemented it will flash the find dialog, but given that the keystrokes are pretty quick, it barely affects your flow.
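The logic behind leaping and retreating with a count can be sketched in Python on a plain string. This is not the `_Leap` or `_Retreat` code referred to above, which drives the editor's find dialog with keystrokes; the function names and the treatment of the cursor as a character offset are assumptions for illustration.

```python
def leap(text, cursor, symbol, count=1):
    """Offset of the count-th `symbol` after the cursor, or None.

    The character directly at the cursor is skipped, so repeating
    the command keeps moving forward.
    """
    pos = cursor
    for _ in range(count):
        pos = text.find(symbol, pos + 1)
        if pos == -1:
            return None
    return pos


def retreat(text, cursor, symbol, count=1):
    """Offset of the count-th `symbol` before the cursor, or None."""
    pos = cursor
    for _ in range(count):
        # rfind with an end bound only sees text strictly before pos.
        pos = text.rfind(symbol, 0, pos)
        if pos == -1:
            return None
    return pos
```

In this sketch "leap third lip" corresponds to `leap(text, cursor, "(", 3)`, and "retreat fourth semi" corresponds to `retreat(text, cursor, ";", 4)`.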
A natural extension of jumping to a particular symbol is jumping to any word. This can easily be accomplished by just substituting the special variable _anything.
Again it's worth looking at Mark's example in the Win32 pad. I do something similar but I've chosen slightly different words, sort of based off of Shorttalk.
    (ghin|gin) <_anything> = SendSystemKeys({Ctrl+f}) Wait(500) SendKeys($2) Wait(300) SendSystemKeys({esc} {left});
    ex <_anything> = SendSystemKeys({Ctrl+shift+i}) SendKeys($1) {enter} {left};
ghin is supposed to imitate the sound "begin".
This allows me to say things like
ghin import
ex function
To quickly find the next instance of import or the previous instance of function for example.
A few other ideas I’ve been experimenting with and believe will be high value add are:
A quick terse way to find a symbol or anything on a particular line. I've experimented with a command like:
    jump <nn> <symbol>
Which given a number will go to the 1st symbol on that line. I've also played with a command to choose which symbol to go to:
    jump <nn> <symbol> <count>
So I can say things like
jump 35 lip
jump 55 com 3rd
To go to the 1st ( on line 35, or the 3rd ',' on line 55.
One can also add the optional utterance "after" to land just after the symbol:
jump 45 after lip
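What such a jump command has to compute can be sketched in Python. This is an illustration of the lookup only, not the experimental command itself; `jump_target`, its parameters, and the representation of the buffer as a list of lines are assumptions.

```python
def jump_target(lines, line_number, symbol, count=1, after=False):
    """Return (line_number, column) for the count-th symbol on a line.

    `lines` is the buffer as a list of strings. line_number is
    1-based and column is 0-based. With after=True the column is
    just past the symbol. Returns None if nothing matches.
    """
    if not 1 <= line_number <= len(lines):
        return None
    line = lines[line_number - 1]
    column = -1
    for _ in range(count):
        column = line.find(symbol, column + 1)
        if column == -1:
            return None
    if after:
        column += len(symbol)
    return (line_number, column)
```

An utterance like "jump 55 com 3rd" would map to `jump_target(lines, 55, ",", 3)`, and "jump 45 after lip" to `jump_target(lines, 45, "(", after=True)`.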
Another idea which seems to be of high value is the use of a third-party utility for sublime text called easymotion. (There are also things like acejump for Emacs and easy motion for VI)
https://github.com/tednaleid/sublime-EasyMotion
Easymotion lets you type in any letter and it will highlight and label all instances of that letter in the visible page. The only disadvantage of this approach is that it requires 2 steps. So you would have to utter 2 things for example:
Easy lip # this would call up easy motion with (
bravo # this would select the ( labeled with b
The advantage of this approach is that you don't have to take your eye off of the particular symbol or letter that you are looking for, which you have to do even for the line jumping approach.
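The two-step idea (label every visible occurrence, then pick one by its label) can be sketched in Python. This is not the EasyMotion plug-in's implementation or its actual labeling scheme; `label_targets` and `pick` are hypothetical names, and the sketch assumes each spoken alphabet word starts with the letter it stands for, as "bravo" does.

```python
import string


def label_targets(visible_text, char):
    """Map labels 'a', 'b', ... to offsets of `char` in the visible text.

    Only the first 26 occurrences get a label in this sketch.
    """
    offsets = [i for i, c in enumerate(visible_text) if c == char]
    return dict(zip(string.ascii_lowercase, offsets))


def pick(labels, spoken_word):
    """Return the offset for a spoken label such as 'bravo', or None."""
    return labels.get(spoken_word[0].lower())
```

Step one ("Easy lip") corresponds to `label_targets(visible_text, "(")`, and step two ("bravo") corresponds to `pick(labels, "bravo")`.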
Many voice coders use continuous command recognition, also called command sequences. This lets you say multiple commands in a single utterance, which saves time because you do not have to pause between commands. More advanced voice coders have stated that this is a huge value add and makes them more productive. I have included a number of articles to read.
Vocola link on enabling Command sequences.
David’s link
James link
The one caveat with enabling continuous command recognition is that you need to be more careful designing your grammar. When one utterance can become multiple commands, it is possible that a grammar designed around one command per utterance could start failing. The articles linked above discuss this in a fair amount of detail.
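A toy Python sketch shows why the grammar matters once one utterance can hold several commands. It is not how Dragon or Vocola recognize sequences; the vocabulary sets and the function `split_commands` are hypothetical, loosely modeled on the navigation commands in this article.

```python
# Hypothetical vocabulary, for illustration only.
REPEATABLE = {"law", "raw", "left", "right", "up", "down"}  # accept a count
SINGLE = {"meg", "mel"}                                     # never take a count
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
           "six": 6, "seven": 7, "eight": 8, "nine": 9}


def split_commands(utterance):
    """Split one utterance into a list of (command, count) pairs."""
    words = utterance.lower().split()
    commands = []
    i = 0
    while i < len(words):
        word = words[i]
        if word not in REPEATABLE and word not in SINGLE:
            raise ValueError("not a command: " + word)
        count = 1
        # Only repeatable commands may consume a following number.
        if word in REPEATABLE and i + 1 < len(words) and words[i + 1] in NUMBERS:
            count = NUMBERS[words[i + 1]]
            i += 1
        commands.append((word, count))
        i += 1
    return commands
```

The split only works because every command word is distinct from every number word. If a command were itself called "six", or could be followed by free dictation, the same utterance could be split in more than one way, which is the kind of failure the caveat above describes.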
5.7.5. tbd. Set mark, jump to mark, copy from mark etc...
In the Win32Pad Vocola example version 0.2, there is a section on referencing visible line numbers by using the last 2 digits of the line number (mod 100). There are examples of scripts that reference lines as long as there are fewer than 51 lines visible on the screen. This particular example takes advantage of a clipboard extension and the fact that Win32Pad presents the current line during the go to line command.
Being able to carry out selection and movement line commands based on the last 2 digits alone is handy for when your source file gets to be quite long.
I have implemented a go to line mod 100 command in Sublime Text using its plug-in architecture, but I am still a long way from implementing all of the commands in the Win32Pad example. Being able to reference lines by just the last 2 digits is definitely high value, but that needs to be weighed against how much work it is to implement in your editor of choice.
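The core of a mod 100 lookup can be sketched in Python. This is neither the Win32Pad script nor the Sublime Text plug-in mentioned above; it assumes the editor can report the first and last visible line numbers, and `resolve_mod_100` with its parameters is a hypothetical name.

```python
def resolve_mod_100(two_digits, first_visible, last_visible):
    """Return the visible line whose number ends in `two_digits`.

    Assumes at most 100 lines are visible, so at most one line can
    match. Returns None when no visible line matches.
    """
    if last_visible - first_visible >= 100:
        raise ValueError("more than 100 lines visible; result is ambiguous")
    for line in range(first_visible, last_visible + 1):
        if line % 100 == two_digits:
            return line
    return None
```

With lines 1290 to 1340 on screen, saying "35" would resolve to line 1335 in this sketch, so the utterance stays short no matter how long the file gets.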
in - parens, out of parens, back to opening paren, right before last paren (bracket, or brace too)
All of these techniques give you pretty quick control of movement in a file. The list is not exhaustive, but I think it will get you started.
In programming we do a lot of special casing. So it's useful to have commands that can do things like snake_case, PascalCase, camelCase, UPPERCASE, verytersecase, among others.
There are certainly others who have done more sophisticated casing, but here is my list of functions
    CamelCase(x)  := EvalTemplate('("x" + %s).title()[1:].replace(" ","")', $x);
    PascalCase(x) := EvalTemplate('("" + %s).title()[:].replace(" ","")', $x);
    Under(x)      := EvalTemplate('("" + %s).lower()[:].replace(" ","_")', $x);
    CapUnder(x)   := EvalTemplate('("" + %s).upper()[:].replace(" ","_")', $x);
    Hyphen(x)     := EvalTemplate('("" + %s).lower()[:].replace(" ","-")', $x);
    Terse(x)      := EvalTemplate('("" + %s).lower()[:].replace(" ","")', $x);
    Path(x)       := EvalTemplate('("" + %s).lower()[:].replace(" ","/")', $x);
Anywhere I need to use these I add the following commands:
    camel <_anything> = CamelCase($1);
    Pascal <_anything> = PascalCase($1);
    score <_anything> = Under($1);
    hype <_anything> = Hyphen($1);
    terse <_anything> = Terse($1);
That lets me say things like
score this is a test
camel a camel case demonstration
To quickly produce things like this:
this_is_a_test
aCamelCaseDemonstration
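The Python expressions inside the EvalTemplate calls can be tried outside Vocola. The sketch below restates them as plain Python functions; the lowercase function names are renamings for illustration, and the redundant `"" +` and `[:]` parts are dropped.

```python
def camel_case(text):
    # Prefix a throwaway "x" so title() leaves the first real letter
    # lowercase, then drop the resulting "X".
    return ("x" + text).title()[1:].replace(" ", "")


def pascal_case(text):
    return text.title().replace(" ", "")


def under(text):
    return text.lower().replace(" ", "_")


def cap_under(text):
    return text.upper().replace(" ", "_")


def hyphen(text):
    return text.lower().replace(" ", "-")


def terse(text):
    return text.lower().replace(" ", "")


def path(text):
    return text.lower().replace(" ", "/")
```

Here `camel_case("a camel case demonstration")` gives `aCamelCaseDemonstration` and `under("this is a test")` gives `this_is_a_test`, matching the dictated examples.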
More…
When it comes to selecting, cutting, copying, and deleting, there are a number of possibilities, and I suggest experimenting to find what works best for you. This is still an area of active experimentation for me.
There are many examples in the Win32 pad code which are well worth looking over.
One idea to consider: you usually select text for a reason, whether to delete, copy, cut, or paste over it. As such, you could combine the selection and the action into one utterance. For example, you could write a command to
copy line
Instead of uttering two commands
select line
copy
This can be problematic however if you are not completely sure how your editor will do selection. For example Windows Notepad does word boundary selection quite differently from sublime text. So if you come up with a command that does word selection by pressing control left or control right, and you have a single utterance like:
cut 4 words right
You may get different results depending on your editor.
So far I have primarily taken the two-step approach of selecting first and then doing the action, which can be less efficient. Here are some sample utterances:
    # selecting words
    shekar <n>   # shift c(k)ontrol right
    shekal <n>   # shift c(k)ontrol left
    # selecting in line
    sel end      # select to end of line
    seleg        # select to beginning of line
    # selecting up and down lines
    slup <n>     # select up
    sloud <n>    # select down
<op> single <n>
<op> through <n>
<op> <n> ( through | comma ) <n>
uses mark
this is especially useful with mod line numbers
class <_anything>
function <_anything>
list comp
Running your project can mean many things, depending on what you develop. Currently for me it means switching to a terminal (in this case PuTTY), starting or restarting some commands on the command line, then switching to a web browser. For example, I have a command that executes this sequence:
    go putty = HeardWord(putty) Wait(500) HeardWord(restart) Wait(200) HeardWord(chrome) Wait(500) HeardWord(go, local, 40);
In sequence this will activate PuTTY, call a restart command that's defined in PuTTY's context, activate Chrome, then call a command to go to a particular local port in Chrome. Everybody's needs here will be different, so I'll leave it at that.