During my latest project (Smart Mirror), I wanted to implement continuous speech recognition that would keep listening without stopping. I spent a lot of time looking for a library that would work nicely; two of them are worth mentioning: DroidSpeech and Pocketsphinx. DroidSpeech is a nice Android library that gives you continuous speech recognition, although parts of it were not as configurable as I would have hoped. Pocketsphinx came in to save the day.
INTRODUCTION TO SPEECH RECOGNITION
As I had no experience with speech recognition libraries before I started this project, it was a bit complicated and time-consuming for me to implement such a feature. There is no specific step-by-step tutorial that would make things easier and faster, which is why I’m putting together a small walk-through.
This article describes the usage of a library called Pocketsphinx, which provides this functionality. I suggest that you read through the pages linked below before we go further, so you have a fundamental understanding of how the library works. The project is available at these URLs: https://github.com/cmusphinx/pocketsphinx and https://cmusphinx.github.io/wiki/tutorialandroid/
In case you want to check out Vikram Ezhil’s DroidSpeech, you can proceed to this URL: https://github.com/vikramezhil/DroidSpeech
PREPARATIONS
These are the first steps to take:
- Create a new Android project in Android Studio (this tutorial does not include Eclipse and IntelliJ steps)
- Go to the Pocketsphinx Android demo GitHub page, open the 'aars' directory and download 'pocketsphinx-android-5prealpha-release.aar'. If that exact file isn't there, a newer version of the library has probably been released; check the directory and download the file with the *.aar extension
- In Android Studio, click File -> New -> New Module -> Import .JAR/.AAR Package -> Finish
- Open settings.gradle in your project and (if it’s not there already) add pocketsphinx to your include line:
include ':app', ':pocketsphinx-android-5prealpha-release'
- Open app/build.gradle and add this line to dependencies:
compile project(':pocketsphinx-android-5prealpha-release')
- Add the permissions to your project’s Manifest file. Pocketsphinx can record your voice commands and save the raw audio files to the app’s folder; I did not find any use for these files, so you can leave out the WRITE_EXTERNAL_STORAGE permission if you don’t need them. A way to disable this raw audio logging will be shown later.
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
- Go to the Pocketsphinx Android demo page on GitHub, download the file assets.xml from the 'models' directory, and put it in the app/ folder of your project.
- Go back to app/build.gradle in your project and add these lines at the very end:
ant.importBuild 'assets.xml'
preBuild.dependsOn(list, checksum)
clean.dependsOn(clean_assets)
- On the Pocketsphinx Android demo page, navigate to models/src/main/assets, download the 'sync' folder and copy it into the 'assets' folder of your project. This folder contains the resources for speech recognition and will be synchronized on the first run of the application.
That is all for now. You should have Pocketsphinx ready for use in your project.
POCKETSPHINX USAGE
The PocketSphinxActivity.java file on the GitHub page covers the whole functionality; you can find it in the app/src/main/java/edu/cmu/pocketsphinx/demo folder. The demo project displays some information on screen, but we will skip that part because I’m pretty sure you want your own implementation. I did not make any UI changes, my code runs in the background, and I will explain every part of the code below. The part where you ask for the RECORD_AUDIO permission at runtime is skipped in this walk-through, so you have to implement it yourself (a minimal sketch follows below).
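For completeness, here is one way that runtime permission request could look. This is only a minimal sketch, assuming the support library’s ActivityCompat/ContextCompat helpers (the androidx equivalents work the same way) and a request code constant I picked arbitrarily; adapt it to your own permission flow.

// Imports needed at the top of the file:
import android.Manifest;
import android.content.pm.PackageManager;
import android.support.v4.app.ActivityCompat;
import android.support.v4.content.ContextCompat;

private static final int PERMISSIONS_REQUEST_RECORD_AUDIO = 1;

// Call this from onCreate() instead of calling runRecognizerSetup() directly
private void checkRecordAudioPermission() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
            != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this,
                new String[]{Manifest.permission.RECORD_AUDIO},
                PERMISSIONS_REQUEST_RECORD_AUDIO);
    } else {
        runRecognizerSetup();
    }
}

@Override
public void onRequestPermissionsResult(int requestCode, String[] permissions,
                                        int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    if (requestCode == PERMISSIONS_REQUEST_RECORD_AUDIO) {
        if (grantResults.length > 0
                && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
            runRecognizerSetup();
        } else {
            // No microphone access, so the recognizer cannot work
            finish();
        }
    }
}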
Initialize fields and constants
/* We only need the keyphrase to start recognition, one menu with a list of
   choices, and one word that is required for the switchSearch() method,
   which brings the recognizer back to listening for the keyphrase. */
private static final String KWS_SEARCH = "wakeup";
private static final String MENU_SEARCH = "menu";

/* Keyword we are looking for to activate recognition */
private static final String KEYPHRASE = "oh mighty computer";

/* Recognition object */
private SpeechRecognizer recognizer;
Start recognizer configuration
@Override
public void onCreate(Bundle state) {
    super.onCreate(state);
    runRecognizerSetup();
}
Run recognizer setup
private void runRecognizerSetup() {
    // Recognizer initialization is time-consuming and involves IO,
    // so we execute it in an async task
    new AsyncTask<Void, Void, Exception>() {
        @Override
        protected Exception doInBackground(Void... params) {
            try {
                Assets assets = new Assets(PocketSphinxActivity.this);
                File assetDir = assets.syncAssets();
                setupRecognizer(assetDir);
            } catch (IOException e) {
                return e;
            }
            return null;
        }

        @Override
        protected void onPostExecute(Exception result) {
            if (result != null) {
                System.out.println(result.getMessage());
            } else {
                switchSearch(KWS_SEARCH);
            }
        }
    }.execute();
}
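As a side note, AsyncTask has since been deprecated on newer Android versions. If you would rather avoid it, the same initialization can run on a plain background thread. The following is just a sketch of that alternative, reusing the fields and methods from the rest of the code:

private void runRecognizerSetup() {
    // Same work as above, but on a plain Thread; results are posted
    // back to the UI thread with runOnUiThread()
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                Assets assets = new Assets(PocketSphinxActivity.this);
                File assetDir = assets.syncAssets();
                setupRecognizer(assetDir);
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        switchSearch(KWS_SEARCH);
                    }
                });
            } catch (final IOException e) {
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        System.out.println(e.getMessage());
                    }
                });
            }
        }
    }).start();
}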
Initialize the recognizer, the dictionary, and your custom grammar (the grammar file is explained at the end of the article)
private void setupRecognizer(File assetsDir) throws IOException {
    recognizer = SpeechRecognizerSetup.defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
            // Uncomment this line if you want the recognizer to save raw
            // audio files to the app's storage
            //.setRawLogDir(assetsDir)
            .getRecognizer();
    recognizer.addListener(this);

    // Create keyword-activation search.
    recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

    // Create your custom grammar-based search
    File menuGrammar = new File(assetsDir, "mymenu.gram");
    recognizer.addGrammarSearch(MENU_SEARCH, menuGrammar);
}
Destroy recognizer objects on app exit
@Override
public void onStop() {
    super.onStop();
    if (recognizer != null) {
        recognizer.cancel();
        recognizer.shutdown();
    }
}
Switch between keyphrase or menu listening
@Override
public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis == null)
        return;

    String text = hypothesis.getHypstr();
    if (text.equals(KEYPHRASE)) {
        switchSearch(MENU_SEARCH);
    } else {
        System.out.println(hypothesis.getHypstr());
    }
}
Print out voice command when recognized as full sentence
@Override
public void onResult(Hypothesis hypothesis) {
    if (hypothesis != null) {
        System.out.println(hypothesis.getHypstr());
    }
}
Custom action on beginning of speech – we don’t need any action
@Override
public void onBeginningOfSpeech() {
}
Reset recognizer back to keyphrase listening, or listen to menu options after end of speech
@Override
public void onEndOfSpeech() {
    if (!recognizer.getSearchName().equals(KWS_SEARCH))
        switchSearch(KWS_SEARCH);
}
This method switches between continuous recognition of the keyphrase and recognition of menu items with a 10-second timeout.
private void switchSearch(String searchName) {
    recognizer.stop();

    if (searchName.equals(KWS_SEARCH))
        recognizer.startListening(searchName);
    else
        recognizer.startListening(searchName, 10000);
}
Print out any errors
@Override
public void onError(Exception error) {
    System.out.println(error.getMessage());
}
If the 10-second timeout expires, switch back to keyphrase recognition, as no menu command was received
@Override
public void onTimeout() {
    switchSearch(KWS_SEARCH);
}
GRAMMAR
As you probably noticed, we are using our own mymenu.gram file. This grammar file will contain all the options for our menu. Create a new file called mymenu.gram in assets/sync/ and put this inside:
#JSGF V1.0;

grammar mymenu;

public <smart> = (good morning | hello);
Now go back to your onPartialResult() method and change the if statement to this form:
if (text.equals(KEYPHRASE)) {
    switchSearch(MENU_SEARCH);
} else if (text.equals("hello")) {
    System.out.println("Hello to you too!");
} else if (text.equals("good morning")) {
    System.out.println("Good morning to you too!");
} else {
    System.out.println(hypothesis.getHypstr());
}
CONCLUSION
… and that’s it! Now you have continuous speech recognition that gets activated by a custom-defined keyphrase, with a menu of options. You can extend these options and any other functionality of the recognizer code. I myself had to tweak a lot, as no recognition is 100% bulletproof, but that is something I will leave for you to play with. Good luck.
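One tuning option worth mentioning: if the keyphrase fires too often or not often enough, SpeechRecognizerSetup lets you set a keyword threshold. The snippet below is only a sketch of where it would go in the setupRecognizer() chain from above; the value shown is the one the demo project ships commented out, and it has to be tuned for your own keyphrase.

recognizer = SpeechRecognizerSetup.defaultSetup()
        .setAcousticModel(new File(assetsDir, "en-us-ptm"))
        .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
        // Threshold to tune for the keyphrase, balancing false alarms
        // against missed activations
        .setKeywordThreshold(1e-45f)
        .getRecognizer();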