How to create your own voice assistant with python and ChatGPT API

This post describes how to build your own voice assistant based on OpenAI api with ChatGPT. The assistant is build with python and uses python libraries for speech to text and text to speech conversion. Additionally the script contains some special commands for activation, deactivation and test mode.

The source is available on github: https://github.com/benni-wdev/openai-voiceassistant

Voice assistant script running in Pycharm

Prerequisites

You need a computer with Linux (I use Linux mint in this tutorial) with Microphone and Sound. Additionally you need an OpenAI API key for calling the API of ChatGPT. For more information you can check the following link: how-to-fix-error-code-429-you-exceeded-your-current-quota-please-check-your-plan-and-billing-details-for-openai-api

Install needed python packages

To run the script some additional packages are needed.

$ sudo apt install python3-pyaudio
$ sudo apt install python3-dotenv
$ sudo apt install espeak

$ pip3 install pyttsx3
$ pip3 install speechrecognition
$ pip3 install openai


How to install script and start the voice assistant

You can install the script by just copying it. We will directly clone the project from github here to our machine (To learn more about git check this post).

$ git clone https://github.com/benni-wdev/openai-voiceassistant.git
Cloning into 'openai-voiceassistant'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 14 (delta 4), reused 7 (delta 3), pack-reused 0
Receiving objects: 100% (14/14), 6.11 KiB | 894.00 KiB/s, done.
Resolving deltas: 100% (4/4), done.

Now the environment file has to be created which contains your OpenAI API key to interact with chatgpt.

$ nano .env

Here add your api key (you can create one here) like this

OPENAI_API_KEY={your api key}

(Replace {your api key} with your concrete value)

After saving the file you are ready to start the script with:

$ python3 voiceassistant.py

You should see something like this.

$ python3 voiceassistant.py 
2023-12-03 14:33:30,050 INFO     Voice Assistant is listening...
ALSA lib pcm_dmix.c:1032:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:1032:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dmix.c:1032:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:1032:(snd_pcm_dmix_open) unable to open slave
2023-12-03 14:33:30,174 INFO     adjust noise
2023-12-03 14:33:30,386 INFO     Listening in mode 0
2023-12-03 14:33:35,379 INFO     audio captured

A note on the ALSA logs: These logs are looking like errors but do not harm and to get rid of them is quite tricky as they are coming from OS level programs so I decided to ignore them.

How to configure and use it

The beginning of the script contains some configuration parameters which can be adapted based on your requirements. It is mainly about the language which also has an impact on the keywords for the different modes.

# configure used openai model
openai_model = "gpt-3.5-turbo"
....
# activation name (small letters)
assistant_name = "alberto"
# timeout in seconds when no noise
timeout_listen_per_round = 10
# minimal seconds listened on input (name must fit)
min_timeout_listen_on_voice = 5
# dynamic seconds listened on input (x listeningMode)
max_timeout_factor_listen_on_voice = 5
# when 1 no openai api call instead a fix text is returned

The openai_model parameter can be used to configure which openai model should be used, e.g. you can change it to use gpt-4.

The assistant_name defines the activation word of the voice assistant. In my case it is alberto, so whenever someone says "alberto" the voice assistant start listening for a sentence which is then send to the API. It is also used for ending the listening (together with another key word).

The timeout parameters are defining how long the script is listening before the captured text is evaluated. For example is it important that min_timeout_listen_on_voice is long enough (in seconds) to speak the assistant_name.

The next part in the script is specifically for the language and keyword config. My version is configured for german.

# ----------- Language and keyword config -----------
# output speech - language
engine.setProperty("voice", "german")
# input speech - language
speech_to_text_lang = "de-DE"
# Message when activated by name
confirm_listening_text = "Ja ich höre"
# assistant_name + this string signals stop listening
end_keyword = " ende"
# Message when deactivated
confirm_stop_text = "Es war mir eine Ehre zu dienen"
# Trigger long question mode
long_input_keyword = "lange frage"
# Message when long question mode activated
confirm_long_input_text = "OK ich höre dir länger zu"
# Trigger program exit
program_exit_keyword = "beende dich"
# Message when long question mode activated
confirm_program_exit_text = "Lebe lang und in Frieden"
# key word to bring running assistant into test mode
test_mode_keyword = "testmodus"
# the fixed message for test mode
test_message = "Das ist nur eine Testausgabe"

The config parameters should be self explanatory. A last note on the test mode. In test mode the API call is not executed, instead just a fixed message is returned. This is useful if you play around with language and keyword settings because the API call is not for free (although it is very cheap) and the API call usually takes some time.

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *