Tips | Alexa Skill Development Tricks & Lessons

Considerations when building an Alexa Skill

It’s always great to learn a new technology, and it’s even better when you have a fun project to drive that learning. With that in mind, I have created Mind Blown, an Alexa Skill.

Alexa Skills are the apps that run on Amazon’s Alexa. The skill I built is a memory game with the following basic idea:

There is a ball in a cup. There is a star in a glass. There is a marble in a box.

Now, without looking back at the text above, can you answer the following questions?

  • What is in the box?
  • Where is the star?
  • What is in the cup?

That’s the basic premise. I learned a lot of interesting things while building the skill, especially around the engineering and design decisions that differ from traditional app development. I’ve outlined some of the most surprising ones below:

Coding in the Cloud

The skills you create are coded and run entirely in the cloud. Amazon has some pretty slick web-based IDEs (development environments) and lots of tools to help you build the conversation interface. Since it’s all online, you can immediately test your app on your device, which is important. Whatever previous experience you have developing apps, building one in which the only interface is audio is likely to challenge you in ways you’d never expect. Quick iteration is absolutely necessary, and Amazon has provided some great tools to support Alexa developers.

You don’t even need to have an Amazon Echo to test your app (although you can pick up an Echo Dot for about $20). There is even a third-party emulator that I can attest works great.

Cloud Sessions

As a player, while any given game session is on the same host, a second session could route you to another host without your knowledge. This means there are some interesting challenges around saving state that need to be addressed or else you’ll confuse the user.

As an example, let’s say I have a skill called “John’s App”. Consider the following three examples:

Example 1:

Player: “Alexa, start John’s App.”
Alexa: “What is your name?”
Player: “My name is Fred.”
[App stores response in variable player_name]
Alexa: “Okay, Fred is your name.”

In this example, the entire conversation happens in a single session and on the same host.

Example 2:

Player: “Alexa, tell John’s App my name is Fred.”
[App stores response in variable player_name]
Alexa: “Okay, Fred is your name.”
Player: “Alexa, ask John’s App what is my name?”
Alexa: “Fred is your name.”

In this example, there are two sessions and they are on the same host. But you got lucky! Do not expect this to work every time.

Example 3:

Player: “Alexa, tell John’s App my name is Fred.”
[App stores response in variable player_name]
Alexa: “Okay, Fred is your name.”
Player: “Alexa, ask John’s App what is my name?”
Alexa: “I do not know your name.”

In this example, there are two sessions and those sessions happen to occur on different hosts. So when the player asks “What is my name?”, the local variable is not defined on the second host (although it is defined on a totally different host).

So, as soon as your design calls for data to be stored between sessions, you must use a database. (And this means storing data that identifies the player.) Again, Amazon makes connecting to a database super easy, but it does increase the complexity of your app.
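
To make the difference between Example 2 and Example 3 concrete, here is a minimal Python sketch of keeping the name in a database instead of a local variable. It is only an illustration under stated assumptions: sqlite3 stands in for whatever hosted database a real skill would use, and the table layout, function names, and user IDs are all hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the player store. sqlite3 is a stand-in for a shared, hosted database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS players ("
        "user_id TEXT PRIMARY KEY, player_name TEXT)"
    )
    return conn


def save_name(conn, user_id, player_name):
    """Store (or overwrite) the name for the player identified by user_id."""
    conn.execute(
        "INSERT OR REPLACE INTO players (user_id, player_name) VALUES (?, ?)",
        (user_id, player_name),
    )
    conn.commit()


def load_name(conn, user_id):
    """Return the stored name, or None if this player has never given one."""
    row = conn.execute(
        "SELECT player_name FROM players WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def answer_name_question(conn, user_id):
    """Answer "what is my name?" from the database rather than host memory."""
    name = load_name(conn, user_id)
    if name is None:
        return "I do not know your name."
    return f"{name} is your name."
```

Because the answer is read from the store on every request, it no longer matters which host handles the second session, as long as every host talks to the same database.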

In the case of “Mind Blown”, I designed around this limitation by:

  • Making sure the start and end of each session are clearly understood by the player.
  • Creating a “keyword-based checkpoint” system that allows players to continue where they left off. Basically, if you get to level 5, you get a secret code that allows you to start at level 5 next time.
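
A keyword-based checkpoint like this can be sketched in a few lines of Python. This is not the actual Mind Blown code: the secret words, the levels they map to, and the function names are all made up for illustration.

```python
# Hypothetical checkpoint words; the real skill's codes are not known.
CHECKPOINT_WORDS = {5: "pineapple", 10: "lighthouse", 15: "telescope"}

# Reverse lookup: spoken word -> level it unlocks.
WORD_TO_LEVEL = {word: level for level, word in CHECKPOINT_WORDS.items()}


def checkpoint_word_for(level):
    """Return the secret word awarded on reaching `level`, or None if there is none."""
    return CHECKPOINT_WORDS.get(level)


def starting_level(spoken_word):
    """Map a spoken checkpoint word to its level; anything unknown starts at level 1."""
    if spoken_word is None:
        return 1
    return WORD_TO_LEVEL.get(spoken_word.strip().lower(), 1)
```

The appeal of this design is that the player carries the state, so nothing needs to be stored between sessions.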

I have another side project in development that absolutely requires saving state across sessions. As I said, this has definitely increased the complexity: my app must now store user data, which requires developers to think more deeply about the player’s privacy expectations:

  • Are you storing just the device ID, or do you allow players to register an account?
  • If you store only the device ID, what happens if multiple users access the same device?
  • If you store only the device ID, what happens if the same user plays from different devices?

A can of worms for sure.

Conversational UI

If you’re not thinking deeply about conversational UI, you don’t yet know what you’re building. That’s fine, you’ll learn as you go, but everything you think you know about what your app is going to be is likely to change based on the challenges of making an enjoyable experience for your users.

I really love this topic. I don’t have all the answers, but it is amazing how it requires you to rethink everything we take for granted when we can see our User Interface. It also naturally circles back to great accessibility design because now you’re faced with the challenges of building UI for the blind.

Most of us take for granted that we can offer the user say… five choices… by putting five buttons on the screen. Then the user just selects which of the five things they want. But with conversational UI, listing out five options is going to lead to a horrid user experience.

In fact, this very question of how many options a person can hold in their head before they are overwhelmed is the basis for my game, “Mind Blown”.

But it’s not just about “What can the player store in their head?” More importantly, it’s “How long are they willing to listen to that list of options before they hate your app?”

What works for you is likely to be unique to what you’re trying to accomplish, but it cannot be an afterthought (just as accessibility as an afterthought leads to bad experiences even if the result is technically ‘W3C accessibility compliant’).

Making Money

Making Money with your Alexa Skill is going to be your biggest challenge. Consider the following:

  • Users are not expecting to buy skills. If your app isn’t free you’ll face a huge hurdle.
  • In-app purchases are allowed, but again… are users ready to pay? It is not unheard of, but it IS a challenge.
  • And remember, your app is running on Amazon servers. They will eventually charge you for all that cloud-based processing power if your skill is successful.

I don’t have the answer for this. These are the things I’ve seen:

  • Brand tie-ins: Where basically these Skills are just advertisements.
  • Hardware supported: Many Skills are just controllers for IoT devices. The company is making money from the sale of the physical devices and the skill is free.
  • Software extensions: In these cases, you purchase software that also happens to have an Alexa Skill that gives the user additional functionality.

There are a few apps that are heavily pushed by Amazon that make their money by charging for additional content. For example, the story book skill that will read your child a story. Want more stories? You’ll have to buy them.

But again, like Conversational UI… you have to think about this from the start.

Invocation Phrase

This is probably the simplest concept, and yet the trickiest of all. Users need a phrase to start your app, and that phrase has to follow some fairly simple rules.
However, those simple rules can be incredibly limiting.

  • If it’s not a brand name, your skill’s invocation phrase needs to have at least two words. For example, you can say “Alexa, play Jeopardy”, but not all of us have access to that kind of IP.
  • It needs to be short, memorable, and roll off the tongue. Imagine if Blizzard developed a World of Warcraft game called “NPC Dialogue” and then expected the player to remember to say “Alexa, play Blizzard’s World of Warcraft NPC Dialogue” each time.
  • You’ll want the invocation to make sense for one-off commands. For example, let’s say you build an app that helps a user find their stuff using some IoT magic and you call your skill “Where’s My Stuff”. As expected by the system, the user must say, “Alexa, ask Where’s My Stuff to find my keys.” Great! But what is the user actually going to do? They are going to say, “Alexa, where’s my keys?” and then get frustrated with Alexa’s default response to “where’s my”. The user doesn’t know that your skill was never invoked, but they will blame you nevertheless!
  • Oh, and your skill can share its name with another skill, but that will likely confuse users when they are trying to find and enable your skill, even if your skill is more popular. (I know this from experience because there is another skill called “Mind Blown”, and if you try to enable it via a voice command, you will get the *inferior one.)

(Inferior is obviously my personal opinion only.)
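
As a rough way to sanity-check a candidate name against the points above, here is a small Python sketch. It only encodes the two-word rule and a length heuristic from this list; the MAX_WORDS limit and both function names are my own assumptions, and Amazon’s actual requirements cover more than this.

```python
# Hypothetical cut-off for "short"; tune it by saying the name out loud.
MAX_WORDS = 4


def check_invocation_name(name, is_brand=False):
    """Return a list of problems with a proposed invocation name (empty if none)."""
    words = name.split()
    problems = []
    if len(words) < 2 and not is_brand:
        problems.append("needs at least two words unless it is a brand name")
    if len(words) > MAX_WORDS:
        problems.append("probably too long to remember and say")
    return problems


def one_shot_utterance(name, request):
    """Build the full phrase a user has to say for a one-off command."""
    return f"Alexa, ask {name} to {request}"
```

Printing the one-shot phrase for a few typical requests and reading it aloud is a quick way to hear whether users are likely to say it or fall back to something shorter.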

Discoverability

There are some interesting ways Amazon has been trying to make apps discoverable. They have those “Hey, if you like this app you might also like these other apps. Do you want to enable them?” messages when you close a skill.
However, there is a huge number of junk skills, which will give most users the expectation that enabling skills is not worth their time. It reminds me of the early days of the Google Play store. You’ll have to find a way to stand out from the crowd.

Support for Echo Show

While you’ve built a great conversational UI experience, you’ll also need to support and test your skill on the Echo Show, the visual experience. For most skills, the Echo Show will work seamlessly and no changes will be needed, since it will just type out exactly what was said.
However, my app required some alternative text. If Alexa says, “There is a ball in a cup. What is in the cup?” and the screen displays the entire text, there is not much of a challenge.
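
One way to keep the spoken and displayed text separate is to build them as two different fields in the skill’s JSON response. The Python sketch below uses the outputSpeech and card fields of the Alexa response format; the function name and card title are hypothetical, and I am assuming the screen shows the card content. How a given device actually renders it is something you would need to test.

```python
import json


def build_round_response(spoken_text, display_text, title="Mind Blown"):
    """Build a response whose on-screen text differs from what Alexa says."""
    return {
        "version": "1.0",
        "response": {
            # What Alexa says out loud: the full statement plus the question.
            "outputSpeech": {"type": "PlainText", "text": spoken_text},
            # What is shown on screen: only the question, so the answer stays hidden.
            "card": {"type": "Simple", "title": title, "content": display_text},
            "shouldEndSession": False,
        },
    }


response = build_round_response(
    spoken_text="There is a ball in a cup. What is in the cup?",
    display_text="What is in the cup?",
)
print(json.dumps(response, indent=2))
```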

Summary

Yeah, conversational UI will be the future of Human Computer Interfaces.
Yeah, Amazon makes it super easy to develop Alexa Skills.
Yeah, Amazon offers various incentives if you make one. (I got a free hoodie and a free Echo Dot from my skill.) They even have incentives that cover the cost of the AWS processing power.
Yeah, they are super fun to build.
Yeah, I don’t know if anyone actually installs them or how you make money with just an Alexa skill without being tied to a brand or other hardware/software.

If this was helpful, perhaps you could check out Mind Blown and give it a rating!
