About the Project
AI technology has advanced significantly over the past few years. Familiar examples include voice assistants such as Siri and Alexa, generative models such as ChatGPT and DALL-E, self-driving systems, and image recognition. These tools are part of everyday life and support us in many ways. However, AI has no emotions, and the AI most of us are familiar with only generates text, so it is difficult to hold the kind of natural conversation people have with each other.
To bring conversations between AI and humans closer to natural ones, I set out to develop an AI Assistant that combines generative AI, machine learning, speech recognition, avatars, animation, and speech synthesis. Its main audience is new and current Texas Wesleyan University students who are interested in the Computer Science major.
Features
- Speech to Text:
- Skill: Web Speech API
- Role: Converts user voice input into text data in real time.
- Data format: Speech - Text (input)
- AI Response:
- Skill: OpenAI API
- Role: Generates a text response based on the text input.
- Data format: Text (input) - Text (response)
- Emotional Analysis:
- Skill: OpenAI API
- Role: Analyzes the user's input text and the AI's response to determine the avatar's expression, which is one of: smiling, sleepy, expectation, sad, embarrassed, surprised, angry.
- Data format: Text (input + response) - Text (emotion/expression data)
- Text to Speech:
- Skill: Amazon Polly
- Role: Converts response text into natural speech.
- Data format: Text (response) - MP3
- Audio Analysis and Play:
- Skill: Web Audio API
- Role: Plays back the MP3 audio produced by Amazon Polly, and analyzes the audio to obtain frequency data for the avatar's lip sync.
- Data format: MP3 - Frequency data
- Avatar Motion:
- Skill: Live2D Cubism SDK for Web
- Role: Provides the avatar with expressions and movements, and displays animations based on user responses.
- Avatar:
- Skill: Live2D
- Role: Avatar made with Live2D
- Auth:
- Skill: Firebase Auth
- Role: Manages user authentication and provides secure access to the system.
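The Emotional Analysis step has to end with exactly one of the seven expressions, while a language model may reply with extra words or punctuation. The sketch below shows one way to normalize such a reply. The helper name `toExpression` and the choice of `smiling` as a fallback are assumptions made for illustration, not part of the project.

```typescript
// The seven avatar expressions listed under "Emotional Analysis".
const EXPRESSIONS = [
  "smiling", "sleepy", "expectation", "sad",
  "embarrassed", "surprised", "angry",
] as const;

type Expression = (typeof EXPRESSIONS)[number];

// Hypothetical helper: map a free-form model reply to one known expression.
// The fallback value is an assumption, not something the project specifies.
function toExpression(reply: string, fallback: Expression = "smiling"): Expression {
  // Split the reply into lowercase words, ignoring punctuation.
  const words: string[] = reply.toLowerCase().match(/[a-z]+/g) ?? [];
  for (const word of words) {
    // Return the first word that is one of the known expressions.
    const match = EXPRESSIONS.filter((e) => e === word)[0];
    if (match) return match;
  }
  return fallback;
}
```

With this in place, a reply such as "Surprised!" resolves to `surprised`, and a reply containing none of the seven labels falls back to the default.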
System
- System Architecture:
- Data Flow:
Technologies
- Web Speech API (speech-to-text):
- OpenAI API:
- Amazon Polly (text-to-speech):
- Web Audio API:
- Live2D Cubism SDK for Web:
- Firebase Auth:
The Web Speech API is a native browser API developed by the W3C. It is free to use and does not require any API keys. It provides both speech recognition (speech to text) and speech synthesis (text to speech) from JavaScript, needs no backend code, and is easy to implement. This project uses it for speech-to-text.
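When recognition runs in real time, the browser reports a list of results in which some entries are final and others are still interim. The sketch below separates the two. It uses small structural types instead of the browser's own so that it also runs outside a browser; `ResultLike`, `Transcripts`, and `splitTranscripts` are hypothetical names, not part of the Web Speech API.

```typescript
// Minimal structural types mirroring the parts of a Web Speech API
// result list used here, so the helper can run outside a browser.
interface ResultLike {
  isFinal: boolean;
  0: { transcript: string };
}

interface Transcripts {
  finalText: string;
  interimText: string;
}

// Hypothetical helper: split recognition results into finalized and interim text,
// starting at the index of the first result that changed.
function splitTranscripts(results: ArrayLike<ResultLike>, startIndex = 0): Transcripts {
  let finalText = "";
  let interimText = "";
  for (let i = startIndex; i < results.length; i++) {
    const result = results[i];
    if (!result) continue;
    if (result.isFinal) finalText += result[0].transcript;
    else interimText += result[0].transcript;
  }
  return { finalText, interimText };
}
```

In a browser, a helper like this would typically be called from the recognition object's `onresult` handler with the event's `results` and `resultIndex`. Only the final text would need to be passed on to the AI Response step.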
The OpenAI API is a platform that provides a wide range of natural language processing (NLP) and generative functions. After authenticating with an API key, users can access text generation, image recognition, speech processing, image generation, video generation, and more.
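As an illustration of how the emotion-analysis step might be phrased as a request, the sketch below builds a chat-style request body with a `model` field and a list of `messages`. It assumes the Chat Completions request shape. The helper name `buildEmotionRequest`, the prompt wording, and the default model name are placeholders chosen for this example, and the project may use a different endpoint or prompt.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// Hypothetical helper: build a request body for the emotion-analysis step.
// The model name is a placeholder; the prompt wording is an assumption.
function buildEmotionRequest(userText: string, aiReply: string, model = "gpt-4o-mini"): ChatRequest {
  const labels = "smiling, sleepy, expectation, sad, embarrassed, surprised, angry";
  return {
    model,
    messages: [
      // The system message restricts the answer to the seven avatar expressions.
      { role: "system", content: `Answer with exactly one word from: ${labels}.` },
      // The user message carries both sides of the exchange to be analyzed.
      { role: "user", content: `User: ${userText}\nAssistant: ${aiReply}` },
    ],
  };
}
```

The resulting object would be serialized as JSON and sent with the API key in an `Authorization: Bearer` header. Keeping the key on a server rather than in browser code is the safer choice.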
Amazon Polly is a text-to-speech service that uses deep learning to generate natural, human-like speech in many languages. You can choose between standard and higher-quality neural voices, and among 62 different voices that vary by gender and use case. Output files are encrypted with the 256-bit Advanced Encryption Standard (AES-256) for security. Amazon Polly is suitable for video narration and automated voice responses. An AWS account is required to use it.
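A speech request to Amazon Polly is described by a small set of parameters. The sketch below builds such a parameter object using the `Engine`, `OutputFormat`, `Text`, and `VoiceId` fields of Polly's SynthesizeSpeech operation, whose documented audio output formats are `mp3`, `ogg_vorbis`, and `pcm`. The helper name `buildSpeechParams`, the default voice `Joanna`, and the choice of `mp3` are assumptions for illustration.

```typescript
type PollyOutputFormat = "mp3" | "ogg_vorbis" | "pcm" | "json";
type PollyEngine = "standard" | "neural";

interface SynthesizeSpeechParams {
  Engine: PollyEngine;
  OutputFormat: PollyOutputFormat;
  Text: string;
  VoiceId: string;
}

// Hypothetical helper: build parameters for one text-to-speech request.
// The default voice and the mp3 output format are assumptions for this example.
function buildSpeechParams(
  text: string,
  voiceId = "Joanna",
  engine: PollyEngine = "neural",
): SynthesizeSpeechParams {
  return {
    Engine: engine,
    OutputFormat: "mp3",
    Text: text.trim(), // drop stray whitespace around the response text
    VoiceId: voiceId,
  };
}
```

With the AWS SDK for JavaScript v3, an object of this shape is what would be passed to `SynthesizeSpeechCommand` from `@aws-sdk/client-polly`. The returned audio can then be handed to the playback and analysis step.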
The Web Audio API lets developers generate, process, and analyze audio in real time in the browser. It supports audio synthesis, frequency analysis, and more.
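For lip sync, the frequency data has to be reduced to a single value that says how far the avatar's mouth should open. The sketch below shows one simple approach: average the byte frequency bins (0 to 255 each, as an `AnalyserNode` provides through `getByteFrequencyData`) and scale the result into the range 0 to 1. The helper name `mouthOpenness` and the `gain` factor are assumptions, and the project's actual mapping may differ.

```typescript
// Hypothetical helper: turn byte frequency data (0-255 per bin) into a
// mouth-open value in [0, 1]. The gain factor is an assumed tuning knob.
function mouthOpenness(bins: Uint8Array, gain = 2): number {
  if (bins.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < bins.length; i++) sum += bins[i];
  const average = sum / bins.length / 255; // normalized mean level, 0 to 1
  return Math.min(1, average * gain); // amplify quiet speech, clamp at fully open
}
```

Called once per animation frame while the audio plays, the returned value could drive a mouth-open parameter on the avatar. Silence yields 0 and loud audio saturates at 1.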
The Live2D Cubism SDK is a development kit for displaying and manipulating Live2D models in applications. It supports a variety of platforms, including Unity, WebGL, Web, and consoles such as PS5 and Nintendo Switch. It enables real-time rendering and character expression. This project uses the Web version.
Demos
LINK
Contact
For inquiries, please reach out to me at rtsunamura01@gmail.com.