On the development trend and application prospect of speech recognition technology

1. Definition of voice recognition technology

Speech recognition technology, also known as Automatic Speech Recognition (ASR), aims to convert the lexical content of human speech into computer-readable input, such as keystrokes, binary codes, or character sequences. It differs from speaker recognition and speaker verification, which attempt to identify or confirm the person speaking rather than the words being spoken.

Applications of voice recognition technology include voice dialing, voice navigation, indoor device control, voice document retrieval, and simple dictation data entry. The combination of speech recognition technology and other natural language processing technologies such as machine translation and speech synthesis technology can build more complex applications, such as speech-to-speech translation.

2. Principle of speech recognition technology

A voice recognition system can prompt the user to speak a new password on each occasion, so that users do not need to remember a fixed password and the system cannot be deceived by a recording. Text-dependent voice recognition methods can be divided into dynamic time warping (DTW) and hidden Markov model (HMM) methods. Text-independent voice recognition has been studied for a long time; the performance degradation caused by mismatched environments remains a major obstacle in practice.

Its working principle:

The dynamic time warping method uses the instantaneous, time-varying cepstrum. In 1963 Bogert et al. published "The Quefrency Alanysis of Time Series for Echoes". By rearranging the letters of existing words (spectrum became cepstrum, frequency became quefrency), they coined a broad vocabulary for a new signal processing technique. The cepstrum is usually calculated with the fast Fourier transform.
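
As a rough illustration of the cepstrum calculation, the sketch below transforms a signal with a fast Fourier transform, takes the logarithm of the magnitude spectrum, and transforms the result back. The function names (`fft`, `ifft`, `real_cepstrum`), the power-of-two input length, and the small `floor` value that guards the logarithm are assumptions made for this example and are not taken from any particular system.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two (assumption)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT computed with the conjugate trick."""
    n = len(spectrum)
    conj = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in conj]


def real_cepstrum(signal, floor=1e-12):
    """Inverse transform of the log magnitude spectrum."""
    spectrum = fft([complex(s) for s in signal])
    # floor avoids log(0) for spectral bins that are exactly zero
    log_magnitude = [math.log(max(abs(v), floor)) for v in spectrum]
    return [v.real for v in ifft([complex(v) for v in log_magnitude])]


# Usage: an impulse followed by a half-amplitude echo 8 samples later.
signal = [0.0] * 64
signal[0] = 1.0
signal[8] = 0.5
cepstrum = real_cepstrum(signal)
echo_index = max(range(1, 32), key=lambda n: cepstrum[n])
```

In this toy signal the largest cepstral value away from index 0 falls at the delay of the echo (index 8), which is the kind of echo detection the cepstrum was first introduced for.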

Since 1975, the hidden Markov model (HMM) has become very popular. With an HMM, the statistical variation of spectral features can be modeled. Examples of text-independent voice recognition methods include the average spectrum method, the vector quantization method, and the multivariate autoregressive method.
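
To make the hidden Markov model idea more concrete, the following sketch scores a sequence of discrete observation symbols against a model with the forward algorithm. It is only an illustration: the function name `forward_likelihood`, the two-state model, and all probability values are hypothetical, and practical systems work with continuous feature vectors and with scaled or logarithmic probabilities to avoid numerical underflow.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Return P(observations | model) for a discrete HMM.

    initial[s]       : probability of starting in state s
    transition[p][s] : probability of moving from state p to state s
    emission[s][o]   : probability that state s emits symbol o
    """
    if not observations:
        return 1.0
    n_states = len(initial)
    # alpha[s] = probability of the symbols seen so far, ending in state s
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            emission[s][symbol]
            * sum(alpha[p] * transition[p][s] for p in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


# Usage with made-up numbers: two states, two observation symbols (0 and 1).
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood([0, 0, 1], initial, transition, emission)
```

A recognizer built this way would hold one model per word or per speaker and choose the model that gives the observed sequence the highest likelihood.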

The average spectrum method uses a cepstral distance measure, and averaging the spectrum removes the effect of individual phonemes. In principle, a set of short-term training feature vectors can be used directly to describe the essential characteristics of a speaker. However, when the number of training vectors is large, this direct representation is impractical because the storage and computation required become prohibitively large. Vector quantization is therefore used to compress the training data efficiently. Montacie et al. used multivariate autoregressive models to capture speaker characteristics in the time series of cepstrum vectors and achieved very good results.
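
The compression step can be sketched as follows: a small codebook is trained from a speaker's feature vectors with a k-means style procedure, and a test utterance is then scored by its average distance to the nearest codeword. This is a minimal sketch under stated assumptions. The function names, the deterministic choice of initial codewords, the fixed iteration count, and the two-dimensional toy vectors are all hypothetical; real systems use cepstral vectors with many more dimensions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, codebook):
    """Index of the codeword closest to the vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(vectors, size, iterations=20):
    """Compress training vectors into `size` codewords (k-means style)."""
    # Evenly spaced picks as initial codewords (an assumption of this sketch).
    codebook = [tuple(vectors[i * len(vectors) // size]) for i in range(size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        # Move each codeword to the mean of its cluster; keep it if empty.
        codebook = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else codebook[i]
            for i, members in enumerate(clusters)
        ]
    return codebook


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(squared_distance(v, codebook[nearest(v, codebook)])
               for v in vectors) / len(vectors)


# Usage with toy data: the speaker whose codebook fits best is chosen.
codebooks = {
    "speaker_a": train_codebook([(0, 0), (0, 1), (10, 10), (10, 11)], 2),
    "speaker_b": train_codebook([(5, 0), (5, 1), (20, 3), (20, 4)], 2),
}
test_vectors = [(0, 0.4), (10, 10.6)]
best = min(codebooks, key=lambda name: average_distortion(test_vectors, codebooks[name]))
```

Only the codewords need to be stored for each speaker, which is why the storage and computation stay manageable even when the training set is large.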

Deceiving a voice recognition system requires a high-quality recorder, which is not easy to obtain. Ordinary audio recorders cannot capture the complete spectrum of a voice, and the quality loss of the recording system must also be very low. Against most systems, an imitated voice will not succeed. Because identifying a person by voice alone is very complicated, voice recognition systems are often combined with a personal identification number (PIN) or a chip card.

Speech recognition systems benefit from cheap hardware. Most computers have sound cards and microphones, which are also easy to use. But speech recognition still has some disadvantages. A voice changes over time, so biometric templates must be updated. A voice can also change because of a cold, hoarseness, emotional stress, or puberty. Voice recognition systems have a higher error rate than fingerprint recognition systems because people's voices are not as unique as fingerprints. The fast Fourier transform calculations require a co-processor and more computing power than a fingerprint system needs. Current speech recognition systems are not suitable for mobile applications or battery-powered systems.

3. Technical realization of voice recognition

Speech recognition technology mainly includes three aspects: feature extraction technology, pattern matching criteria, and model training technology. Underlying all three is the selection of the speech recognition unit.

(1) Selection of voice recognition unit. The basis of speech recognition research is the selection of speech recognition units. There are three types of speech recognition units: words (sentences), syllables and phonemes. The specific choice of speech recognition unit is determined by the type of specific research task:

Word (sentence) units are widely used in small- and medium-vocabulary speech recognition systems. They are not suitable for large-vocabulary systems, because the model library becomes too large, the model matching algorithm becomes complex, and real-time performance suffers.

Syllable units are mainly used in Chinese speech recognition, because Chinese is a monosyllabic language. Although there are about 1300 tonal syllables, there are only 408 syllables when tone is ignored, which is a relatively small number and makes syllable units feasible in practice.

Phoneme units have long been widely used in English speech recognition and are increasingly used in medium- and large-vocabulary Chinese speech recognition systems. The reason is that Chinese syllables are composed of only 22 initials and 28 finals. Refining the initials increases the number of models but improves the ability to distinguish easily confused syllables.

(2) Feature parameter extraction technology. Feature extraction analyzes and processes the voice signal, removes the redundant information it carries, and keeps the information that is useful for recognition. It is in effect a process of compressing the information in the speech signal. A commonly used technique at present is linear prediction (LP) analysis, from which cepstrum parameters are extracted. Mel parameters and the cepstrum based on perceptual linear prediction (PLP) analysis go further by modeling how the human ear processes sound, which further improves the performance of the speech recognition system.
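
Linear prediction analysis can be sketched with the classic autocorrelation method: the autocorrelation of a short frame is computed, and the Levinson-Durbin recursion solves for coefficients that predict each sample from the previous ones. The function names and the convention that sample n is approximated by the sum of `coefficients[k-1]` times sample n-k are assumptions made for this example; windowing, pre-emphasis, and the conversion to cepstrum parameters are left out.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame of samples."""
    return [
        sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for LP coefficients from autocorrelation values.

    Returns (coefficients, prediction_error).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0:
            break  # silent or degenerate frame: stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


def lp_coefficients(frame, order):
    """LP analysis of one frame by the autocorrelation method."""
    return levinson_durbin(autocorrelation(frame, order), order)


# Usage: the autocorrelation 1, 0.5, 0.25 is that of a first-order process,
# so the second coefficient comes out as zero.
coefficients, error = levinson_durbin([1.0, 0.5, 0.25], 2)
```

A handful of such coefficients per frame replaces hundreds of raw samples, which is the information compression described above.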

(3) Pattern matching and model training techniques. Early speech recognition applications used dynamic time warping (DTW) for pattern matching and model training. DTW achieved good performance in isolated word recognition, but because it performs poorly on large-vocabulary and continuous speech recognition, it has been replaced by the hidden Markov model (HMM) and the artificial neural network (ANN).
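
Isolated word recognition with dynamic time warping can be sketched as follows: the distance between an utterance and each stored word template is computed while allowing the time axis to stretch or shrink, and the closest template wins. This is a simplified sketch. The function names, the word templates, and the use of a single number per frame instead of a feature vector are assumptions made to keep the example short.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[rows][cols]


def recognize(utterance, templates):
    """Return the word whose template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Usage with made-up templates: a slowly spoken "yes" still matches.
templates = {"yes": [1, 2, 3], "no": [3, 2, 1]}
word = recognize([1, 1, 2, 3, 3], templates)
```

Every utterance has to be compared against every stored template, which hints at why this approach scales poorly to large vocabularies and continuous speech.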
