1. NLP (Natural Language Processing)

Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language, and it is a science that integrates linguistics, computer science, and mathematics. Because research in this area involves the language people use in their daily lives, it is closely related to linguistics, yet distinct from it: natural language processing is not the general study of natural language, but the development of computer systems, especially software systems, that can communicate effectively in natural language. It is therefore a part of computer science.
Natural language processing is mainly applied to machine translation, public opinion monitoring, automatic summarization, opinion extraction, text classification, question answering, text semantic comparison, speech recognition, Chinese OCR, and so on.
Natural language processing refers to the technology of interacting with machines in the natural language humans use to communicate; through such processing, natural language can be read and understood by computers. Research in the field began with humanity's exploration of machine translation. Although natural language processing involves operations at many levels, including phonetics, grammar, semantics, and pragmatics, its basic task, in short, is to segment the corpus being processed into words, based on ontology dictionaries, word-frequency statistics, contextual semantic analysis, and so on, thereby forming semantically rich lexical units built on the smallest parts of speech.
Natural language processing is a discipline that uses computer technology to analyze, understand, and process natural language. It takes the computer as a powerful tool for language research, carries out quantitative studies of linguistic information with the computer's support, and provides language descriptions usable by both humans and machines.
Natural Language Understanding (NLU) and Natural Language Generation (NLG) are its two core tasks. It is a typical interdisciplinary field involving linguistics, computer science, mathematics, cognitive science, logic, and more, and it focuses on the interaction between computers and human (natural) language.
Realizing natural language communication between humans and machines means enabling the computer both to understand the meaning of natural language text and to express a given intention or thought as natural language text. The former is called natural language understanding and the latter natural language generation.

Neural network natural language processing
After 2008, deep learning began to show its power in speech and image processing, and NLP researchers turned their attention to it. Early work used deep learning to compute or construct features and then evaluated the effect within the existing statistical learning framework; for example, search engines added deep learning to compute the similarity between queries and documents, improving search relevance. Since 2014, researchers have attempted to train models end to end directly with deep learning. Progress has been made in machine translation, question answering, reading comprehension, and other fields, producing a boom in deep learning for NLP.

2. Concepts and Techniques

1) Information Extraction (IE)

Information extraction is the process of extracting unstructured information embedded in text and converting it into structured data; it includes extracting the relationships between named entities from corpora of natural language, making it a deeper line of research built on named entity recognition.
The main process of information extraction has three steps:

  • (1) Automatic processing of unstructured data.
  • (2) Targeted extraction of text information.
  • (3) Structured representation of the extracted information.

The most basic task of information extraction is named entity recognition, and its core lies in the extraction of entity relations.
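
As a hedged illustration of this most basic step, named entity recognition, the minimal sketch below uses the spaCy library with its small English model (both assumed to be installed; the example sentence is invented):

```python
# A minimal NER sketch with spaCy; "en_core_web_sm" must be installed
# separately (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple was founded by Steve Jobs in Cupertino in 1976.")

# Each recognized entity has a text span and a label such as ORG,
# PERSON, GPE, or DATE; relation extraction would build on these spans.
for ent in doc.ents:
    print(ent.text, ent.label_)
```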

2) Automatic summarization

Automatic summarization is an information compression technology that uses a computer to automatically extract text information according to certain rules and assemble it into a short summary. It aims at two goals: first, keeping the language short, and second, preserving the important information.
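
A minimal sketch of the extractive flavor of this idea, assuming a simple word-frequency heuristic rather than any particular published method: sentences whose words occur frequently in the document are kept, and the rest are dropped.

```python
# Extractive summarization by word-frequency scoring: a toy sketch,
# not a production summarizer.
import re
from collections import Counter

def summarize(text, num_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    # Score each sentence by the total frequency of its words,
    # normalized by length so long sentences are not favored.
    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in ranked)
```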

3) Speech recognition technology

Speech recognition is the technology that allows a machine, through recognition and understanding, to convert a speech signal into the corresponding text or commands; that is, it makes the machine understand human speech. Its goal is to convert the lexical content of human speech into computer-readable data. To do this, continuous speech must first be decomposed into words, phonemes, and other units, and a set of rules for understanding semantics must be established. The speech recognition process comprises several stages: front-end noise reduction, speech segmentation and framing, feature extraction, and state matching. The framework can be divided into three parts: the acoustic model, the language model, and the decoder.
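
As a hedged sketch of the recognition step only, the example below uses the third-party SpeechRecognition package (assumed installed); the file name speech.wav is a placeholder, and the heavy lifting of acoustic modeling, language modeling, and decoding happens inside the remote service:

```python
# Minimal speech-to-text sketch with the SpeechRecognition package
# (pip install SpeechRecognition); "speech.wav" is a placeholder file.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("speech.wav") as source:
    audio = recognizer.record(source)  # read the entire file

# recognize_google sends the audio to Google's free Web Speech API
# and returns the decoded text.
print(recognizer.recognize_google(audio))
```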

4) Transformer model

First proposed by a Google team in 2017, the Transformer is a model that uses attention mechanisms to accelerate deep learning algorithms. The model consists of a stack of encoders and a stack of decoders: the encoder processes input of arbitrary length and produces a representation of it, and the decoder converts that representation into the target words.
The Transformer leverages the attention mechanism to weigh each word's relationships with all the other words and thereby generate a new representation of every word.
The advantage of the Transformer is that its attention mechanism can capture the relationships between all the words in a sentence regardless of their positions. Traditional encoder-decoder models had to be built on Recurrent Neural Networks (RNNs) or Convolutional Neural Networks (CNNs); the Transformer abandons this inherent pattern and uses a full-attention structure in place of LSTM (Long Short-Term Memory) units, which reduces the amount of computation and improves parallel efficiency without harming the final experimental results. The disadvantages of the model are that its computational cost is high and that its use of positional information is weak, so it can struggle to capture long-distance information.
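
The core computation can be sketched as scaled dot-product attention; the NumPy version below is a minimal illustration with made-up shapes, not the full multi-head Transformer:

```python
# Scaled dot-product attention: every position attends to every other
# position regardless of distance.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns the scores into attention weights per query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mixture of all value vectors.
    return weights @ V

seq_len, d_model = 5, 8
x = np.random.randn(seq_len, d_model)
out = scaled_dot_product_attention(x, x, x)  # self-attention
print(out.shape)  # (5, 8)
```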

5) Natural language processing technology based on traditional machine learning

Natural language processing tasks can be decomposed into multiple subtasks. Traditional machine learning methods can use models such as the SVM (Support Vector Machine), Markov models, and the CRF (Conditional Random Field) to handle these subtasks and further improve the precision of the processing results.
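
As a hedged sketch of one such subtask, the snippet below trains an SVM text classifier with scikit-learn on a tiny, invented training set; TF-IDF features feeding a linear SVM were a common pre-deep-learning setup:

```python
# SVM text classification: a toy sketch with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["great movie, loved it", "terrible plot, boring",
               "wonderful acting", "awful and dull"]
train_labels = ["pos", "neg", "pos", "neg"]

# TF-IDF features are hand-engineered inputs; the SVM only learns
# a decision boundary over them.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)
print(model.predict(["boring but wonderful acting"]))
```
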
In practice, however, these methods still have the following deficiencies:

  • (1) The performance of a traditionally trained machine learning model is limited by the quality of the training set, and the training set must be labeled manually, which reduces training efficiency.
  • (2) A training set built for one domain may perform very differently in another, which weakens the applicability of the training and exposes the drawbacks of a single learning method. Making a training data set applicable to multiple domains requires a great deal of manual annotation effort.
  • (3) When processing higher-order, more abstract natural language, humans cannot feasibly label all the relevant language features, so traditional machine learning can only learn pre-established rules and cannot learn complex language features outside those rules.

6) Natural language processing technology based on deep learning

Deep learning is a branch of machine learning. Applying deep learning models such as convolutional neural networks and recurrent neural networks to natural language processing lets the models classify and understand natural language by learning from generated word vectors.
Compared with traditional machine learning, natural language processing technologies based on deep learning have the following advantages:
(1) Based on vectorized representations of words or sentences, deep learning can continuously learn language features and master higher-level, more abstract ones, removing the need for the extensive manual feature engineering that traditional approaches require.
(2) Deep learning does not require experts to manually define features; high-level features can be learned automatically through neural networks.
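
A minimal sketch of such automatically learned word vectors, using gensim's Word2Vec (gensim assumed installed, 4.x API) on a toy corpus; real models are trained on far larger unlabeled text:

```python
# Learning word vectors with Word2Vec: a toy sketch.
from gensim.models import Word2Vec

corpus = [["natural", "language", "processing", "is", "fun"],
          ["deep", "learning", "learns", "language", "features"],
          ["word", "vectors", "capture", "semantics"]]

model = Word2Vec(sentences=corpus, vector_size=50, window=2,
                 min_count=1, epochs=50)

# Each word is now a dense vector; similar words end up close together.
print(model.wv["language"].shape)            # (50,)
print(model.wv.most_similar("language", topn=2))
```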

3. Technical difficulties

Natural language has no fixed rules, permits free combination, forms open sets, depends on world knowledge, and is context-dependent.

1) Effective definition of content

In everyday language, the words in a sentence do not exist in isolation; all the words in a discourse must relate to one another to express the intended meaning. Once a specific sentence is formed, definite relationships among its words are established. Without an effective definition of these relationships, the content becomes ambiguous and cannot be properly understood.

2) Disambiguation and ambiguity

Words and sentences often carry multiple meanings in different situations, which easily produces vague concepts or divergent interpretations; for example, "bank" may refer to a financial institution or to the side of a river.
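
A quick way to see this ambiguity concretely is to list a word's senses in WordNet via NLTK (assumed installed, with the wordnet corpus downloaded once):

```python
# Listing the many senses of a single word with NLTK's WordNet.
import nltk
nltk.download("wordnet")
from nltk.corpus import wordnet as wn

# A disambiguation system must pick the right sense from context.
for synset in wn.synsets("bank")[:5]:
    print(synset.name(), "-", synset.definition())
```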

3) Defective or irregular input

This includes dealing with foreign or regional accents in speech processing, and with spelling, grammar, or OCR errors in text processing.

4) Language behavior and planning

Sentences often convey more than their literal meaning and must be analyzed in context.

4. Related technologies

1) Computer science

2) Internet technology

3) Machine learning methods

5. Tools and platforms

NLTK: a comprehensive Python-based NLP library.
StanfordNLP: a library of NLP algorithms commonly used in academia.
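
A minimal sketch of NLTK in action, tokenizing and part-of-speech tagging a sentence (resource names for the one-time downloads may vary slightly across NLTK versions):

```python
# Tokenization and POS tagging with NLTK.
import nltk
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("NLTK makes basic NLP tasks easy.")
print(nltk.pos_tag(tokens))
```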

6. Typical applications

Sentiment analysis
Chatbots
Speech recognition
Machine translation