NLU Training Data
NLU training data stores structured information about user messages.
The goal of NLU (Natural Language Understanding) is to extract structured information from user messages.
This usually includes the user's intent and any entities their message contains.
You can add extra information such as regular expressions and lookup tables to your training data to help the model identify intents and entities correctly.
There are several concepts here. The first is the message; the second is the intent and its entities.
The process of turning a message into an intent and entities is NLU.
The third concept is the training data.
The fourth concept is extra information for the training data: regular expressions and lookup tables.
Training Examples
NLU training data consists of example user utterances categorized by intent.
In short: a message is a user utterance, and each intent groups its example utterances.
To make it easier to use your intents, give them names that relate to what the user wants to accomplish with that intent, keep them in lowercase, and avoid spaces and special characters.
In other words, this is about how to name your intents.
NOTE
The / symbol is reserved as a delimiter to separate retrieval intents from response text identifiers.
(Two terms to note here: retrieval intents and response text identifiers.)
Make sure not to use it in the name of your intents.
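The naming advice and the reserved / can be checked mechanically. The following Python sketch is a hypothetical helper, not Rasa code: the allowed character set (lowercase letters, digits, underscores) and the example names check_balance and faq/ask_name are assumptions chosen for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical check, not part of Rasa: lowercase letters, digits and
# underscores only (allowing digits and "_" is an assumption).
INTENT_NAME = re.compile(r"[a-z0-9_]+")

def is_good_intent_name(name: str) -> bool:
    """True if the name follows the conventions above; "/" is rejected too."""
    return INTENT_NAME.fullmatch(name) is not None

def split_retrieval_intent(full_name: str) -> Tuple[str, Optional[str]]:
    """Split "faq/ask_name" into ("faq", "ask_name"); plain intents get None."""
    base, sep, response_id = full_name.partition("/")
    return base, (response_id if sep else None)

print(is_good_intent_name("check_balance"))    # True
print(is_good_intent_name("Check Balance!"))   # False
print(split_retrieval_intent("faq/ask_name"))  # ('faq', 'ask_name')
print(split_retrieval_intent("greet"))         # ('greet', None)
```

Because / is the delimiter, any intent name containing it would be split in two, which is why the check above rejects it.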
Entities
Entities are structured pieces of information inside a user message.
For entity extraction to work, you need to either specify training data to train an ML model or define regular expressions to extract entities with the RegexEntityExtractor, based on a character pattern.
In short, entity extraction can use a trained model or regular expressions.
When deciding which entities you need to extract, think about what information your assistant needs for its user goals.
User goals are another concept worth keeping in mind here.
The user might provide additional pieces of information that you don't need for any user goal; you don't need to extract these as entities.
See the training data format for details on how to annotate entities in your training data.
Synonyms
Synonyms map extracted entities to a value other than the literal text that was extracted; the matching is case-insensitive.
You can use synonyms when there are multiple ways users refer to the same thing.
Think of the end goal of extracting an entity, and figure out from there which values should be considered equivalent.
Let's say you had an entity named account that you use to look up the user's balance.
One of the possible account types is "credit".
Your users also refer to their "credit" account as "credit account" and "credit card account".
In this case, you could define "credit card account" and "credit account" as synonyms to "credit":
nlu:
- synonym: credit
  examples: |
    - credit card account
    - credit account
Then, if either of these phrases is extracted as an entity, it will be mapped to the value credit.
Any alternate casing of these phrases (e.g. CREDIT, credit ACCOUNT) will also be mapped to the synonym.
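The mapping step can be pictured as a case-insensitive dictionary lookup applied to entities that have already been extracted. This Python sketch is an illustration under assumptions (the SYNONYMS table and map_synonyms helper are hypothetical), not Rasa's synonym mapper.

```python
from typing import Dict, List

# Hypothetical synonym table mirroring the YAML above. Keys are lowercase so
# lookups ignore case; the canonical value maps to itself so that "CREDIT"
# is normalized too. An illustration, not Rasa's synonym mapper.
SYNONYMS: Dict[str, str] = {
    "credit": "credit",
    "credit account": "credit",
    "credit card account": "credit",
}

def map_synonyms(entities: List[dict]) -> List[dict]:
    """Replace extracted entity values with their canonical value, if any."""
    mapped = []
    for ent in entities:
        canonical = SYNONYMS.get(ent["value"].lower())
        # Values without a synonym keep their literal text.
        mapped.append({**ent, "value": canonical or ent["value"]})
    return mapped

extracted = [
    {"entity": "account", "value": "Credit ACCOUNT"},
    {"entity": "account", "value": "savings"},
]
print(map_synonyms(extracted))
# [{'entity': 'account', 'value': 'credit'}, {'entity': 'account', 'value': 'savings'}]
```

Note that the function only ever sees entities that were already found; if the extractor never tags "credit card account" as an entity, no mapping can happen.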
PROVIDE TRAINING EXAMPLES
Synonym mapping only happens after entities have been extracted.
That means that your training examples should include the synonym examples (credit card account and credit account) so that the model will learn to recognize these as entities and replace them with credit.
See the training data format for details on how to include synonyms in your training data.
Regular Expressions
You can use regular expressions to improve intent classification and entity extraction in combination with the RegexFeaturizer and RegexEntityExtractor components in the pipeline.
Regular Expressions for Intent Classification
You can use regular expressions to improve intent classification by including the RegexFeaturizer component in your pipeline.
When using the RegexFeaturizer, a regex does not act as a rule for classifying an intent.
It only provides a feature that the intent classifier will use to learn patterns for intent classification.
In intent classification, a regex is not a rule. Not a rule, not a rule.
In intent classification, a regex is a feature. A feature, a feature.
Currently, all intent classifiers make use of available regex features.
The name of a regex in this case is a human-readable description.
It can help you remember what a regex is used for, and it is the title of the corresponding pattern feature.
It does not have to match any intent or entity name.
A regex for a "help" request might look like this:
nlu:
- regex: help
  examples: |
    - \bhelp\b
The intent being matched could be greet, help_me, assistance or anything else.
A regex is just a regex, and its name is just its name:
the name of a regex has nothing to do with intent names or entity names.
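To illustrate that the regex name is only a feature title, here is a simplified Python sketch that turns each named pattern into a 0/1 value for a whole message. The second pattern and its name greeting_words are made up for the example, and Rasa's RegexFeaturizer produces more detailed features (for instance per token), so treat this as an approximation of the idea only.

```python
import re
from typing import Dict

# Hypothetical named patterns. The names are only human-readable feature
# titles; "help" does not have to be an intent, and no intent is assigned here.
PATTERNS: Dict[str, str] = {
    "help": r"\bhelp\b",
    "greeting_words": r"\b(hi|hello|hey)\b",
}

def regex_features(message: str) -> Dict[str, int]:
    """Return one 0/1 feature per named pattern for a single message.

    The classifier would receive these values as extra inputs and learn how
    much weight to give them; the regex itself decides nothing.
    """
    return {
        name: int(re.search(pattern, message) is not None)
        for name, pattern in PATTERNS.items()
    }

print(regex_features("hello, can you help me?"))  # {'help': 1, 'greeting_words': 1}
print(regex_features("I need assistance"))        # {'help': 0, 'greeting_words': 0}
```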
Try to create your regular expressions so that they match as few words as possible.
For example, use \bhelp\b instead of help.*, as the latter might match the whole message whereas the former only matches a single word.
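The difference between the two patterns is easy to see with Python's re module. This sketch assumes a made-up message and shows how much text each pattern covers.

```python
import re

message = "help me find the nearest branch please"

# Word-bounded: matches just the word "help".
narrow = re.search(r"\bhelp\b", message)
# Greedy ".*": keeps consuming characters up to the end of the line.
broad = re.search(r"help.*", message)

print(narrow.group())  # help
print(broad.group())   # help me find the nearest branch please

# Word boundaries also stop "help" from firing inside other words.
print(re.search(r"\bhelp\b", "helpful tips"))        # None
print(re.search(r"help.*", "helpful tips").group())  # helpful tips
```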
PROVIDE TRAINING EXAMPLES
The RegexFeaturizer provides features to the intent classifier, but it doesn't predict the intent directly.
(Pronunciation note: RegexFeaturizer is read "regex featurizer".)
Include enough examples containing the regular expression so that the intent classifier can learn to use the regular expression feature.
Regular Expressions for Entity Extraction
If your entity has a deterministic structure, you can use regular expressions in one of two ways:
Regular Expressions as Features
You can use regular expressions to create features for the RegexFeaturizer component in your NLU pipeline.
When using a regular expression with the RegexFeaturizer, the name of the regular expression does not matter.
When using the RegexFeaturizer, a regular expression provides a feature that helps the model learn an association between intents/entities and inputs that fit the regular expression.
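For entity extraction the useful signal is per token: which tokens fall inside a match. The sketch below assumes an account-number pattern \d{10,12} and a naive tokenization in which tokens are joined by single spaces, and computes one 0/1 value per token; a CRF or DIET model would combine such values with its other features. It is a simplified illustration, not the RegexFeaturizer's code.

```python
import re
from typing import List

def token_regex_features(tokens: List[str], pattern: str) -> List[int]:
    """Mark each token 1 if it lies entirely inside a match of `pattern`.

    Hypothetical helper: the text is rebuilt by joining tokens with single
    spaces so that token offsets can be compared with match offsets.
    """
    text = " ".join(tokens)
    spans = [m.span() for m in re.finditer(pattern, text)]
    features, offset = [], 0
    for tok in tokens:
        start, end = offset, offset + len(tok)
        features.append(int(any(s <= start and end <= e for s, e in spans)))
        offset = end + 1  # skip the joining space
    return features

tokens = ["my", "account", "number", "is", "1234567891"]
print(token_regex_features(tokens, r"\d{10,12}"))  # [0, 0, 0, 0, 1]
```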
PROVIDE TRAINING EXAMPLES
The RegexFeaturizer provides features to the entity extractor, but it doesn't predict the entity directly.
Include enough examples containing the regular expression so that the entity extractor can learn to use the regular expression feature.
Regex features for entity extraction are currently only supported by the CRFEntityExtractor and DIETClassifier components.
Other entity extractors, like MitieEntityExtractor or SpacyEntityExtractor, won't use the generated features, and their presence will not improve entity recognition for these extractors.
Regular Expressions for Rule-based Entity Extraction
You can use regular expressions for rule-based entity extraction using the RegexEntityExtractor component in your NLU pipeline.
When using the RegexEntityExtractor, the name of the regular expression should match the name of the entity you want to extract.
For example, you could extract account numbers of 10-12 digits by including this regular expression and at least two annotated examples in your training data:
nlu:
- regex: account_number
  examples: |
    - \d{10,12}
- intent: inform
  examples: |
    - my account number is [1234567891](account_number)
    - This is my account number [1234567891](account_number)
Whenever a user message contains a sequence of 10-12 digits, it will be extracted as an account_number entity.
RegexEntityExtractor doesn't require training examples to learn to extract the entity, but you do need at least two annotated examples of the entity so that the NLU model can register it as an entity at training time.
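A minimal Python sketch of rule-based extraction, assuming the account_number pattern from the YAML above: every match becomes an entity whose name is the regex name, together with its character offsets. The ENTITY_PATTERNS table and extract_entities helper are hypothetical and are not the RegexEntityExtractor's implementation.

```python
import re
from typing import Dict, List

# Hypothetical rule table: for rule-based extraction the regex name doubles
# as the entity name, so "account_number" is both.
ENTITY_PATTERNS: Dict[str, str] = {"account_number": r"\d{10,12}"}

def extract_entities(text: str) -> List[dict]:
    """Deterministically turn every pattern match into an entity dict."""
    entities = []
    for entity_name, pattern in ENTITY_PATTERNS.items():
        for match in re.finditer(pattern, text):
            entities.append({
                "entity": entity_name,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
            })
    return entities

print(extract_entities("please check 123456789012 for me"))
# [{'entity': 'account_number', 'value': '123456789012', 'start': 13, 'end': 25}]
```

The pattern as given has no word boundaries, so it also matches the first twelve digits of a longer digit run (for example a 13-digit number); writing it as \b\d{10,12}\b is one way to tighten it.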
Lookup Tables
Lookup tables are lists of words used to generate case-insensitive regular expression patterns.
In other words, a lookup table is simply a list of words.
They can be used in the same ways as regular expressions are used, in combination with the RegexFeaturizer and RegexEntityExtractor components in the pipeline.
You can use lookup tables to help extract entities which have a known set of possible values.
Keep your lookup tables as specific as possible.
For example, to extract country names, you could add a lookup table of all countries in the world:
nlu:
- lookup: country
  examples: |
    - Afghanistan
    - Albania
    - ...
    - Zambia
    - Zimbabwe
When using lookup tables with RegexFeaturizer, provide enough examples for the intent or entity you want to match so that the model can learn to use the generated regular expression as a feature.
When using lookup tables with RegexEntityExtractor, provide at least two annotated examples of the entity so that the NLU model can register it as an entity at training time.
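The following Python sketch shows one way a lookup table could be compiled into a single case-insensitive pattern. It is an illustration under assumptions, not Rasa's pattern generator: entries are escaped, sorted longest first so that an entry such as Guinea-Bissau wins over its prefix Guinea, and wrapped in word boundaries. The shortened country list is hand-picked for the example.

```python
import re
from typing import List

def lookup_to_regex(words: List[str]) -> "re.Pattern[str]":
    """Compile a lookup table into one case-insensitive alternation."""
    # Longest entries first, so "Guinea-Bissau" is tried before "Guinea".
    ordered = sorted(words, key=len, reverse=True)
    # re.escape makes punctuation such as "-" literal.
    alternatives = "|".join(re.escape(w) for w in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

countries = lookup_to_regex(
    ["Afghanistan", "Albania", "Guinea", "Guinea-Bissau", "Zambia", "Zimbabwe"]
)
print([m.group() for m in countries.finditer("flights from zambia to guinea-bissau")])
# ['zambia', 'guinea-bissau']
```

The word boundaries also keep an entry from firing inside a longer word, so "Albanian" does not match "Albania".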
Entity Roles and Groups
Annotating words as custom entities allows you to define certain concepts in your training data.
For example, you can identify cities by annotating them:
I want to fly from [Berlin]{"entity": "city"} to [San Francisco]{"entity": "city"}.
However, sometimes you want to add more details to your entities.
For example, to build an assistant that should book a flight, the assistant needs to know which of the two cities in the example above is the departure city and which is the destination city.
Berlin and San Francisco are both cities, but they play different roles in the message.
To distinguish between the different roles, you can assign a role label in addition to the entity label.
- I want to fly from [Berlin]{"entity": "city", "role": "departure"} to [San Francisco]{"entity": "city", "role": "destination"}.
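To see how role annotations translate into entity dicts with character offsets, here is a hypothetical Python parser for the [text]{json} annotation form. It is a sketch for illustration only and handles just this one annotation style.

```python
import json
import re
from typing import List, Tuple

# Matches annotations of the form [text]{"entity": ..., "role": ...}.
ANNOTATION = re.compile(r"\[(?P<text>[^\]]+)\](?P<meta>\{[^}]*\})")

def parse_annotated(example: str) -> Tuple[str, List[dict]]:
    """Return the plain text and one dict per annotated entity.

    Hypothetical helper: it handles only the [text]{json} form shown above,
    not the shorter [text](entity) form or nested braces.
    """
    parts: List[str] = []
    entities: List[dict] = []
    cursor = 0     # position in the annotated string
    plain_len = 0  # length of the plain text built so far
    for m in ANNOTATION.finditer(example):
        before = example[cursor:m.start()]
        parts.append(before)
        plain_len += len(before)
        value = m.group("text")
        meta = json.loads(m.group("meta"))
        entities.append({"start": plain_len, "end": plain_len + len(value), "value": value, **meta})
        parts.append(value)
        plain_len += len(value)
        cursor = m.end()
    parts.append(example[cursor:])
    return "".join(parts), entities

text, ents = parse_annotated(
    'I want to fly from [Berlin]{"entity": "city", "role": "departure"} '
    'to [San Francisco]{"entity": "city", "role": "destination"}.'
)
print(text)  # I want to fly from Berlin to San Francisco.
for e in ents:
    print(e["value"], e["role"], e["start"], e["end"])
```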
You can also group different entities by specifying a group label next to the entity label.
The group label can, for example, be used to define different orders.
In the following example, the group label specifies which toppings go with which pizza and what size each pizza should be.
Give me a [small]{"entity": "size", "group": "1"} pizza with [mushrooms]{"entity": "topping", "group": "1"} and
a [large]{"entity": "size", "group": "2"} [pepperoni]{"entity": "topping", "group": "2"}
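Group labels are what let downstream code keep orders apart. The Python sketch below assumes the extractor returned the four entities shown, with their group labels (typed in by hand here), and collects one order per group; build_orders is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

# Entities as an extractor might return them for the pizza example
# (values typed in by hand for this sketch).
entities = [
    {"entity": "size", "value": "small", "group": "1"},
    {"entity": "topping", "value": "mushrooms", "group": "1"},
    {"entity": "size", "value": "large", "group": "2"},
    {"entity": "topping", "value": "pepperoni", "group": "2"},
]

def build_orders(ents: List[dict]) -> Dict[str, Dict[str, List[str]]]:
    """Collect entity values per group label, one order per group."""
    orders = defaultdict(lambda: defaultdict(list))
    for ent in ents:
        orders[ent["group"]][ent["entity"]].append(ent["value"])
    # Convert the nested defaultdicts to plain dicts for readable output.
    return {group: dict(fields) for group, fields in orders.items()}

print(build_orders(entities))
# {'1': {'size': ['small'], 'topping': ['mushrooms']}, '2': {'size': ['large'], 'topping': ['pepperoni']}}
```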
See the Training Data Format for details on how to define entities with roles and groups in your training data.
The entity object returned by the extractor will include the detected role/group label.
{
  "text": "Book a flight from Berlin to SF",
  "intent": "book_flight",
  "entities": [
    {
      "start": 19,
      "end": 25,
      "value": "Berlin",
      "entity": "city",
      "role": "departure",
      "extractor": "DIETClassifier"
    },
    {
      "start": 29,
      "end": 31,
      "value": "San Francisco",
      "entity": "city",
      "role": "destination",
      "extractor": "DIETClassifier"
    }
  ]
}
NOTE
Entity roles and groups are currently only supported by the DIETClassifier and CRFEntityExtractor.
In order to properly train your model with entities that have roles and groups, make sure to include enough training examples for every combination of entity and role or group label.
To enable the model to generalize, make sure to have some variation in your training examples.
For example, you should include examples like fly TO y FROM x, not only fly FROM x TO y.
To fill slots from entities with a specific role/group, you need to define a from_entity slot mapping for the slot and specify the role/group that is required.
For example:
entities:
- city:
    roles:
    - departure
    - destination

slots:
  departure:
    type: any
    mappings:
    - type: from_entity
      entity: city
      role: departure
  destination:
    type: any
    mappings:
    - type: from_entity
      entity: city
      role: destination
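The effect of the role filter in these mappings can be sketched in Python. SLOT_MAPPINGS and fill_slots are hypothetical stand-ins that mirror the YAML above; Rasa applies slot mappings through its own machinery, so this only illustrates the matching condition (entity name and role must both match).

```python
from typing import Dict, List, Optional

# Hypothetical mirror of the from_entity mappings above: slot -> entity + role.
SLOT_MAPPINGS: Dict[str, Dict[str, str]] = {
    "departure": {"entity": "city", "role": "departure"},
    "destination": {"entity": "city", "role": "destination"},
}

def fill_slots(entities: List[dict]) -> Dict[str, Optional[str]]:
    """Fill each slot with the first entity whose name and role both match."""
    slots: Dict[str, Optional[str]] = {name: None for name in SLOT_MAPPINGS}
    for slot, rule in SLOT_MAPPINGS.items():
        for ent in entities:
            if ent.get("entity") == rule["entity"] and ent.get("role") == rule["role"]:
                slots[slot] = ent["value"]
                break
    return slots

extracted = [
    {"entity": "city", "role": "departure", "value": "Berlin"},
    {"entity": "city", "role": "destination", "value": "San Francisco"},
]
print(fill_slots(extracted))  # {'departure': 'Berlin', 'destination': 'San Francisco'}
print(fill_slots([{"entity": "city", "value": "Paris"}]))  # {'departure': None, 'destination': None}
```

A city entity without a role fills neither slot here, which is why role-labelled training examples matter.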
Entity Roles and Groups influencing dialogue predictions
If you want to influence the dialogue predictions by roles or groups, you need to modify your stories to contain the desired role or group label.
You also need to list the corresponding roles and groups of an entity in your domain file.
Let's assume you want to output a different sentence depending on what the user's location is.
E.g. if the user just arrived from London, you might want to ask how the trip to London was.
But if the user is on the way to Madrid, you might want to wish the user a good stay.
You can achieve this with the following two stories:
stories:
- story: The user just arrived from another city.
  steps:
  - intent: greet
  - action: utter_greet
  - intent: inform_location
    entities:
    - city: London
      role: from
  - action: utter_ask_about_trip
- story: The user is going to another city.
  steps:
  - intent: greet
  - action: utter_greet
  - intent: inform_location
    entities:
    - city: Madrid
      role: to
  - action: utter_wish_pleasant_stay
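As a toy model of why the role matters, the two stories can be distilled into a lookup from (intent, entity, role) to the next action. This hand-written table is an assumption for illustration and not how Rasa's policies work; they learn from featurized dialogue states.

```python
from typing import Dict, Optional, Tuple

# Toy lookup distilled by hand from the two stories above (hypothetical):
# (intent, entity, role) -> next action.
NEXT_ACTION: Dict[Tuple[str, str, Optional[str]], str] = {
    ("inform_location", "city", "from"): "utter_ask_about_trip",
    ("inform_location", "city", "to"): "utter_wish_pleasant_stay",
}

def predict_next_action(intent: str, entity: dict) -> Optional[str]:
    """Return the action for this intent/entity/role, or None if no story covers it."""
    return NEXT_ACTION.get((intent, entity["entity"], entity.get("role")))

print(predict_next_action("inform_location", {"entity": "city", "value": "London", "role": "from"}))
# utter_ask_about_trip
print(predict_next_action("inform_location", {"entity": "city", "value": "Paris", "role": "to"}))
# utter_wish_pleasant_stay
print(predict_next_action("inform_location", {"entity": "city", "value": "Rome"}))
# None
```

Because the key ignores the city value, any city tagged with role from leads to utter_ask_about_trip in this toy, while a city without a role matches neither story.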
BILOU Entity Tagging
The DIETClassifier and CRFEntityExtractor have the option BILOU_flag, which refers to a tagging schema that can be used by the machine learning model when processing entities.
BILOU is short for Beginning, Inside, Last, Outside, and Unit-length.
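For example, the training example
[Alex]{"entity": "person"} is going with [Marty A. Rick]{"entity": "person"} to [Los Angeles]{"entity": "location"}.
is first split into a list of tokens.
Then the machine learning model applies the tagging schema, depending on the value of the option BILOU_flag. With BILOU_flag set to True, Alex is tagged U-person; Marty, A and Rick are tagged B-person, I-person and L-person; Los and Angeles are tagged B-location and L-location; every other token is tagged O. With BILOU_flag set to False, every token of an entity simply carries its entity label (person or location), and all other tokens are tagged O.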
The BILOU tagging schema is richer than the normal tagging schema.
It may help to improve the performance of the machine learning model when predicting entities.
INCONSISTENT BILOU TAGS
When the option BILOU_flag is set to True, the model may predict inconsistent BILOU tags, e.g. B-person I-location L-person.
Rasa uses some heuristics to clean up the inconsistent BILOU tags.
For example, B-person I-location L-person would be changed into B-person I-person L-person.
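One plausible clean-up heuristic is a majority vote over the entity labels inside each B-...L- span. The Python sketch below implements that idea; the clean_bilou helper is hypothetical and not necessarily the rule set Rasa applies (which may, for example, take model confidences into account).

```python
from collections import Counter
from typing import List

def clean_bilou(tags: List[str]) -> List[str]:
    """Make each B-...L- span use one entity label, chosen by majority vote."""
    cleaned = list(tags)
    i = 0
    while i < len(tags):
        if tags[i].startswith("B-"):
            j = i
            # Advance to the closing L- tag (or the end of the sequence).
            while j < len(tags) - 1 and not tags[j].startswith("L-"):
                j += 1
            labels = [t[2:] for t in tags[i:j + 1] if t != "O"]
            winner = Counter(labels).most_common(1)[0][0]
            for k in range(i, j + 1):
                if tags[k] != "O":
                    cleaned[k] = tags[k][:2] + winner  # keep prefix, swap label
            i = j + 1
        else:
            i += 1
    return cleaned

print(clean_bilou(["B-person", "I-location", "L-person"]))
# ['B-person', 'I-person', 'L-person']
```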