[chatbot + AI = The Next-Generation Interaction Model][26] Giving the chatbot OCR capabilities - adding a receipt lottery checker

2018-08-05 Sunday

"chatbot + AI = The Next-Generation Interaction Model" ai azure bot framework chatbot cognitive service computer-vision

Cover image source: https://pixabay.com/en/books-spine-colors-pastel-1099067/

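In the previous post ([25] Using Computer Vision - setting it up, reading the docs, and testing with the REST API), we created a Computer Vision key, learned how to read the REST API documentation, and tested the service with Postman.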

This post integrates OCR into the chatbot, to get a feel for what actual development with it is like.

The code for this post is on GitHub: alantsai-samples/mhat-hotelbot:blog/chapter-26

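Adding receipt recognition
Adjusting the code

Create a new dialog dedicated to checking receipts
Create a service that wraps the OCR service
Connect the dialog and the service
Hook the receipt check into RootLuisDialog
Test results

Conclusion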

Adding receipt recognition

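Honestly, I can't think of many places where a hotel-booking chatbot would need OCR. So far I have come up with two scenarios:

After booking, the guest receives a receipt or a booking confirmation - OCR can turn it into digital content so the bot can offer the user some services based on it
Troubleshooting hotel facilities - for example, if the hotel lends out computers and one of them shows an error message, a guest who rarely uses computers may not understand it, so OCR could read the message and the bot could present an explanation anyone can follow

OCR is just a technology; how to use it is up to your creativity.

The feature added today lets the chatbot read the invoice number directly from a photo of a receipt (a Taiwanese uniform invoice) and then check whether it has won the receipt lottery. (OK, it is really just a receipt-checking feature, nothing creative Orz.)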

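Adjusting the code

Adding this feature breaks down roughly into the following steps:

Create a new dialog dedicated to checking receipts
Create a service that wraps the OCR service
Connect the dialog and the service
Hook the receipt check into RootLuisDialog
Test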

Create a new dialog dedicated to checking receipts

Create a dialog called ReceiptRecognizerDialog. Its job is simple: call the OCR service and return the recognized invoice number.

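using System;
using System.Threading.Tasks;
using Microsoft.Bot.Builder.Dialogs;

// Returns the recognized invoice number (a string) to the calling dialog
[Serializable]
public class ReceiptRecognizerDialog : IDialog<string>
{
	public Task StartAsync(IDialogContext context)
	{
		// Placeholder; the real implementation is filled in later in this post
		throw new NotImplementedException();
	}
}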

Leave the implementation alone for now; we will come back to it shortly.

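Create a service that wraps the OCR service

As mentioned in the previous post when we went through the API documentation, sample code is provided at the bottom of the page, and the C# sample calls the REST API directly through HttpClient.

In real development, though, it is nicer to call the service through objects, so the sample code is better wrapped in a layer that is easier to call.

Microsoft provides an SDK for Computer Vision that already wraps the REST API, so we will build on top of that SDK.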

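First, install the NuGet package: Install-Package Microsoft.ProjectOxford.Vision

As an aside, Cognitive Services used to be called Project Oxford. It was later renamed, but the SDK names were not changed, so the SDKs for the other Cognitive Services are also published under the ProjectOxford name.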

Once the package is installed, create an OCRService that offers two methods:

Recognize text from an image URL
Recognize text from a Stream, useful when a local file is uploaded

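using System.Configuration;
using System.IO;
using System.Threading.Tasks;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;

public class OCRService
{
    public OCRService()
    {
        // Key and endpoint are read from appSettings (see below)
        VisionServiceClientInstance =
            new VisionServiceClient
                (ConfigurationManager.AppSettings["ComputerVision.Key"],
                ConfigurationManager.AppSettings["ComputerVision.Url"]);
    }

    public VisionServiceClient VisionServiceClientInstance { get; }

    // Recognize text from an image stream (e.g. an uploaded file).
    // "unk" asks the service to auto-detect the language.
    public async Task<OcrResults> GetOcrResultAsync
        (Stream imageStream, string languageCode = "unk")
    {
        return await VisionServiceClientInstance
            .RecognizeTextAsync(imageStream, languageCode);
    }

    // Recognize text from an image URL the service can reach
    public async Task<OcrResults> GetOcrResultAsync
        (string imageUrl, string languageCode = "unk")
    {
        return await VisionServiceClientInstance
            .RecognizeTextAsync(imageUrl, languageCode);
    }
}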

Two values here come from AppSettings:

ComputerVision.Key: the key you got when creating the service
ComputerVision.Url: the service endpoint, for example: https://southeastasia.api.cognitive.microsoft.com/vision/v1.0
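If either entry is missing from Web.config, ConfigurationManager.AppSettings simply returns null for it, and the mistake may only surface later with a less obvious error. A small fail-fast check can make this clearer. The sketch below is hypothetical (RequiredSettings is not part of the sample repo); it only relies on AppSettings being a NameValueCollection.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical helper: reads a required appSettings value and fails fast
// with a clear message instead of passing null on to VisionServiceClient.
public static class RequiredSettings
{
    public static string Get(NameValueCollection settings, string name)
    {
        if (settings == null) throw new ArgumentNullException(nameof(settings));

        // NameValueCollection returns null when the key does not exist.
        var value = settings[name];
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Missing or empty appSettings entry '{name}'.");
        }

        // Trim accidental whitespace copied in from the Azure portal.
        return value.Trim();
    }
}
```

In OCRService's constructor this could be used as RequiredSettings.Get(ConfigurationManager.AppSettings, "ComputerVision.Key"), and likewise for "ComputerVision.Url".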

Connect the dialog and the service

Two cases are handled here:

The user sends an image URL
The user sends the image directly
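To detect the URL case, the dialog code below uses Uri.IsWellFormedUriString, which accepts any well-formed absolute URI, including non-web schemes such as ftp://. If only http/https image links should be accepted, a stricter check could look like the following sketch. ImageUrlCheck is a hypothetical helper, not part of the sample.

```csharp
using System;

public static class ImageUrlCheck
{
    // Accept only absolute http/https URLs; returns the parsed Uri on success.
    public static bool TryGetImageUrl(string text, out Uri imageUrl)
    {
        imageUrl = null;
        if (string.IsNullOrWhiteSpace(text)) return false;

        if (Uri.TryCreate(text.Trim(), UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            imageUrl = uri;
            return true;
        }

        return false;
    }
}
```

In the dialog, the else-if branch could then call TryGetImageUrl(messageResult.Text, out var url) and pass url.AbsoluteUri to GetOcrResultAsync.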

First, update the dialog's implementation:

public async Task StartAsync(IDialogContext context)
{
	await context.PostAsync
		("Please upload a photo of the receipt, or send the URL of a receipt image");

	// Wait for the user's next message (an image attachment or a URL)
	context.Wait(MessageReceivedAsync);
}

private async Task MessageReceivedAsync
	(IDialogContext context, 
		IAwaitable<IMessageActivity> result)
{
	var messageResult = await result;

	var cvs = new OCRService();

	// Stays empty if nothing could be recognized
	var finalResult = string.Empty;

	// Handle an uploaded image (ContentType may be null on some channels)
	var attachment = messageResult.Attachments?
		.FirstOrDefault(a => a.ContentType?.Contains("image") ?? false);

	if (attachment != null)
	{
		// GetConnector()/GetImageStream() are extension helpers from the sample repo
		using (var imageStream = await
			messageResult.GetConnector()
				.GetImageStream(attachment))
		{
			var ocrResult = await cvs
				.GetOcrResultAsync(imageStream, "zh-Hant");

			finalResult = ProcessImageOcrResult(context, ocrResult);
		}
	}
	// Handle an image URL
	else if (Uri.IsWellFormedUriString
				(messageResult.Text, UriKind.Absolute))
	{
		var ocrResult = await cvs
			.GetOcrResultAsync(messageResult.Text, "zh-Hant");

		finalResult = ProcessImageOcrResult(context, ocrResult);
	}

	// Return the invoice number (or an empty string) to the calling dialog
	context.Done(finalResult);
}

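There are a few important points in the code above:

Image retrieval logic

On some channels the image is available directly in the attachment, but others need extra handling; Skype, for example, requires a special API call to fetch the image. So a couple of the methods used here come from extension helpers: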

GetConnector()
GetImageStream()

I won't paste that code into the post; see GitHub for the details: Helper/IActivityHelper.cs and Helper/IConnectorClientHelper.cs
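To give a rough idea of what fetching a protected attachment involves, here is a hypothetical sketch that only builds the HTTP request. It assumes the attachment exposes a ContentUrl and that, as I understand it, channels such as Skype expect a bearer token obtained from the bot's own app credentials. Getting the token and sending the request are left out, and this is not how the repo's helpers are necessarily written.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class AttachmentRequests
{
    // Hypothetical helper: builds a GET request for an attachment's ContentUrl.
    // If a bearer token is supplied (assumed to come from the bot's app
    // credentials), it is added as the Authorization header.
    public static HttpRequestMessage Build(string contentUrl, string bearerToken)
    {
        if (string.IsNullOrWhiteSpace(contentUrl))
            throw new ArgumentException("contentUrl is required", nameof(contentUrl));

        var request = new HttpRequestMessage(HttpMethod.Get, contentUrl);

        if (!string.IsNullOrEmpty(bearerToken))
        {
            request.Headers.Authorization =
                new AuthenticationHeaderValue("Bearer", bearerToken);
        }

        return request;
    }
}
```

The request could then be sent with HttpClient.SendAsync, and the image read through response.Content.ReadAsStreamAsync() before handing it to OCRService.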

Processing the OCR result

private string ProcessImageOcrResult(IDialogContext context,
	OcrResults ocrResult)
{
	var result = string.Empty;

	// Shortcut: invoice numbers look like AA-12345678,
	// so treat the first word whose 3rd character is '-' as the invoice number
	var foundReceiptNumber = ocrResult?.Regions?
						.SelectMany(x => x.Lines)
						.SelectMany(x => x.Words)
						.FirstOrDefault(x => x.Text?.Length > 3
							&& x.Text[2] == '-');

	if (foundReceiptNumber != null)
	{
		result = foundReceiptNumber.Text;
	}

	return result;
}

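I cut corners here. In production code, deciding which piece of text is the invoice number would need to be much stricter; this check would also accept any word such as "12-34".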
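One way to tighten it is a regular expression anchored to the whole word. This sketch assumes the format is two uppercase letters, an optional hyphen (in case OCR drops it), and eight digits; InvoiceNumberMatcher is a hypothetical name. It uses [0-9] instead of \d, because in .NET \d also matches other Unicode digits such as full-width ones.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class InvoiceNumberMatcher
{
    // Assumed format: two uppercase letters, optional '-', eight ASCII digits.
    private static readonly Regex Pattern =
        new Regex(@"^([A-Z]{2})-?([0-9]{8})$", RegexOptions.CultureInvariant);

    // Returns the first matching word normalized to "AB-12345678", or null.
    public static string FindInvoiceNumber(IEnumerable<string> words)
    {
        foreach (var word in words ?? Enumerable.Empty<string>())
        {
            if (word == null) continue;

            var match = Pattern.Match(word.Trim());
            if (match.Success)
            {
                return $"{match.Groups[1].Value}-{match.Groups[2].Value}";
            }
        }

        return null;
    }
}
```

In ProcessImageOcrResult it could be fed ocrResult.Regions.SelectMany(r => r.Lines).SelectMany(l => l.Words).Select(w => w.Text). It still misses the number if OCR splits it across several words.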

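Hook the receipt check into RootLuisDialog

At this point everything is ready; all that remains is triggering the receipt dialog.

First, add a new intent called ReceiptRecognizer on luis.ai; it will trigger the dialog we just created: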

[LuisIntent("ReceiptRecognizer")]
public Task ReceiptRecognizer
	(IDialogContext context, LuisResult result)
{
	// Hand the conversation to the receipt dialog;
	// ReceiptRecognizerAfterAsync runs once it calls context.Done
	context.Call(new ReceiptRecognizerDialog(),
		ReceiptRecognizerAfterAsync);

	return Task.CompletedTask;
}

private async Task ReceiptRecognizerAfterAsync
	(IDialogContext context,
		IAwaitable<string> result)
{
	var finalResult = await result;

	if (!string.IsNullOrEmpty(finalResult))
	{
		await context.PostAsync($"Your invoice number is: {finalResult}");
	}
	else
	{
		await context.PostAsync("Failed to recognize the invoice number");
	}

	// Go back to waiting for the next message in RootLuisDialog
	context.Wait(MessageReceived);
}
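This chapter stops at showing the recognized number; the "did it win" step from the feature description is not implemented here. As a hypothetical sketch of that step, the code below compares the eight digits with the first-prize numbers using the suffix rule as I understand the uniform-invoice lottery: all 8 digits is the first prize, and matching the last 7, 6, 5, 4 or 3 digits is the second to sixth prize. The special and grand prizes and any additional sixth-prize numbers are left out, the winning numbers are assumed to be supplied by the caller, and the rules should be checked against the official announcement.

```csharp
using System;
using System.Collections.Generic;

public static class InvoicePrizeChecker
{
    // Index 0 = all 8 digits match, index 5 = last 3 digits match.
    private static readonly string[] PrizeNames =
    {
        "First prize", "Second prize", "Third prize",
        "Fourth prize", "Fifth prize", "Sixth prize"
    };

    // Returns the best prize name for the invoice, or null if nothing matches.
    public static string Check(string invoiceNumber, IEnumerable<string> firstPrizeNumbers)
    {
        if (string.IsNullOrEmpty(invoiceNumber) || invoiceNumber.Length < 8) return null;

        // The digits are the last 8 characters of e.g. "AB-12345678".
        var digits = invoiceNumber.Substring(invoiceNumber.Length - 8);

        string best = null;
        var bestLength = 0;

        foreach (var winning in firstPrizeNumbers ?? new string[0])
        {
            if (winning == null || winning.Length != 8) continue;

            // Try the longest suffix first: 8 digits down to 3.
            for (var length = 8; length >= 3; length--)
            {
                // A shorter match cannot beat what another number already gave.
                if (length <= bestLength) break;

                if (digits.EndsWith(winning.Substring(8 - length), StringComparison.Ordinal))
                {
                    bestLength = length;
                    best = PrizeNames[8 - length];
                    break;
                }
            }
        }

        return best;
    }
}
```

ReceiptRecognizerAfterAsync could call this with the current period's first-prize numbers and post the result alongside the invoice number.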