{"id":259,"date":"2021-12-26T17:43:53","date_gmt":"2021-12-26T09:43:53","guid":{"rendered":"http:\/\/139.9.1.231\/?p=259"},"modified":"2021-12-28T20:45:12","modified_gmt":"2021-12-28T12:45:12","slug":"learning-cnn-lstm-architectures-for-image","status":"publish","type":"post","link":"http:\/\/139.9.1.231\/index.php\/2021\/12\/26\/learning-cnn-lstm-architectures-for-image\/","title":{"rendered":"Learning CNN-LSTM Architectures for Image: Paper Reading"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large is-style-default\"><img src=\"https:\/\/cdn.pixabay.com\/photo\/2021\/11\/19\/15\/21\/christmas-6809681_1280.png\" alt=\"\"\/><figcaption>Merry Christmas<\/figcaption><\/figure>\n\n\n\n<p>Abstract:<\/p>\n\n\n\n<p>Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, on the Pascal dataset the current state-of-the-art BLEU-1 score (higher is better) is 25, while our approach yields 59, to be compared with human performance of around 69. We also show BLEU-1 improvements on Flickr30k, from 56 to 66, and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we achieve a BLEU-4 of 27.7, which is the current state of the art.<\/p>\n\n\n\n<p>Introduction:<\/p>\n\n\n\n<p>Being able to automatically describe the content of an image using properly formed English sentences is a very challenging task, but it could have great impact, for instance by helping visually impaired people better understand the content of images on the web. This task is significantly harder than, for example, the well-studied image classification or object recognition tasks, which have been a main focus of the computer vision community [27]. Indeed, <strong>a description must capture not only the objects contained in an image, but must also express how these objects relate to each other, as well as their attributes and the activities they are involved in<\/strong>. Moreover, this semantic knowledge has to be expressed in a natural language such as English, which means that a language model is needed in addition to visual understanding.<\/p>\n\n\n\n<p>Most previous attempts have proposed to stitch together existing solutions to the above sub-problems in order to go from an image to its description [6, 16]. In contrast, in this work we present a single joint model that takes an image I as input and is trained to maximize the likelihood p(S | I) of producing a target sequence of words S = {S<sub>1<\/sub>, S<sub>2<\/sub>, ...}, where each word S<sub>t<\/sub> comes from a given dictionary, that describes the image adequately.<\/p>\n\n\n\n<p>The main inspiration for our work comes from recent advances in machine translation, where the task is to transform a sentence S written in a source language into its translation T in the target language by maximizing p(T | S). For many years, machine translation was also achieved by a series of separate tasks (translating words individually, aligning words, reordering, etc.), but recent work has shown that translation can be done in a much simpler way using Recurrent Neural Networks (RNNs) [3, 2, 30] and still reach state-of-the-art performance. An \u201cencoder\u201d RNN reads the source sentence and transforms it into a rich fixed-length vector representation, which in turn is used as the initial hidden state of a \u201cdecoder\u201d RNN that generates the target sentence.<\/p>\n\n\n\n<p>Here, we propose to follow this elegant recipe, replacing the encoder RNN with a deep convolutional neural network (CNN). Over the last few years it has been convincingly shown that CNNs can produce a rich representation of the input image by embedding it into a fixed-length vector, such that this representation can be used for a variety of vision tasks [28]. Hence, it is natural to use a CNN as an image \u201cencoder\u201d, by first pre-training it for an image classification task and then using the last hidden layer as an input to the RNN decoder that generates sentences (see Figure 1). We call this model the Neural Image Caption, or NIC.<\/p>\n\n\n\n<p>Our contributions are as follows. First, we present an end-to-end system for the problem. It is a neural network that is fully trainable using stochastic gradient descent. Second, our model combines state-of-the-art sub-networks for vision and language models. These can be pre-trained on larger corpora and can thus take advantage of additional data. Finally, it yields significantly better performance than state-of-the-art approaches. For instance, on the Pascal dataset, NIC yields a BLEU-1 score of 59, to be compared with the current state of the art of 25, while human performance reaches 69. On Flickr30k, we improve from 56 to 66, and on SBU, from 19 to 28.<\/p>\n\n\n\n<p>Related Work:<\/p>\n\n\n\n<p>The problem of generating natural language descriptions from visual data has long been studied in computer vision, but mainly for video [7, 32]. This has led to complex systems composed of visual primitive recognizers combined with a structured formal language (for example, And-Or graphs or logic systems), which are further converted to natural language via rule-based systems. Such systems are heavily hand-designed, relatively brittle, and have been demonstrated only on limited domains, for example traffic scenes or sports.<\/p>\n\n\n\n<p>
The problem of describing still images with natural text has gained interest more recently. Recent advances in the recognition of objects, their attributes, and their locations allow us to drive natural language generation systems, though these are limited in their expressivity. Farhadi et al. [6] use detections to infer a triplet of scene elements, which is converted to text using templates. Similarly, Li et al. [19] start from detections and piece together a final description using phrases containing the detected objects and relationships. A more complex graph of detections, beyond triplets, is used by Kulkarni et al. [16], but with template-based text generation. More powerful language models based on language parsing have been used as well [23, 1, 17, 18, 5]. The above approaches have been able to describe images \u201cin the wild\u201d, but they are heavily hand-designed and rigid when it comes to text generation.<\/p>\n\n\n\n<p>
A large body of work has addressed the problem of ranking descriptions for a given image [11, 8, 24]. Such approaches are based on the idea of co-embedding images and text in the same vector space. For an image query, descriptions that lie close to the image in the embedding space are retrieved. Most closely related, neural networks have been used to co-embed images and sentences [29], or even image crops and sentence fragments [13], but these do not attempt to generate novel descriptions. In general, the above approaches cannot describe previously unseen compositions of objects, even though the individual objects might have been observed in the training data. Moreover, they avoid the problem of evaluating how good a generated description is.<\/p>\n\n\n\n<p>
In this work we combine deep convolutional networks for image classification [12] with recurrent networks for sequence modeling [10] to create a single network that generates descriptions of images. The RNN is trained in the context of this single \u201cend-to-end\u201d network. The model is inspired by recent successes of sequence generation in machine translation [3, 2, 30], with the difference that instead of starting with a sentence, we provide an image processed by a convolutional network. The closest works are by Kiros et al. [15], who use a neural network, but a feedforward one, to predict the next word given the image and the previous words. A recent work by Mao et al. [21] uses a recurrent neural network for the same prediction task. This is very similar to the present proposal, but there are a number of important differences: we use a more powerful RNN model, and we provide the visual input to the RNN model directly, which makes it possible for the RNN to keep track of the objects that have already been explained by the text. As a result of these seemingly insignificant differences, our system achieves substantially better results on the established benchmarks. Lastly, Kiros et al. [14] propose to construct a joint multimodal embedding space by using a powerful computer vision model and an LSTM that encodes text. In contrast to our approach, they use two separate pathways (one for images, one for text) to define a joint embedding, and, even though they can generate text, their approach is highly tuned for ranking.<\/p>\n\n\n\n<p>Model:<\/p>\n\n\n\n<p>In this paper, we propose a neural and probabilistic framework for generating descriptions from images. Recent advances in statistical machine translation have shown that, given a powerful sequence model, it is possible to achieve state-of-the-art results by directly maximizing the probability of the correct translation given an input sentence in an \u201cend-to-end\u201d fashion, both for training and for inference. These models use a recurrent neural network that encodes the variable-length input into a fixed-dimensional vector and uses this representation to \u201cdecode\u201d it into the desired output sentence. 
Thus, it is natural to use the same approach where, given an image (instead of an input sentence in the source language), one applies the same principle of \u201ctranslating\u201d it into its description.<\/p>\n\n\n\n<p>Thus, we propose to directly maximize the probability of the correct description given the image by using the following formulation:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img src=\"https:\/\/img-blog.csdnimg.cn\/2020042207150826.png#pic_center\" alt=\"\"\/><\/figure>\n\n\n\n<p>where \u03b8 are the parameters of our model, I is an image, and S its correct transcription. Since S represents any sentence, its length is unbounded. Thus, it is common to apply the chain rule to model the joint probability over S<sub>0<\/sub>, ..., S<sub>N<\/sub>, where N is the length of this particular example, as<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img src=\"https:\/\/img-blog.csdnimg.cn\/20200422071826206.png#pic_center\" alt=\"\"\/><\/figure>\n\n\n\n<p>where we dropped the dependency on \u03b8 for convenience. At training time, (S, I) is a training example pair, and we optimize the sum of the log probabilities as described in (2) over the whole training set using stochastic gradient descent (further training details are given in Section 4).<\/p>\n\n\n\n<p>It is natural to model p(S<sub>t<\/sub> | I, S<sub>0<\/sub>, ..., S<sub>t\u22121<\/sub>) with a Recurrent Neural Network (RNN), where the variable number of words we condition upon up to t \u2212 1 is expressed by a fixed-length hidden state or memory h<sub>t<\/sub>. This memory is updated after seeing a new input x<sub>t<\/sub> by using a non-linear function f:<\/p>\n\n\n\n<p><em>h<\/em><sub>t+1<\/sub> = <em>f<\/em>(<em>h<\/em><sub>t<\/sub>; <em>x<\/em><sub>t<\/sub>) (3)<\/p>\n\n\n\n<p>To make the above RNN more concrete, two crucial design choices have to be made: what is the exact form of f, and how are the images and words fed as inputs x<sub>t<\/sub>. For f we use a Long Short-Term Memory (LSTM) network, which has shown state-of-the-art performance on sequence tasks such as translation. This model is outlined in the next section.<\/p>\n\n\n\n<p>For the representation of images, we use a Convolutional Neural Network (CNN). CNNs have been widely used and studied for image tasks, and are currently the state of the art for object recognition and detection. Our particular choice of CNN uses a novel approach to batch normalization and yields the current best performance on the ILSVRC 2014 classification competition [12]. Furthermore, CNNs have been shown to generalize to other tasks, such as scene classification, by means of transfer learning [4]. The words are represented with an embedding model.<\/p>\n\n\n\n<p>LSTM-based Sentence Generator<\/p>\n\n\n\n<figure 
class=\"wp-block-image size-large is-style-default\"><img src=\"https:\/\/img-blog.csdnimg.cn\/20200422073815485.png?x-oss-process=image\/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl8zOTY1Mzk0OA==,size_16,color_FFFFFF,t_70#pic_center\" alt=\"\"\/><\/figure>\n\n\n\n<p>The choice of f in (3) is governed by its ability to deal with vanishing and exploding gradients [10], which is the most common challenge in designing and training RNNs. To address this challenge, a particular form of recurrent network, called LSTM, was introduced [10] and applied with great success to translation [3, 30] and sequence generation [9]. The core of the LSTM model is a memory cell c that encodes, at every time step, knowledge of what inputs have been observed up to that step (see Figure 2). The behavior of the cell is controlled by \u201cgates\u201d, layers that are applied multiplicatively and can therefore either keep a value from the gated layer if the gate is 1 or zero this value if the gate is 0. In particular, three gates are used, which control whether to forget the current cell value (forget gate f), whether the cell should read its input (input gate i), and whether to output the new cell value (output gate o). The definitions of the gates, the cell update, and the output are as follows:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img 
src=\"https:\/\/img-blog.csdnimg.cn\/20200422073117622.png?x-oss-process=image\/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl8zOTY1Mzk0OA==,size_16,color_FFFFFF,t_70#pic_center\" alt=\"\"\/><\/figure>\n\n\n\n<p>where \u2299 represents the product with a gate value, and the various W matrices are trained parameters. Such multiplicative gates make it possible to train the LSTM robustly, as these gates deal well with exploding and vanishing gradients [10]. The nonlinearities are the sigmoid \u03c3(\u22c5) and the hyperbolic tangent h(\u22c5). The last equation, for m<sub>t<\/sub>, gives the value that is fed to a Softmax, which produces a probability distribution over all words.<\/p>\n\n\n\n<p>The LSTM model is trained to predict each word of the sentence after it has seen the image as well as all preceding words, as defined by p(S<sub>t<\/sub> | I, S<sub>0<\/sub>, ..., S<sub>t\u22121<\/sub>). For this purpose, it is instructive to think of the LSTM in unrolled form: a copy of the LSTM memory is created for the image and for each sentence word, such that all LSTMs share the same parameters and the output m<sub>t\u22121<\/sub> of the LSTM at time t \u2212 1 is fed to the LSTM at time t (see Figure 3). All recurrent connections are transformed into feed-forward connections in the unrolled version. In more detail, if we denote by I the input image and by S = (S<sub>0<\/sub>, ..., S<sub>N<\/sub>) a true sentence describing this image, the unrolling procedure reads:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img src=\"https:\/\/img-blog.csdnimg.cn\/20200422074921731.png#pic_center\" alt=\"\"\/><\/figure>\n\n\n\n<p>Here we represent each word as a one-hot vector S<sub>t<\/sub> of dimension equal to the size of the dictionary. Note that we denote by S<sub>0<\/sub> a special start word and by S<sub>N<\/sub> a special stop word, which designate the start and end of the sentence. In particular, by emitting the stop word the LSTM signals that a complete sentence has been generated. Both the image and the words are mapped to the same space: the image by using a vision CNN, the words by using the word embedding W<sub>e<\/sub>. The image I is input only once, at t = \u22121, to inform the LSTM about the image contents. We verified empirically that feeding the image at each time step as an extra input yields inferior results, as the network can explicitly exploit noise in the image and overfits more easily.<\/p>\n\n\n\n<p>Our loss is the sum of the negative log likelihood of the correct word at each step as follows:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img src=\"https:\/\/img-blog.csdnimg.cn\/20200422075847645.png\" alt=\"\"\/><\/figure>\n\n\n\n<p>The above loss is minimized w.r.t. 
all the parameters of the LSTM, the top layer of the image embedder CNN, and the word embeddings W<sub>e<\/sub>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img src=\"https:\/\/img-blog.csdnimg.cn\/20200422075023517.png?x-oss-process=image\/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl8zOTY1Mzk0OA==,size_16,color_FFFFFF,t_70#pic_center\" alt=\"\"\/><\/figure>\n\n\n\n<p>With NIC, there are multiple approaches that can be used to generate a sentence for a given image. The first is sampling: we sample the first word according to p<sub>1<\/sub>, provide the corresponding embedding as input, sample from p<sub>2<\/sub>, and continue like this until we sample the special end-of-sentence token or reach some maximum length. The second is BeamSearch: iteratively consider the set of the k best sentences up to time t as candidates for generating sentences of size t + 1, and keep only the best k of the results. This better approximates S = arg max<sub>S\u2032<\/sub> p(S\u2032 | I). In the following experiments we used the BeamSearch approach with a beam of size 20. Using a beam size of 1 (i.e., greedy search) degraded our results by 2 BLEU points on average.<\/p>\n\n\n\n<h2>Overall Architecture:<\/h2>\n\n\n\n<p>It comprises an Encoder, a Decoder, Attention, and Beam Search.<\/p>\n\n\n\n<figure class=\"wp-block-image is-style-default\"><img 
src=\"https:\/\/github.com\/sgrvinod\/a-PyTorch-Tutorial-to-Image-Captioning\/raw\/master\/img\/model.png\" alt=\"Putting it all together\"\/><\/figure>\n\n\n\n<p>Generation Results<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img src=\"https:\/\/img-blog.csdnimg.cn\/20200422083032291.png?x-oss-process=image\/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl8zOTY1Mzk0OA==,size_16,color_FFFFFF,t_70#pic_center\" alt=\"\"\/><\/figure>\n\n\n\n<p>We report our main results on all the relevant datasets in Tables 1 and 2. Since PASCAL does not have a training set, we used the system trained on MSCOCO (arguably the largest and highest-quality dataset for this task). The state-of-the-art results for PASCAL and SBU did not use image features based on deep learning, so arguably a large part of the improvement on those scores comes from that change alone. The Flickr datasets have been used only recently [11, 21, 14], and mostly evaluated in a retrieval framework. A notable exception is [21], which performs both retrieval and generation and yields the best performance on the Flickr datasets so far.<\/p>\n\n\n\n<p>The human scores in Table 2 were computed by comparing one of the human captions against the other four. We do this for each of the five raters and average their BLEU scores. Since this gives a slight advantage to our system, given that its BLEU score is computed against five reference sentences rather than four, we add back to the human scores the average difference between having five references and having four.<\/p>\n\n\n\n<p>Given that the field has seen significant advances in the last few years, we think it is more meaningful to report BLEU-4, which is the standard in machine translation going forward. Additionally, in Table 1 we report metrics that have been shown to correlate better with human evaluations. Despite recent efforts on better evaluation metrics [31], our model fares well against human raters on these metrics. However, when our captions are evaluated by human raters (see Section 4.3.6), our model fares much more poorly, suggesting that more work is needed towards better metrics. On the official test set, for which labels are available only through the official website, our model achieved a BLEU-4 of 27.2.<\/p>\n\n\n\n<p>Conclusion<\/p>\n\n\n\n<p>We have presented NIC, an end-to-end neural network system that can automatically view an image and generate a reasonable description. NIC is based on a convolutional neural network that encodes an image into a compact representation, followed by a recurrent neural network that generates a corresponding sentence. The model is trained to maximize the likelihood of the sentence given the image. Experiments on several datasets show the robustness of NIC in terms of qualitative results (the generated sentences are very reasonable) and quantitative evaluations, using either ranking metrics or BLEU, a metric used in machine translation to evaluate the quality of generated sentences. It is clear from these experiments that, as the size of the available datasets for image description increases, so will the performance of approaches like NIC. Furthermore, it will be interesting to see how unsupervised data, both from images alone and from text alone, can be used to improve image description approaches.<\/p>\n\n\n\n<p class=\"has-light-pink-background-color has-background\">GitHub implementation: <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/sgrvinod\/Deep-Tutorials-for-PyTorch\" target=\"_blank\">https:\/\/github.com\/sgrvinod\/Deep-Tutorials-for-PyTorch<\/a><\/p>\n\n\n\n<p>References<br>[1] A. Aker and R. Gaizauskas. Generating image descriptions using dependency relational patterns. In ACL, 2010.<br>[2] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2014.<br>[3] K. Cho, B. 
van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014.<br>[4] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In ICML, 2014.<br>[5] D. Elliott and F. Keller. Image description using visual dependency representations. In EMNLP, 2013.<br>[6] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.<br>[7] R. Gerber and H.-H. Nagel. Knowledge representation for the generation of quantified natural language descriptions of vehicle traffic in image sequences. In ICIP. IEEE, 1996.<br>[8] Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik. Improving image-sentence embeddings using large weakly annotated photo collections. In ECCV, 2014.<br>[9] A. Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850, 2013.<br>[10] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8), 1997.<br>[11] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 47, 2013.<br>[12] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.<br>[13] A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, 2014.<br>[14] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. arXiv:1411.2539, 2014.<br>[15] R. Kiros and R. Z. R. Salakhutdinov. Multimodal neural language models. In NIPS Deep Learning Workshop, 2013.<br>[16] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. 
Baby talk: Understanding and generating simple image descriptions. In CVPR, 2011.<br>[17] P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In ACL, 2012.<br>[18] P. Kuznetsova, V. Ordonez, T. Berg, and Y. Choi. Treetalk: Composition and compression of trees for image descriptions. ACL, 2(10), 2014.<br>[19] S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In Conference on Computational Natural Language Learning, 2011.<br>[20] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll\u00e1r, and C. L. Zitnick. Microsoft coco: Common objects in context. arXiv:1405.0312, 2014.<br>[21] J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille. Explain images with multimodal recurrent neural networks. arXiv:1410.1090, 2014.<br>[22] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In ICLR, 2013.<br>[23] M. Mitchell, X. Han, J. Dodge, A. Mensch, A. Goyal, A. C. Berg, K. Yamaguchi, T. L. Berg, K. Stratos, and H. D. III. Midge: Generating image descriptions from computer vision detections. In EACL, 2012.<br>[24] V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011.<br>[25] K. Papineni, S. Roukos, T. Ward, and W. J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.<br>[26] C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. Collecting image annotations using amazon\u2019s mechanical turk. In NAACL HLT Workshop on Creating Speech and Language Data with Amazon\u2019s Mechanical Turk, pages 139\u2013147, 2010.<br>[27] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, 2014.<br>[28] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. 
Fergus, and Y. LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv:1312.6229, 2013.<br>[29] R. Socher, A. Karpathy, Q. V. Le, C. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. In ACL, 2014.<br>[30] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.<br>[31] R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. arXiv:1411.5726, 2015.<br>[32] B. Z. Yao, X. Yang, L. Lin, M. W. Lee, and S.-C. Zhu. I2t: Image parsing to text description. Proceedings of the IEEE, 98(8), 2010.<br>[33] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In ACL, 2014.<br>[34] W. Zaremba, I. Sutskever, and O. Vinyals. Recurrent neural network regularization. arXiv:1409.2329, 2014.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Abstract: Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present &hellip; <a href=\"http:\/\/139.9.1.231\/index.php\/2021\/12\/26\/learning-cnn-lstm-architectures-for-image\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Learning CNN-LSTM Architectures for 
Image\u8bba\u6587\u9605\u8bfb<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4,9],"tags":[],"_links":{"self":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/259"}],"collection":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/comments?post=259"}],"version-history":[{"count":63,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/259\/revisions"}],"predecessor-version":[{"id":912,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/posts\/259\/revisions\/912"}],"wp:attachment":[{"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/media?parent=259"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/categories?post=259"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/139.9.1.231\/index.php\/wp-json\/wp\/v2\/tags?post=259"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}