National Treasure Hackathon (國家寶藏松) - Backend Requirements

Edit history

Time Author Revision
2017-08-09 19:44 – 19:44 Hsin Hsiao r1154 – r1157
(16 lines unchanged)
*createdAt
*updatedAt
+ *completedAt
*status: enum['dispatched', 'complete', 'incomplete', 'error']
(49 lines unchanged)
2017-08-09 18:34 – 18:41 Hsin Hsiao r976 – r1153
(8 lines unchanged)
*If everything above checks out, check our own flag for the record: `unstarted`, `started but incomplete`, or `complete`.
*Dispatch the unstarted and incomplete ones to the volunteer client.
+
+ NoSQL Table schema for TNT-Dispatch
+ *uid: unique id for the dispatch
+ *catalogId: unique id of the record in the TNT-Catalog
+ *userId: unique id of the requesting user/volunteer
+ *naId: associated naId of the record
+ *createdAt
+ *updatedAt
+ *status: enum['dispatched', 'complete', 'incomplete', 'error']
+
+ Endpoints of the Dispatch Server
+ *Request a dispatch: [Endpoint]/request
+ *Update a dispatch status: [Endpoint]/update
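The table schema and the status update above can be sketched in Python. This is a minimal sketch under stated assumptions, not the Dispatch Server's actual code: the `Dispatch` class, the `update_status` helper, ISO-8601 UTC timestamps, and a UUID-based `uid` are all hypothetical choices; only the field names and the status enum come from the schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

# Status values taken from the schema's enum.
STATUSES = ("dispatched", "complete", "incomplete", "error")


def _now() -> str:
    """Current UTC time as an ISO-8601 string (timestamp format is an assumption)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Dispatch:
    """One TNT-Dispatch record; field names follow the schema."""
    catalogId: str  # unique id of the record in the TNT-Catalog
    userId: str     # unique id of the requesting user/volunteer
    naId: str       # associated naId of the record
    uid: str = field(default_factory=lambda: uuid4().hex)  # assumed id scheme
    createdAt: str = field(default_factory=_now)
    updatedAt: str = field(default_factory=_now)
    status: str = "dispatched"


def update_status(dispatch: Dispatch, status: str) -> Dispatch:
    """Hypothetical handler body for [Endpoint]/update: validate, then stamp."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    dispatch.status = status
    dispatch.updatedAt = _now()
    return dispatch
```

A new record starts as `dispatched`; calling `update_status(record, "complete")` changes the status and refreshes `updatedAt`, while any value outside the enum raises `ValueError`.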
(44 lines unchanged)
2017-08-06 19:27 Hsin Hsiao r975
(56 lines unchanged)
2017-01-10 02:44 – 03:54 Hsin Hsiao r502 – r974
National Treasure Hackathon - Backend Requirements
+
+ Volunteer dispatch server
+
+ Given a naId (every document has a unique naId), query its details at https://catalog.archives.gov/api/v1/?naIds={naId}
+ *Check whether `results.result[0].objects` exists. If it does, the document has already been digitized and the image files are in the `results.result[0].objects.object` array.
+ *If it has not been digitized, check whether `results.result[0].description.series.fileUnit` exists and holds a number greater than 1. If so, there are multiple files under this naId, and you need to query `https://catalog.archives.gov/api/v1/?description.fileUnit.parentSeries.naId={naId}` to get the sub-files and their naIds.
+ *Next, check `accessRestriction.status.termName` under `description` or `description.fileUnit`. If it is `restricted`, we do not want to dispatch this naId to volunteers.
+ *If everything above checks out, check our own flag for the record: `unstarted`, `started but incomplete`, or `complete`.
+ *Dispatch the unstarted and incomplete ones to the volunteer client.
+
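The checks above can be sketched as a pure function over the parsed API response. This is only a sketch under stated assumptions, not the dispatch server's actual logic: the name `classify_record`, the returned action strings, the case-insensitive exact match on `restricted`, and the guess that `fileUnit` holds a number or numeric string are all hypothetical. `result` stands for `results.result[0]`.

```python
from typing import Any


def _dig(node: Any, *keys: str) -> Any:
    """Follow nested dict keys; return None as soon as one is missing."""
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def _as_int(value: Any) -> int:
    """Best-effort conversion; anything non-numeric counts as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def classify_record(result: dict, flag: str = "unstarted") -> str:
    """Decide what to do with one catalog record (action names are hypothetical)."""
    # 1. Already digitized: images live under objects.object.
    if _dig(result, "objects") is not None:
        return "already_digitized"
    # 2. Several files under this naId: the caller must query the sub-files.
    if _as_int(_dig(result, "description", "series", "fileUnit")) > 1:
        return "query_sub_files"
    # 3. Access restriction, under description or description.fileUnit.
    #    Exact match on purpose: a substring test would also hit "unrestricted".
    for parent in (("description",), ("description", "fileUnit")):
        term = _dig(result, *parent, "accessRestriction", "status", "termName")
        if isinstance(term, str) and term.lower() == "restricted":
            return "restricted"
    # 4. Our own flag: only unstarted or incomplete records go to volunteers.
    if flag in ("unstarted", "started but incomplete"):
        return "dispatch"
    return "complete"
```

Keeping the function free of network calls means it can be exercised with hand-written dictionaries before it is wired to the real catalog API.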
Document Index Server
(43 lines unchanged)
2016-12-11 02:04 – 02:36 Hsin Hsiao r471 – r501
(35 lines unchanged)
.2
16 update
+
OCR server is up and running at
https://nationa-treasure-vision.herokuapp.com/vision
+
+ Source Code
+ https://github.com/national-treasures-tw/vision
test by posting with { types: 'text', imageUrl: 'YOUR_IMAGE_URL' }
+ Please do not abuse this; our free quota is only 1000 requests.
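A test request can be prepared in Python as follows. This is a sketch under assumptions: the notes above only give the URL and the two fields, so sending the payload as a JSON body with a `Content-Type: application/json` header is a guess, and `build_ocr_request` is a hypothetical helper. The request is built but not sent, so running the snippet does not consume quota.

```python
import json
from urllib.request import Request

# URL copied verbatim from the notes above.
VISION_URL = "https://nationa-treasure-vision.herokuapp.com/vision"


def build_ocr_request(image_url: str) -> Request:
    """Build (but do not send) a POST request for the OCR server."""
    payload = {"types": "text", "imageUrl": image_url}
    return Request(
        VISION_URL,
        data=json.dumps(payload).encode("utf-8"),  # JSON encoding is an assumption
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ocr_request("https://example.com/scan.jpg")  # placeholder image URL
```

Passing `request` to `urllib.request.urlopen` would actually send it; do that sparingly given the quota.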
2016-12-10 19:02 – 19:42 Hsin Hsiao r366 – r470
(31 lines unchanged)
Does the website need engineers?
+
+ 12.1
+ .2
+ 16 update
+ OCR server is up and running at
+ https://nationa-treasure-vision.herokuapp.com/vision
+
+ test by posting with { types: 'text', imageUrl: 'YOUR_IMAGE_URL' }
2016-11-15 21:50 – 21:50 Hsin Hsiao r364 – r365
(2 lines unchanged)
Document Index Server
+ NARA API GitHub (query example)
National Archives API recorder (first 100 rows only, out of 12600)
https://github.com/hsin421/tw-national-treasure
(26 lines unchanged)
2016-11-15 20:23 – 20:24 Hsin Hsiao r303 – r363
(5 lines unchanged)
https://github.com/hsin421/tw-national-treasure
+ Suggested by Simon Liu
+ Digital archive management system (open source): Fedora Commons
OCR SERVER
(21 lines unchanged)
2016-11-08 16:25 – 16:25 Hsin Hsiao r288 – r302
National Treasure Hackathon - Backend Requirements
+
+ Document Index Server
+
+ National Archives API recorder (first 100 rows only, out of 12600)
+ https://github.com/hsin421/tw-national-treasure
+
OCR SERVER
(21 lines unchanged)
2016-11-07 15:31 – 15:32 Hsin Hsiao r285 – r287
(24 lines unchanged)
2016-11-06 20:56 – 20:56 雨蒼 林 r282 – r284
(21 lines unchanged)
*Alternatively, we can skip this problem for now, because word order does not affect keyword search results.
- j;
+ Does the website need engineers?
2016-11-06 20:56 (unknown) r281
(24 lines unchanged)
2016-11-06 20:56 雨蒼 林 r280
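(20 lines unchanged)
*Write an app that rotates the photographed document to a suitable angle before OCR.
*Alternatively, we can skip this problem for now, because word order does not affect keyword search results.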
+
+ j;
2016-11-06 20:17 – 20:23 Ti-Yen Lan r277 – r279
(22 lines unchanged)
2016-11-06 14:02 – 15:22 Ti-Yen Lan r44 – r276
(9 lines unchanged)
Apply for an API key:
https://developers.google.com/api-client-library/python/guide/aaa_apikeys
+
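+ Preliminary tests:
+ *Relatively clean document: OCR result
+ *Most of the content is captured correctly.
+ *Document with handwritten content: OCR result
+ *Handwriting that is too sloppy or blurry cannot be captured.
+ *Document photographed with shadows: OCR result
+ *Text in the shadowed areas is captured, but because the photo was not taken level, some words come out in a different order than in the document.
+ *Possible solutions:
+ *Write an app that rotates the photographed document to a suitable angle before OCR.
+ *Alternatively, we can skip this problem for now, because word order does not affect keyword search results.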
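The second option can be illustrated with a small Python sketch. It assumes keyword search means matching individual words against the set of OCR tokens (it says nothing about phrase search, where order would matter); `tokenize` and `contains_keywords` are hypothetical helpers, and the sample strings are made up for illustration.

```python
import re
from typing import Iterable, Set


def tokenize(ocr_text: str) -> Set[str]:
    """Lowercase the OCR output and keep the set of distinct words."""
    return set(re.findall(r"\w+", ocr_text.lower()))


def contains_keywords(ocr_text: str, keywords: Iterable[str]) -> bool:
    """True if every single-word keyword appears somewhere in the text."""
    tokens = tokenize(ocr_text)
    return all(keyword.lower() in tokens for keyword in keywords)


# Made-up sample: the same words in document order and in a scrambled order.
in_order = "Department of State telegram from Taipei"
scrambled = "telegram Taipei State from of Department"

print(contains_keywords(in_order, ["taipei", "telegram"]))
print(contains_keywords(scrambled, ["taipei", "telegram"]))
```

Both calls give the same answer because the words are compared as a set, so the order in which the OCR returns them plays no role.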
2016-11-05 20:49 – 20:54 Ti-Yen Lan r25 – r43
(4 lines unchanged)
OCR with Python and Google Cloud Vision API reference:
https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d
+ repo:
+ https://github.com/tl578/g0v-nyc
+
+ Apply for an API key:
+ https://developers.google.com/api-client-library/python/guide/aaa_apikeys
2016-11-05 20:02 – 20:46 Hsin Hsiao r8 – r24
National Treasure Hackathon - Backend Requirements
- OCR with Python and Google Cloud Vision API:
+
+ OCR SERVER
+
+ OCR with Python and Google Cloud Vision API reference:
https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d
2016-11-05 20:02 Ti-Yen Lan r7
National Treasure Hackathon - Backend Requirements
+ OCR with Python and Google Cloud Vision API:
+ https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d
2016-11-05 19:57 Hsin Hsiao r6
National Treasure Hackathon - Backend Requirements
-
- This pad text is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents!
2016-10-30 10:09 – 10:10 Hsin Hsiao r1 – r5
- Untitled
+ National Treasure Hackathon - Backend Requirements
This pad text is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents!
2016-10-30 10:09 (unknown) r0
+ Untitled
+ This pad text is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents!