National Treasure Hackathon (國家寶藏松) - Backend Requirements
Edit history
| Time | Author | Revision |
|---|---|---|
| 2017-08-09 19:44 – 19:44 | r1154 – r1157 | |
(16 lines unchanged)
*createdAt
*updatedAt
+ *completedAt
*status: enum['dispatched', 'complete', 'incomplete', 'error']
(49 lines unchanged)
|
||
| 2017-08-09 18:34 – 18:41 | r976 – r1153 | |
(8 lines unchanged)
*If everything above checks out, check our own status flag for the record: `unstarted`, `started but incomplete`, or `complete`
*Dispatch unstarted and incomplete ones to the volunteer client
+
+ NoSQL Table schema for TNT-Dispatch
+ *uid: unique id for the dispatch
+ *catalogId: unique id of the record in the TNT-Catalog
+ *userId: unique id of the requesting user/volunteer
+ *naId: associated naId of the record
+ *createdAt
+ *updatedAt
+ *status: enum['dispatched', 'complete', 'incomplete', 'error']
+
+ Endpoints of the Dispatch Server
+ *Request a dispatch: [Endpoint]/request
+ *Update a dispatch status: [Endpoint]/update
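A minimal in-memory sketch of the TNT-Dispatch record and the two operations behind `/request` and `/update`. The field names and status values come from the schema above; the `Dispatch` class, the `TABLE` dict standing in for the NoSQL table, and both function names are hypothetical, and timestamps are assumed to be Unix seconds.

```python
import time
import uuid
from dataclasses import dataclass, field

# Status values copied from the schema above.
STATUSES = ("dispatched", "complete", "incomplete", "error")


@dataclass
class Dispatch:
    catalogId: str  # id of the record in the TNT-Catalog
    userId: str     # id of the requesting user/volunteer
    naId: str       # associated naId of the record
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)
    createdAt: float = field(default_factory=time.time)
    updatedAt: float = field(default_factory=time.time)
    status: str = "dispatched"


# Hypothetical stand-in for the NoSQL table, keyed by uid.
TABLE: dict = {}


def request_dispatch(catalog_id: str, user_id: str, na_id: str) -> Dispatch:
    """What a handler for [Endpoint]/request might do: create and store a record."""
    record = Dispatch(catalogId=catalog_id, userId=user_id, naId=na_id)
    TABLE[record.uid] = record
    return record


def update_dispatch(uid: str, status: str) -> Dispatch:
    """What a handler for [Endpoint]/update might do: validate and set the status."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    record = TABLE[uid]  # raises KeyError for an unknown uid
    record.status = status
    record.updatedAt = time.time()
    return record
```

Rejecting unknown status strings at the update step keeps the enum meaningful even when the underlying store does not enforce it.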
(44 lines unchanged)
|
||
| 2017-08-06 19:27 | r975 | |
(56 lines unchanged)
|
||
| 2017-01-10 02:44 – 03:54 | r502 – r974 | |
National Treasure Hackathon (國家寶藏松) - Backend Requirements
+
+ Volunteer dispatch server
+
+ Given a naId (every document has a unique naId), query its details at https://catalog.archives.gov/api/v1/?naIds={naId}
+ *Check if `results.result[0].objects` exists; if yes, the document has already been digitized and the image files are under the `results.result[0].objects.object` array
+ *If it hasn't been digitized, check if `results.result[0].description.series.fileUnit` exists and has a number > 1. If yes, there are multiple files under this naId and you need to query `https://catalog.archives.gov/api/v1/?description.fileUnit.parentSeries.naId={naId}` to get the sub files and their naIds.
+ *Next, check `accessRestriction.status.termName` under `description` or `description.fileUnit`. If it is `restricted`, we do not want to dispatch this naId to a volunteer.
+ *If everything above checks out, check our own status flag for the record: `unstarted`, `started but incomplete`, or `complete`
+ *Dispatch unstarted and incomplete ones to the volunteer client
+
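The checks above can be sketched as a pure function over one entry of `results.result`, so it can be tried without calling the NARA API. The URLs and JSON paths are taken from the notes; the function names and the returned labels (`digitized`, `expand`, `restricted`, `skip`, `dispatch`) are hypothetical. Two points are assumptions: "has a number > 1" is read as a value that converts to an integer greater than 1, and the restriction test matches any term that starts with "restricted", ignoring case.

```python
from typing import Any

NARA_API = "https://catalog.archives.gov/api/v1/"


def detail_url(na_id: str) -> str:
    """URL for the detail query on a single naId."""
    return f"{NARA_API}?naIds={na_id}"


def children_url(na_id: str) -> str:
    """URL listing the sub files whose parent series is this naId."""
    return f"{NARA_API}?description.fileUnit.parentSeries.naId={na_id}"


def dig(data: Any, *keys: str) -> Any:
    """Walk nested dicts, returning None as soon as a key is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def as_count(value: Any) -> int:
    """Best-effort integer conversion; anything unparseable counts as 0."""
    try:
        return int(value)
    except (TypeError, ValueError, OverflowError):
        return 0


def classify(result: dict, flag: str = "unstarted") -> str:
    """Apply the checks in the order listed above and return a label."""
    if dig(result, "objects") is not None:
        return "digitized"   # images are under objects.object
    if as_count(dig(result, "description", "series", "fileUnit")) > 1:
        return "expand"      # caller should fetch children_url(naId)
    description = dig(result, "description") or {}
    for prefix in ((), ("fileUnit",)):
        term = dig(description, *prefix, "accessRestriction", "status", "termName")
        # Assumption: case-insensitive prefix match on "restricted".
        if isinstance(term, str) and term.strip().lower().startswith("restricted"):
            return "restricted"
    if flag == "complete":
        return "skip"
    return "dispatch"        # flag is unstarted or started but incomplete
```

A caller would fetch `detail_url(naId)`, pass `results.result[0]` to `classify`, and follow up with `children_url(naId)` when the label is `expand`.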
Document Index Server
(43 lines unchanged)
|
||
| 2016-12-11 02:04 – 02:36 | r471 – r501 | |
(35 lines unchanged)
.2
16 update
+
OCR server is up and running at
https://nationa-treasure-vision.herokuapp.com/vision
+
+ Source Code
+ https://github.com/national-treasures-tw/vision
Test by POSTing { types: 'text', imageUrl: 'YOUR_IMAGE_URL' }
+ Please do not abuse this, as we only have a quota of 1000 free requests.
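A minimal sketch of the test call using only the Python standard library. The URL and the `types` / `imageUrl` fields come from the note above; sending the payload as a JSON body and getting JSON back are assumptions, and both function names are hypothetical. Building the request is kept separate from sending it so the payload can be inspected without spending quota.

```python
import json
import urllib.request

# URL copied as written in the note above.
VISION_URL = "https://nationa-treasure-vision.herokuapp.com/vision"


def build_ocr_request(image_url: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"types": "text", "imageUrl": image_url}).encode("utf-8")
    return urllib.request.Request(
        VISION_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


def run_ocr(image_url: str, timeout: float = 30.0):
    """Send the request. Each call counts against the free request quota."""
    request = build_ocr_request(image_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the server replies with a JSON document.
        return json.loads(response.read().decode("utf-8"))
```

Only `run_ocr` touches the network, so it is worth calling sparingly given the quota.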
|
||
| 2016-12-10 19:02 – 19:42 | r366 – r470 | |
(31 lines unchanged)
Does the website need engineers?
+
+ 12.1
+ .2
+ 16 update
+ OCR server is up and running at
+ https://nationa-treasure-vision.herokuapp.com/vision
+
+ Test by POSTing { types: 'text', imageUrl: 'YOUR_IMAGE_URL' }
|
||
| 2016-11-15 21:50 – 21:50 | r364 – r365 | |
(2 lines unchanged)
Document Index Server
+ NARA API GitHub (query example)
National Archives API recorder (first 100 rows only, out of 12600)
https://github.com/hsin421/tw-national-treasure
(26 lines unchanged)
|
||
| 2016-11-15 20:23 – 20:24 | r303 – r363 | |
(5 lines unchanged)
https://github.com/hsin421/tw-national-treasure
+ Suggested by Simon Liu
+ Digital archive management system (open source): Fedora commons
OCR SERVER
(21 lines unchanged)
|
||
| 2016-11-08 16:25 – 16:25 | r288 – r302 | |
National Treasure Hackathon (國家寶藏松) - Backend Requirements
+
+ Document Index Server
+
+ National Archives API recorder (first 100 rows only, out of 12600)
+ https://github.com/hsin421/tw-national-treasure
+
OCR SERVER
(21 lines unchanged)
|
||
| 2016-11-07 15:31 – 15:32 | r285 – r287 | |
(24 lines unchanged)
|
||
| 2016-11-06 20:56 – 20:56 | r282 – r284 | |
(21 lines unchanged)
*Or we can skip this problem for now, since word order does not affect keyword search results.
- j;
+ Does the website need engineers?
|
||
| 2016-11-06 20:56 | r281 | |
(24 lines unchanged)
|
||
| 2016-11-06 20:56 | r280 | |
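(20 lines unchanged)
*Write an app that rotates the photographed document by a suitable angle before OCR
*Or we can skip this problem for now, since word order does not affect keyword search results.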
+
+ j;
|
||
| 2016-11-06 20:17 – 20:23 | r277 – r279 | |
(22 lines unchanged)
|
||
| 2016-11-06 14:02 – 15:22 | r44 – r276 | |
(9 lines unchanged)
Apply for an API key:
https://developers.google.com/api-client-library/python/guide/aaa_apikeys
+
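+ Preliminary test:
+ *Relatively clean document: OCR result
+ *Most of the content is recognized correctly.
+ *Document with handwritten content: OCR result
+ *Handwriting that is too sloppy or blurry is not recognized.
+ *Document photographed with shadows: OCR result
+ *Text in the shadowed areas is recognized, but because the photo was not taken level, some words come out in a different order than in the document.
+ *Possible solutions:
+ *Write an app that rotates the photographed document by a suitable angle before OCR
+ *Or we can skip this problem for now, since word order does not affect keyword search results.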
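One way to find the "suitable angle" is to estimate it from the word boxes of a first OCR pass. This is a sketch of that variant, not of image rotation itself. It assumes each box is a list of (x, y) vertices in the order top-left, top-right, bottom-right, bottom-left, with y growing downward; the function names and the `line_height` parameter are hypothetical.

```python
import math
from statistics import median


def estimate_skew_degrees(boxes):
    """Median angle of the top edge of each word box, in degrees."""
    angles = []
    for box in boxes:
        (x0, y0), (x1, y1) = box[0], box[1]
        if (x0, y0) != (x1, y1):  # skip degenerate boxes
            angles.append(math.degrees(math.atan2(y1 - y0, x1 - x0)))
    return median(angles) if angles else 0.0


def reading_order(boxes, skew_degrees, line_height):
    """Sort boxes top-to-bottom, then left-to-right, after undoing the skew."""
    theta = math.radians(-skew_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def key(box):
        x, y = box[0]
        rx = x * cos_t - y * sin_t  # position along the text line
        ry = x * sin_t + y * cos_t  # position across text lines
        # Bucket by line so small vertical jitter does not reorder words.
        return (round(ry / line_height), rx)

    return sorted(boxes, key=key)
```

The estimated angle could drive an image rotation before a second OCR pass, or `reading_order` could be used to restore word order from the first pass alone. The median keeps a few badly detected words from skewing the estimate.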
|
||
| 2016-11-05 20:49 – 20:54 | r25 – r43 | |
(4 lines unchanged)
OCR with Python and Google Cloud Vision API reference:
https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d
+ repo:
+ https://github.com/tl578/g0v-nyc
+
+ Apply for an API key:
+ https://developers.google.com/api-client-library/python/guide/aaa_apikeys
|
||
| 2016-11-05 20:02 – 20:46 | r8 – r24 | |
National Treasure Hackathon (國家寶藏松) - Backend Requirements
- OCR with Python and Google Cloud Vision API:
+
+ OCR SERVER
+
+ OCR with Python and Google Cloud Vision API reference:
https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d
|
||
| 2016-11-05 20:02 | r7 | |
National Treasure Hackathon (國家寶藏松) - Backend Requirements
+ OCR with Python and Google Cloud Vision API:
+ https://gist.github.com/dannguyen/a0b69c84ebc00c54c94d
|
||
| 2016-11-05 19:57 | r6 | |
National Treasure Hackathon (國家寶藏松) - Backend Requirements
-
- This pad text is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents!
|
||
| 2016-10-30 10:09 – 10:10 | r1 – r5 | |
- Untitled
+ National Treasure Hackathon (國家寶藏松) - Backend Requirements
This pad text is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents!
|
||
| 2016-10-30 10:09 | r0 | |
+ Untitled
+ This pad text is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents!
|
||