- https://docs.python.org/3.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit
- https://en.wikipedia.org/wiki/Typographic_ligature
- https://docs.python.org/3/library/codecs.html
- https://en.wikipedia.org/wiki/Unicode_equivalence
- https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
- https://github.com/Coleridge-Initiative/adrf-onto/wiki
- https://www.w3.org/TR/vocab-dcat/
- https://pav-ontology.github.io/pav/
- https://sparontologies.github.io/cito/current/cito.html
- https://sparontologies.github.io/fabio/current/fabio.html
- http://xmlns.com/foaf/spec/
- https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
- http://nlpprogress.com/english/entity_linking.html
- https://github.com/NYU-CI/RCDatasets
- https://www.crummy.com/software/BeautifulSoup/
- https://www.metachris.com/pdfx/
- https://github.com/DerwenAI/spaCy_tuTorial/blob/master/Extract_Text_from_PDF.ipynb
- https://colab.research.google.com/notebooks/welcome.ipynb#recent=true
- https://pypi.org/project/PyPDF2/
- https://github.com/euske/pdfminer
- https://tika.apache.org/
- https://allennlp.org/elmo
- https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html
- https://medium.com/syncedreview/baidus-ernie-tops-google-s-bert-in-chinese-nlp-tasks-d6a42b49223d
- https://arxiv.org/pdf/1906.08237.pdf
- https://openai.com/blog/gpt-2-6-month-follow-up/
- https://ai.facebook.com/blog/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems/
- https://arxiv.org/abs/1910.01108
- https://arxiv.org/abs/1901.11504
- https://www.microsoft.com/en-us/research/blog/robust-language-representation-learning-via-multi-task-knowledge-distillation/
- https://www.microsoft.com/en-us/research/publication/improving-multi-task-deep-neural-networks-via-knowledge-distillation-for-natural-language-understanding/
- https://github.com/namisan/mt-dnn
- https://blog.jupyter.org/jupytercon-2018-nyc-august-21-25-5571d7454d5b
- https://drive.google.com/file/d/0By83v5TWkGjvQkpBcXJKT1I1TTA/view
- http://jupyter.org/
- https://arrow.apache.org/
- http://ericjonas.com/project/numpywren/
- https://rise.cs.berkeley.edu/projects/ray/
- https://twitter.com/parente/status/1099725144048762885
Paco Nathan
2019-11-19 11:02:09
An introduction to natural language work based on the spaCy library in Python.
Disclaimer: All trademarks, service marks, trade names, trade dress, product names, and logos appearing above are the property of their respective owners.