Differences

This shows you the differences between two versions of the page.

--- ai:nlp:start [2021/09/15 12:05] – add literature overview sarah001
+++ ai:nlp:start [2023/06/30 11:52] (current) – [Notes on NLP] rolf.becker
@@ Line 1: / Line 1: @@
 ====== Notes on NLP ======
+The next cool thing: NVIDIA RIVA + RASA ChatBot (2022):
+  * https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/sample-apps/virtual-assistant-rasa/README.html
+Two architectural options because of overlapping services:
+  * Option 1: Riva ASR + Riva TTS + Riva NLP + Rasa dialog manager ("more RIVA")
+  * Option 2: Riva ASR + Riva TTS + Rasa NLU + Rasa dialog manager ("more RASA")
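A minimal sketch of how the two options compare, assuming a conventional voice-assistant flow of ASR, then language understanding, then dialog management, then TTS. The `Pipeline` class, its field names, and the stage ordering are hypothetical illustrations and are not taken from the Riva or Rasa APIs; only the component names come from the list above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical description of one voice-assistant stack (not a Riva/Rasa API)."""
    asr: str      # speech -> text
    nlu: str      # text -> intent and entities
    dialog: str   # intent -> next action / response text
    tts: str      # response text -> speech

    def stages(self):
        # Assumed processing order for a single user utterance.
        return [self.asr, self.nlu, self.dialog, self.tts]


# Option 1 ("more RIVA"): language understanding handled by Riva NLP.
OPTION_1 = Pipeline(asr="Riva ASR", nlu="Riva NLP",
                    dialog="Rasa dialog manager", tts="Riva TTS")

# Option 2 ("more RASA"): language understanding handled by Rasa NLU.
OPTION_2 = Pipeline(asr="Riva ASR", nlu="Rasa NLU",
                    dialog="Rasa dialog manager", tts="Riva TTS")


def differing_stages(a, b):
    """Return {stage_name: (value_in_a, value_in_b)} for stages that differ."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


if __name__ == "__main__":
    print(" -> ".join(OPTION_1.stages()))
    print(" -> ".join(OPTION_2.stages()))
    print(differing_stages(OPTION_1, OPTION_2))
```

Written this way, the overlap is easy to see: the two options share ASR, TTS, and the dialog manager, and differ only in which service does the language understanding.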
+Alternatives to be investigated:
+  * Whisper
+  * Llama
+  * [[https://github.com/mozilla/DeepSpeech|Mozilla DeepSpeech]]
+  * https://fosspost.org/open-source-speech-recognition/
+  * https://medium.com/analytics-vidhya/top-5-speech-recognition-open-source-projects-and-libraries-with-most-stars-on-github-d705408b834
+Abbreviations:
+  * ASR: Automatic Speech Recognition
+  * TTS: Text to Speech
+  * NLU: Natural Language Understanding
+  * NLG: Natural Language Generation
+  * NLP: Natural Language Processing
 ===== Papers / Websites =====
@@ Line 47: / Line 76: @@
 | Rogers et al. (2020) | [[https://arxiv.org/pdf/2002.12327.pdf|A Primer in BERTology: What We Know About How BERT Works]] | - | This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about **how BERT works**, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue and approaches to compression. |
 | Brown et al. (2020) | [[https://arxiv.org/pdf/2005.14165.pdf|Language Models are Few-Shot Learners]] | https://github.com/openai/gpt-3 | Demonstration that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train **GPT-3**, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. |
-| Schick and Schütze (2020) | [[https://arxiv.org/pdf/2009.07118.pdf|It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners]] | https://github.com/timoschick/pet | We show that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements. |
+| Schick and Schütze (2020) | [[https://arxiv.org/pdf/2009.07118.pdf|It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners]] | https://github.com/timoschick/pet | We show that **performance similar to GPT-3** can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements. |
 | Jaegle et al. (2021) | [[https://arxiv.org/pdf/2107.14795.pdf|Perceiver IO: A General Architecture for Structured Inputs & Outputs]] | https://github.com/deepmind/deepmind-research/tree/master/perceiver | The recently-proposed Perceiver model obtains good results on several domains (images, audio, multimodal, point clouds) while scaling linearly in compute and memory with the input size. While the Perceiver supports many kinds of inputs, it can only produce very simple outputs such as class scores. **Perceiver IO** overcomes this limitation without sacrificing the original’s appealing properties by learning to flexibly query the model’s latent space to produce outputs of arbitrary size and semantics. |
+| Callaghan et al. (2021)  | [[https://www.nature.com/articles/s41558-021-01168-6|Machine-learning-based evidence and attribution mapping of 100,000 climate impact studies]]  | -  | Increasing evidence suggests that climate change impacts are already observed around the world. Global environmental assessments face challenges to appraise the growing literature. Here the language model **BERT was used** to identify and classify studies on observed climate impacts, producing a comprehensive machine-learning-assisted evidence map.   |
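The cloze reformulation mentioned in the Schick and Schütze row can be sketched as follows. This is an illustration only: the pattern text, the `VERBALIZER` words, and the function names are hypothetical and not taken from the paper or the `pet` repository, and the gradient-based optimization step is left out. The token scores are assumed to come from some masked language model.

```python
MASK = "[MASK]"

# Hypothetical pattern: wraps the input in a sentence that describes the task
# and leaves one masked slot for the model to fill in.
DEFAULT_PATTERN = "{text} All in all, it was {mask}."

# Hypothetical verbalizer: maps each class label to one word the masked
# language model could predict in the masked slot.
VERBALIZER = {"positive": "great", "negative": "terrible"}


def to_cloze(text, pattern=DEFAULT_PATTERN):
    """Turn a raw input text into a cloze question with a single mask."""
    return pattern.format(text=text, mask=MASK)


def label_from_token_scores(scores):
    """Pick the label whose verbalizer word scores highest at the mask.

    `scores` maps candidate words to the (assumed) score a masked language
    model assigns to them at the mask position.
    """
    return max(
        VERBALIZER,
        key=lambda label: scores.get(VERBALIZER[label], float("-inf")),
    )


if __name__ == "__main__":
    print(to_cloze("Best pizza ever!"))
    print(label_from_token_scores({"great": 0.7, "terrible": 0.1}))
```

The classification task is thereby reduced to a word-prediction task that a pretrained masked language model already knows how to do, which is what allows small models to work with few labeled examples.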
 ===== Specific overview =====
@@ Line 62: / Line 92: @@
 ^  Author      ^  Title       ^  Link to code          ^  Abstract (short)  ^
-| Anthofer (2017) | [[https://diglib.tugraz.at/download.php?id=5aa2461eb16a0&location=browse|A Neural Network for Open Information Extraction from German Text]] | https://github.com/danielanthofer/nnoiegt | Systems that extract information from natural language texts usually need to consider language-dependent aspects like vocabulary and grammar. Compared to the development of individual systems for different languages, development of multilingual information extraction (IE) systems has the potential to reduce cost and effort. One path towards IE from different languages is to port an IE system from one language to another. PropsDE is an open IE (OIE) system that has been ported from the English system PropS to the German language. |
+| Anthofer (2017) | [[https://diglib.tugraz.at/download.php?id=5aa2461eb16a0&location=browse|A Neural Network for Open Information Extraction from German Text]]  | https://github.com/danielanthofer/nnoiegt | Systems that extract information from natural language texts usually need to consider language-dependent aspects like vocabulary and grammar. Compared to the development of individual systems for different languages, development of multilingual information extraction (IE) systems has the potential to reduce cost and effort. One path towards IE from different languages is to port an IE system from one language to another. PropsDE is an open IE (OIE) system that has been ported from the English system PropS to the German language. |
-| Riedl and Padó (2018) |  [[https://aclanthology.org/P18-2020.pdf|A Named Entity Recognition Shootout for German]] | https://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/german-ner/ | We ask how to practically build a model for German named entity recognition (NER) that performs at the state of the art for both contemporary and historical texts, i.e., a big-data and a small-data scenario. |
+| Riedl and Padó (2018) | [[https://aclanthology.org/P18-2020.pdf|A Named Entity Recognition Shootout for German]]  | https://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/german-ner/ | We ask how to practically build a model for German named entity recognition (NER) that performs at the state of the art for both contemporary and historical texts, i.e., a big-data and a small-data scenario. |
-| Torge et al. (2021) |  [[https://ieeexplore.ieee.org/document/9357262|Transfer Learning for Domain-Specific Named Entity Recognition in German]] | - | Investigation of different transfer learning approaches to recognize unknown domain-specific entities, including the influence on varying training data size. |
+| Torge et al. (2021) | [[https://ieeexplore.ieee.org/document/9357262|Transfer Learning for Domain-Specific Named Entity Recognition in German]]  | - | Investigation of different transfer learning approaches to recognize unknown domain-specific entities, including the influence on varying training data size. |
+====== Links to Websites and Videos ======
+^  Author      ^  Title       ^  Link          ^  Information  ^
+| Manning et al. (2008)  | Introduction to Information Retrieval  | https://nlp.stanford.edu/IR-book/html/htmledition/irbook.html | Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). |
+| Olah and Carter (2016)  | Attention and Augmented Recurrent Neural Networks  | https://distill.pub/2016/augmented-rnns/ | Recurrent neural networks are one of the staples of deep learning, allowing neural networks to work with sequences of data like text, audio and video. They can be used to boil a sequence down into a high-level understanding, to annotate sequences, and even to generate new sequences from scratch. |
+| Alexander Rush  | The Annotated Transformer  | https://nlp.seas.harvard.edu/2018/04/03/attention.html | In this post Alexander Rush presents an “annotated” version of the paper “Attention Is All You Need” in the form of a line-by-line implementation. He has reordered and deleted some sections from the original paper and added comments throughout. This document itself is a working notebook, and should be a completely usable implementation. In total there are 400 lines of library code which can process 27,000 tokens per second on 4 GPUs. |
+| Ruder (2018)  | NLP’s ImageNet moment has arrived  | https://thegradient.pub/nlp-imagenet/  | Big changes are underway in the world of Natural Language Processing (NLP). The long reign of word vectors as NLP’s core representation technique has seen an exciting new line of challengers emerge: ELMo, ULMFiT, and the OpenAI transformer. These works made headlines by demonstrating that pretrained language models can be used to achieve state-of-the-art results on a wide range of NLP tasks.  |
+| Garbade (2018)  | A Simple Introduction to Natural Language Processing  | https://becominghuman.ai/a-simple-introduction-to-natural-language-processing-ea66a1747b32  | This post gives a simple introduction to Natural Language Processing. |
+| Jay Alammar  | Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)  | https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/  | Sequence-to-sequence models are deep learning models that have achieved a lot of success in tasks like machine translation, text summarization, and image captioning.  |
+| Jay Alammar  | The Illustrated Transformer  | http://jalammar.github.io/illustrated-transformer/  | In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks.  |
+| Jay Alammar  | The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)  | https://jalammar.github.io/illustrated-bert/  | This post gives an introduction and overview of the BERT model and Transfer Learning.  |
+| Jay Alammar  | A Visual Guide to Using BERT for the First Time | https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/  | This post is a simple tutorial for how to use a variant of BERT to classify sentences. This is an example that is basic enough as a first intro, yet advanced enough to showcase some of the key concepts involved.  |
+| Schreiner (2021)  | Deepmind: Mit Perceiver IO auf dem Weg zur Multi-KI  | https://mixed.de/deepmind-mit-perceiver-io-auf-dem-weg-zur-multi-ki/  | DeepMind introduces Perceiver IO, a true all-rounder among neural networks. It could replace the widely used Transformer architecture.  |
+| Sanagapati (2020)  | Knowledge Graph & NLP Tutorial - (BERT, spaCy, NLTK)  | https://www.kaggle.com/pavansanagapati/knowledge-graph-nlp-tutorial-bert-spacy-nltk  | This post is an introduction to NLP and Knowledge Graphs and also a tutorial on how to use BERT, spaCy and NLTK. |
+| Sebastian Raschka | Transformers from the Ground Up - Sebastian Raschka at PyData Jeddah | https://www.youtube.com/watch?v=OGGhpLBeCuI | VIDEO - This talk will explain how transformers work. Then, some popular transformers like GPT and BERT will be examined and their differences will be outlined. Equipped with this understanding, it will be explained how fine-tuning of a BERT model for sentiment classification in Python works. |
+| Komarraju (2021)  | DeepMind’s Perceiver IO is Now an Open-Source Deep Learning Model  | https://www.analyticsinsight.net/deepminds-perceiver-io-is-now-an-open-source-deep-learning-model/  | To leverage developments in deep learning, DeepMind has open-sourced Perceiver IO. It’s a general-purpose deep learning model architecture for various types of inputs and outputs. As described on DeepMind’s blog, Perceiver IO can serve as a replacement for transformers, using attention to map inputs into a latent representation space. Eliminating the drawbacks of a transformer, Perceiver IO facilitates longer input sequences without incurring quadratic compute and memory costs.  |
+| Bastian (2021)  | KI-Start-up Cohere will Sprach-KI zum Massenmarkt machen  | https://mixed.de/sprach-ki-millionen-invest-fuer-gpt-3-konkurrenz/  | The US start-up Cohere is dedicated to developing advanced language AI and is entering into competition with established major players such as OpenAI. It starts out with plenty of tailwind.  |
+| Akash (2021)  | “Ok, Google!”— Speech to Text in Python with Deep Learning in 2 minutes  | https://www.analyticsvidhya.com/blog/2021/09/ok-google-speech-to-text-in-python-with-deep-learning-in-2-minutes/  | This blog post is a tutorial to build a very simple speech recognition system that takes our voice as input and produces the corresponding text by hearing the input.  |
+| Hugging Face   | Transformers  | https://huggingface.co/transformers/  | This page gives an overview of the transformer architecture and the models provided by Hugging Face.  |
+| KDnuggets  | Text Preprocessing Methods for Deep Learning  | https://www.kdnuggets.com/2021/09/text-preprocessing-methods-deep-learning.html  | This post focuses on the pre-processing pipeline for NLP tasks like classification.  |
+| Wiggers (2021)  | Microsoft and Nvidia team up to train one of the world’s largest language models  | https://venturebeat.com/2021/10/11/microsoft-and-nvidia-team-up-to-train-one-of-the-worlds-largest-language-models/  | Microsoft and Nvidia announced that they trained what they claim is the largest and most capable AI-powered language model to date: Megatron-Turing Natural Language Generation (MT-NLG). The successor to the companies’ Turing NLG 17B and Megatron-LM models, MT-NLG contains 530 billion parameters and achieves “unmatched” accuracy in a broad set of natural language tasks, Microsoft and Nvidia say, including reading comprehension, commonsense reasoning, and natural language inference.  |
+| Tang (2021)  | DeepSpeech for Dummies - A Tutorial and Overview  | https://www.assemblyai.com/blog/deepspeech-for-dummies-a-tutorial-and-overview-part-1/  | This post shows basic examples of how to use DeepSpeech for asynchronous and real time transcription.  |
+| Dickson (2021)  | What are graph neural networks (GNN)?  | https://bdtechtalks.com/2021/10/11/what-is-graph-neural-network/  | Basically, anything that is composed of linked entities can be represented as a graph. Graphs are excellent tools to visualize relations between people, objects, and concepts. Beyond visualizing information, however, graphs can also be good sources of data to train machine learning models for complicated tasks. This article gives an overview of how graph neural networks (GNN) can be used to extract important information from graphs and make useful predictions. |