
    Natural Language Processing Overview

    Let's sketch the big picture of natural language processing (NLP).

    Background

    Hypothesis

    Distributional hypothesis

    You shall know a word by the company it keeps

    Firth, 1957 (Studies in Linguistic Analysis)

    Meanings of words are (largely) determined by their distributional patterns (Distributional Hypothesis)

    Harris, 1968 (Mathematical Structures of Language)

    Words that occur in similar contexts will have similar meanings (Strong Contextual Hypothesis)

    Miller and Charles, 1991 (Language and Cognitive Processes)

    Various extensions…

    Ref

    Contexts

    The string unit that serves as the reference context

    windows (of size n), sentences, paragraphs, documents, etc.
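
    As a minimal sketch of the window-based context above, the snippet below pairs each token with its neighbors inside a symmetric window. The function name `window_contexts`, the whitespace tokenization, and the default window size are assumptions made for illustration only; sentence-, paragraph-, or document-level contexts would group tokens differently.

```python
from collections import Counter


def window_contexts(tokens, n=2):
    """Yield (target, context) pairs from a symmetric window of n tokens.

    Hypothetical helper for illustration; not taken from the original post.
    """
    for i, target in enumerate(tokens):
        lo = max(0, i - n)                 # left edge, clipped at the start
        hi = min(len(tokens), i + n + 1)   # right edge, clipped at the end
        for j in range(lo, hi):
            if j != i:                     # a token is not its own context
                yield target, tokens[j]


# Naive whitespace tokenization, assumed here only to keep the sketch short.
tokens = "you shall know a word by the company it keeps".split()

# Count how often each (target, context) pair occurs within the window.
pairs = Counter(window_contexts(tokens, n=2))

for (target, context), count in sorted(pairs.items()):
    if target == "word":
        print(target, context, count)
```

    Counting these pairs over a corpus is one way to obtain the co-occurrence statistics that the later sections build on.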

    Lexical Features

    N-gram

    Dictionary based Tokenization

    Unsupervised Segmentation

    Unsupervised Segmentation

    Co-occurrences


    Models

    Bag of words

    Word Weighting

    Vector Space Model

    Context representations

    first order vector

    Term-Document Matrix

    second order vector

    Term-Co-occurrence Matrix

    Dimensionality Reduction

    SVD (Singular Value Decomposition) and LSA (Latent Semantic Analysis)
    MDS (Multi-Dimensional Scaling)
    PCA (Principal Component Analysis), unsupervised learning
    ICA (Independent Component Analysis)
    LDA (Linear Discriminant Analysis, Fisher’s LDA)
    LDA (Latent Dirichlet Allocation)

    Similarity

    Measuring Similarity

    Distance

    Generative model


    Semantics

    Word Embedding

    Word Embedding

    Sequence-to-Sequence


    Applications

    Collocations

    Collocations

    Topic Modeling

    Comparing Corpora


    Measures

    Measures of Association

    Probability

    T-score

    Z-score

    Chi-Square Statistic (χ²)

    The distance between observed and expected values

    \[\chi^2=\sum_{k=1}^{n} \frac{(O_k - E_k)^2}{E_k}\]
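
    The formula above can be sketched directly in code. The function name `chi_square` and the observed and expected counts below are hypothetical, chosen only so the arithmetic is easy to follow; they are not data from the original post.

```python
def chi_square(observed, expected):
    """Sum (O_k - E_k)^2 / E_k over all categories k.

    Hypothetical helper for illustration; expected counts must be non-zero.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts for three categories (illustrative assumption).
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

# (18-20)^2/20 + (22-20)^2/20 + (20-20)^2/20 = 0.2 + 0.2 + 0.0
stat = chi_square(observed, expected)
print(round(stat, 6))
```

    A larger statistic means the observed counts sit farther from the expected ones; judging significance additionally requires the degrees of freedom, which this sketch does not cover.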

    Log-likelihood ratio (G²)

    Information Theory

    Entropy

    KL divergence

    MI (Mutual information)


    Open problems

    Word-sense disambiguation


    Visualization

    Data Visualization


    Lectures

    Natural language processing lectures
