Hands-on analysis of unstructured data using Python-based machine learning, with applications in Natural Language Processing (NLP). Includes case studies of Python in use, applied data cleaning, applied data visualization, an introduction to and application of NLP, and an overview of several unsupervised learning algorithms for analyzing text data.
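As a taste of the blueprint style the course follows (see Chapter 1), below is a minimal sketch of a word-frequency analysis in Python. The sample texts and the `tokenize()` helper are illustrative assumptions, not material from the course itself.

```python
# A minimal sketch of a word-frequency blueprint, in the spirit of Chapter 1.
# The toy corpus and tokenize() helper are illustrative assumptions only.
import re
from collections import Counter

import pandas as pd

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Toy corpus standing in for a real dataset such as news headlines.
df = pd.DataFrame({"text": [
    "Data cleaning comes before any analysis.",
    "Unsupervised learning finds structure in text data.",
]})

# Count token frequencies across all documents.
counter = Counter()
df["text"].map(tokenize).apply(counter.update)

# Show the five most frequent tokens as a small DataFrame.
freq = pd.DataFrame(counter.most_common(5), columns=["token", "count"])
print(freq)
```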
CONTENT
1. Gaining Early Insights from Textual Data
1.1 Exploratory Data Analysis
1.2 Introducing the Dataset
1.3 Blueprint: Getting an Overview of the Data with Pandas
1.4 Blueprint: Building a Simple Text Preprocessing Pipeline
1.5 Blueprints for Word Frequency Analysis
1.6 Blueprint: Finding a Keyword-in-Context
1.7 Blueprint: Analyzing N-Grams
1.8 Blueprint: Comparing Frequencies Across Time Intervals and Categories
2. Extracting Textual Insights with APIs
2.1 Application Programming Interfaces
2.2 Blueprint: Extracting Data from an API Using the Requests Module
2.3 Blueprint: Extracting Twitter Data with Tweepy
3. Scraping Websites and Extracting Data
3.1 Scraping and Data Extraction
3.2 Introducing the Reuters News Archive
3.3 URL Generation
3.4 Blueprint: Downloading and Interpreting robots.txt
3.5 Blueprint: Finding URLs from sitemap.xml
3.6 Blueprint: Finding URLs from RSS
3.7 Downloading Data
3.8 Blueprint: Downloading HTML Pages with Python
3.9 Blueprint: Downloading HTML Pages with wget
3.10 Extracting Semistructured Data
3.11 Blueprint: Extracting Data with Regular Expressions
3.12 Blueprint: Using an HTML Parser for Extraction
3.13 Blueprint: Spidering
3.14 Density-Based Text Extraction
3.15 All-in-One Approach
3.16 Blueprint: Scraping the Reuters Archive with Scrapy
3.17 Possible Problems with Scraping
4. Preparing Textual Data for Statistics and Machine Learning
4.1 A Data Preprocessing Pipeline
4.2 Introducing the Dataset: Reddit Self-Posts
4.3 Cleaning Text Data
4.4 Tokenization
4.5 Linguistic Processing with spaCy
4.6 Feature Extraction on a Large Dataset
4.7 There Is More
5. Feature Engineering and Syntactic Similarity
5.1 A Toy Dataset for Experimentation
5.2 Blueprint: Building Your Own Vectorizer
5.3 Bag-of-Words Models
5.4 TF-IDF Models
5.5 Syntactic Similarity in the ABC Dataset
6. Text Classification Algorithms
6.1 Introducing the Java Development Tools Bug Dataset
6.2 Blueprint: Building a Text Classification System
6.3 Final Blueprint for Text Classification
6.4 Blueprint: Using Cross-Validation to Estimate Realistic Accuracy Metrics
6.5 Blueprint: Performing Hyperparameter Tuning with Grid Search
6.6 Blueprint Recap and Conclusion
7. How to Explain a Text Classifier
7.1 Blueprint: Determining Classification Confidence Using Prediction Probability
7.2 Blueprint: Measuring Feature Importance of Predictive Models
7.3 Blueprint: Using LIME to Explain the Classification Results
7.4 Blueprint: Using ELI5 to Explain the Classification Results
7.5 Blueprint: Using Anchor to Explain the Classification Results
8. Unsupervised Methods: Topic Modeling and Clustering
8.1 Our Dataset: UN General Debates
8.2 Nonnegative Matrix Factorization (NMF)
8.3 Latent Semantic Analysis/Indexing
8.4 Latent Dirichlet Allocation
8.5 Blueprint: Using Word Clouds to Display and Compare Topic Models
8.6 Blueprint: Calculating Topic Distribution of Documents and Time Evolution
8.7 Using Gensim for Topic Modeling
8.8 Blueprint: Using Clustering to Uncover the Structure of Text Data
8.9 Further Ideas
9. Text Summarization
9.1 Text Summarization
9.2 Blueprint: Summarizing Text Using Topic Representation
9.3 Blueprint: Summarizing Text Using an Indicator Representation
9.4 Measuring the Performance of Text Summarization Methods
9.5 Blueprint: Summarizing Text Using Machine Learning
10. Exploring Semantic Relationships with Word Embeddings
10.1 The Case for Semantic Embeddings
10.2 Blueprint: Using Similarity Queries on Pretrained Models
10.3 Blueprints for Training and Evaluating Your Own Embeddings
10.4 Blueprints for Visualizing Embeddings
11. Performing Sentiment Analysis on Text Data
11.1 Sentiment Analysis
11.2 Introducing the Amazon Customer Reviews Dataset
11.3 Blueprint: Performing Sentiment Analysis Using Lexicon-Based Approaches
11.4 Supervised Learning Approaches
11.5 Blueprint: Vectorizing Text Data and Applying a Supervised Machine Learning Algorithm
11.6 Pretrained Language Models Using Deep Learning
11.7 Blueprint: Using the Transfer Learning Technique and a Pretrained Language Model
12. Building a Knowledge Graph
12.1 Knowledge Graphs
12.2 Introducing the Dataset
12.3 Named-Entity Recognition
12.4 Coreference Resolution
12.5 Blueprint: Creating a Co-Occurrence Graph
12.6 Relation Extraction
12.7 Creating the Knowledge Graph
13. Using Text Analytics in Production
13.1 Blueprint: Using Conda to Create Reproducible Python Environments
13.2 Blueprint: Using Containers to Create Reproducible Environments
13.3 Blueprint: Creating a REST API for Your Text Analytics Model
13.4 Blueprint: Deploying and Scaling Your API Using a Cloud Provider
13.5 Blueprint: Automatically Versioning and Deploying Builds
Course Features
- Duration: 4 days
- Skill level: All levels
- Language: English
- Certificate: No
- Assessments: Yes