Organizations accumulate enormous amounts of unstructured text, such as news articles, customer reviews, social media posts, and support tickets, yet turning that text into reliable insight remains challenging. Common obstacles include noisy and inconsistent data, models that generalize poorly beyond the corpus they were trained on, and reliance on ad hoc methods that do not take advantage of modern machine learning and deep learning techniques.
This Text Analytics NLP Fundamentals training is designed to provide a deep understanding of the concepts, implementation, and optimization of text analytics systems. Participants are introduced to the fundamentals of natural language processing, from exploratory analysis of textual data, through acquiring text via APIs and web scraping, to cleaning, tokenizing, and linguistically processing it with tools such as spaCy. Feature engineering with bag-of-words and TF-IDF models, supervised text classification, and techniques for explaining classifier decisions form the core of the course, ensuring participants can build models that are both accurate and interpretable.
In the hands-on sessions, participants work through practical blueprints for unsupervised topic modeling and clustering, text summarization, word embeddings, sentiment analysis, and building a knowledge graph with named-entity recognition and relation extraction. The course closes with taking text analytics into production: reproducible environments, containers, REST APIs, cloud deployment, and automated versioning of builds. With this comprehensive approach, participants will gain the technical skills needed to design, implement, and manage text analytics systems that are robust, explainable, and scalable in production environments.
OBJECTIVES
1. Understand the fundamentals of Text Analytics & NLP
2. Recognize common text processing techniques
3. Analyze the structure and meaning of text
4. Know modern NLP technologies and tools
5. Improve the quality of NLP models with optimal data
AUDIENCE
1. Data Scientist
2. Business Analyst
3. Software Developer
4. Cybersecurity Analyst
PREREQUISITES
1. No specific prior training is required
CONTENT
1. Gaining Early Insights from Textual Data
1.1 Exploratory Data Analysis
1.2 Introducing the Dataset
1.3 Blueprint: Getting an Overview of the Data with Pandas
1.4 Blueprint: Building a Simple Text Preprocessing Pipeline
1.5 Blueprints for Word Frequency Analysis
1.6 Blueprint: Finding a Keyword-in-Context
1.7 Blueprint: Analyzing N-Grams
1.8 Blueprint: Comparing Frequencies Across Time Intervals and Categories
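To give a feel for this module, here is a minimal sketch of its workflow: loading a corpus with Pandas and running a simple word-frequency analysis. The file name `speeches.csv` and its `text` column are illustrative assumptions, not the course dataset.

```python
import re
from collections import Counter

import pandas as pd

# Load the corpus into a DataFrame and get a quick overview.
# "speeches.csv" with a "text" column is a hypothetical example file.
df = pd.read_csv("speeches.csv")
print(df.describe(include="object"))

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Word-frequency analysis across the whole corpus.
counter = Counter()
for tokens in df["text"].map(tokenize):
    counter.update(tokens)
print(counter.most_common(10))
```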
2. Extracting Textual Insights with APIs
2.1 Application Programming Interfaces
2.2 Blueprint: Extracting Data from an API Using the Requests Module
2.3 Blueprint: Extracting Twitter Data with Tweepy
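As a taste of the API material, here is a minimal sketch of extracting JSON data with the requests module. The GitHub search endpoint is used purely as a convenient public example; the course also covers other APIs, including Twitter via Tweepy.

```python
import requests

# Query a public REST API; endpoint and parameters are examples only.
response = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "text analytics", "per_page": 5},
    timeout=10,
)
response.raise_for_status()  # fail loudly on HTTP errors
for repo in response.json()["items"]:
    print(repo["full_name"], repo["stargazers_count"])
```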
3. Scraping Websites and Extracting Data
3.1 Scraping and Data Extraction
3.2 Introducing the Reuters News Archive
3.3 URL Generation
3.4 Blueprint: Downloading and Interpreting robots.txt
3.5 Blueprint: Finding URLs from sitemap.xml
3.6 Blueprint: Finding URLs from RSS
3.7 Downloading Data
3.8 Blueprint: Downloading HTML Pages with Python
3.9 Blueprint: Downloading HTML Pages with wget
3.10 Extracting Semistructured Data
3.11 Blueprint: Extracting Data with Regular Expressions
3.12 Blueprint: Using an HTML Parser for Extraction
3.13 Blueprint: Spidering
3.14 Density-Based Text Extraction
3.15 All-in-One Approach
3.16 Blueprint: Scraping the Reuters Archive with Scrapy
3.17 Possible Problems with Scraping
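The snippet below sketches the polite-scraping pattern this module teaches: consult robots.txt before downloading a page, then extract content with an HTML parser. The example.com URLs are placeholders, not the Reuters archive used in class.

```python
from urllib.robotparser import RobotFileParser

import requests
from bs4 import BeautifulSoup

# Check robots.txt first; the URLs here are placeholders.
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

url = "https://example.com/news/article-1.html"
if rp.can_fetch("*", url):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Extract the page title as a trivial example of parsing.
    print(soup.title.get_text() if soup.title else "no <title> found")
```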
4. Preparing Textual Data for Statistics and Machine Learning
4.1 A Data Preprocessing Pipeline
4.2 Introducing the Dataset: Reddit Self-Posts
4.3 Cleaning Text Data
4.4 Tokenization
4.5 Linguistic Processing with spaCy
4.6 Feature Extraction on a Large Dataset
4.7 There Is More
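A minimal sketch of the kind of preprocessing pipeline built in this module, assuming spaCy's small English model is installed (`python -m spacy download en_core_web_sm`):

```python
import re

import spacy

nlp = spacy.load("en_core_web_sm")

def clean(text: str) -> str:
    """Strip URLs and collapse whitespace before linguistic processing."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

doc = nlp(clean("Check https://example.com - NLP pipelines rock!"))
# Keep lemmas of alphabetic, non-stop-word tokens.
lemmas = [t.lemma_.lower() for t in doc if t.is_alpha and not t.is_stop]
print(lemmas)
```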
5. Feature Engineering and Syntactic Similarity
5.1 A Toy Dataset for Experimentation
5.2 Blueprint: Building Your Own Vectorizer
5.3 Bag-of-Words Models
5.4 TF-IDF Models
5.5 Syntactic Similarity in the ABC Dataset
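To illustrate the feature-engineering ideas above, here is a minimal sketch of TF-IDF vectorization and pairwise syntactic similarity with scikit-learn, using a toy corpus rather than the ABC dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the house is on fire",
    "the fire brigade saved the house",
    "stock markets closed higher today",
]

# Build a sparse document-term matrix of TF-IDF weights.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus)

# Pairwise cosine similarity: documents 0 and 1 should score highest.
print(cosine_similarity(tfidf))
```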
6. Text Classification Algorithms
6.1 Introducing the Java Development Tools Bug Dataset
6.2 Blueprint: Building a Text Classification System
6.3 Final Blueprint for Text Classification
6.4 Blueprint: Using Cross-Validation to Estimate Realistic Accuracy Metrics
6.5 Blueprint: Performing Hyperparameter Tuning with Grid Search
6.6 Blueprint Recap and Conclusion
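A minimal sketch of the classification blueprint: a TF-IDF vectorizer and a linear SVM combined in a pipeline and tuned with grid search. The four toy examples stand in for the Java Development Tools bug dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the JDT bug reports used in the course.
X_train = ["crash on startup", "app crashes when saving",
           "typo in the docs", "fix spelling in readme"]
y_train = ["bug", "bug", "docs", "docs"]

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
grid = GridSearchCV(
    pipe,
    param_grid={"tfidf__ngram_range": [(1, 1), (1, 2)],
                "clf__C": [0.1, 1, 10]},
    cv=2,  # tiny toy corpus; use 5+ folds on real data
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```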
7. How to Explain a Text Classifier
7.1 Blueprint: Determining Classification Confidence Using Prediction Probability
7.2 Blueprint: Measuring Feature Importance of Predictive Models
7.3 Blueprint: Using LIME to Explain the Classification Results
7.4 Blueprint: Using ELI5 to Explain the Classification Results
7.5 Blueprint: Using Anchor to Explain the Classification Results
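The sketch below shows one of the explanation techniques named above, LIME, applied to a tiny stand-in classifier; the training sentences are invented for illustration:

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny stand-in model; the course explains its own trained classifier.
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipe.fit(
    ["crash on startup", "app crashes on save", "typo in docs", "fix readme"],
    ["bug", "bug", "docs", "docs"],
)

explainer = LimeTextExplainer(class_names=["bug", "docs"])
exp = explainer.explain_instance(
    "editor crashes when opening a file",
    pipe.predict_proba,  # LIME perturbs the text and probes this function
    num_features=4,
)
print(exp.as_list())  # word -> weight contributions to the prediction
```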
8. Unsupervised Methods: Topic Modeling and Clustering
8.1 Our Dataset: UN General Debates
8.2 Nonnegative Matrix Factorization (NMF)
8.3 Latent Semantic Analysis/Indexing
8.4 Latent Dirichlet Allocation
8.5 Blueprint: Using Word Clouds to Display and Compare Topic Models
8.6 Blueprint: Calculating Topic Distribution of Documents and Time Evolution
8.7 Using Gensim for Topic Modeling
8.8 Blueprint: Using Clustering to Uncover the Structure of Text Data
8.9 Further Ideas
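As a small illustration of the topic-modeling methods listed above, here is an LDA sketch with scikit-learn on an invented four-document corpus (the course uses the UN General Debates dataset):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "peace treaty signed between nations",
    "global markets and economic growth",
    "trade economy inflation markets",
    "nations negotiate peace and security",
]

# LDA works on raw term counts, not TF-IDF weights.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=2, random_state=42).fit(dtm)

# Print the top words of each discovered topic.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    print(f"topic {i}:", [words[j] for j in topic.argsort()[-4:]])
```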
9. Text Summarization
9.1 Text Summarization
9.2 Blueprint: Summarizing Text Using Topic Representation
9.3 Blueprint: Summarizing Text Using an Indicator Representation
9.4 Measuring the Performance of Text Summarization Methods
9.5 Blueprint: Summarizing Text Using Machine Learning
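A minimal sketch of extractive summarization by TF-IDF sentence scoring, a deliberately simple stand-in for the topic and indicator representations covered in this module:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "The new model improves accuracy on all benchmarks.",
    "Lunch was served at noon.",
    "Accuracy gains come from better preprocessing of the text.",
]

# Score each sentence by the sum of its TF-IDF weights.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
scores = np.asarray(tfidf.sum(axis=1)).ravel()

# Keep the two highest-scoring sentences, in their original order.
top = sorted(scores.argsort()[::-1][:2])
print(" ".join(sentences[i] for i in top))
```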
10. Exploring Semantic Relationships with Word Embeddings
10.1 The Case for Semantic Embeddings
10.2 Blueprint: Using Similarity Queries on Pretrained Models
10.3 Blueprints for Training and Evaluating Your Own Embeddings
10.4 Blueprints for Visualizing Embeddings
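A minimal sketch of similarity queries on a pretrained model via gensim's downloader; `glove-wiki-gigaword-50` is one of the word-vector models it ships, fetched on first use:

```python
import gensim.downloader as api

# Download (once) and load 50-dimensional GloVe vectors as KeyedVectors.
model = api.load("glove-wiki-gigaword-50")

print(model.most_similar("king", topn=5))   # nearest neighbors in vector space
print(model.similarity("coffee", "tea"))    # cosine similarity of two words
```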
11. Performing Sentiment Analysis on Text Data
11.1 Sentiment Analysis
11.2 Introducing the Amazon Customer Reviews Dataset
11.3 Blueprint: Performing Sentiment Analysis Using Lexicon-Based Approaches
11.4 Supervised Learning Approaches
11.5 Blueprint: Vectorizing Text Data and Applying a Supervised Machine Learning Algorithm
11.6 Pretrained Language Models Using Deep Learning
11.7 Blueprint: Using the Transfer Learning Technique and a Pretrained Language Model
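To preview the lexicon-based approach, here is a minimal sketch using NLTK's VADER analyzer on two invented reviews; run `nltk.download("vader_lexicon")` once beforehand:

```python
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
for review in ["Great battery life, love it!", "Broke after two days."]:
    # The compound score lies in [-1, 1]: >0 positive, <0 negative.
    print(review, "->", sia.polarity_scores(review))
```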
12. Building a Knowledge Graph
12.1 Knowledge Graphs
12.2 Introducing the Dataset
12.3 Named-Entity Recognition
12.4 Coreference Resolution
12.5 Blueprint: Creating a Co-Occurrence Graph
12.6 Relation Extraction
12.7 Creating the Knowledge Graph
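A minimal sketch of the co-occurrence-graph idea: spaCy's named-entity recognizer finds entities, and every pair mentioned in the same sentence becomes an edge. The two example sentences are invented, and `en_core_web_sm` must be installed.

```python
import itertools

import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple hired Tim Cook in Cupertino. Microsoft rivals Apple.")

graph = nx.Graph()
for sent in doc.sents:
    ents = {e.text for e in sent.ents}
    # Connect every pair of entities mentioned in the same sentence.
    for a, b in itertools.combinations(ents, 2):
        graph.add_edge(a, b)
print(graph.edges())
```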
13. Using Text Analytics in Production
13.1 Blueprint: Using Conda to Create Reproducible Python Environments
13.2 Blueprint: Using Containers to Create Reproducible Environments
13.3 Blueprint: Creating a REST API for Your Text Analytics Model
13.4 Blueprint: Deploying and Scaling Your API Using a Cloud Provider
13.5 Blueprint: Automatically Versioning and Deploying Builds
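A minimal sketch of serving a trained model behind a REST API with Flask; `model.joblib` is a placeholder for whatever pipeline was trained earlier in the course:

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # placeholder for the trained pipeline

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"text": "some document"}.
    text = request.get_json()["text"]
    return jsonify({"label": model.predict([text])[0]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```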
Course Features
- Duration: 4 days
- Skill level: All levels
- Language: English
- Certificate: No
- Assessments: Yes