Exploring Data Science: A Journey Through Curiosity, Code, and Cybersecurity
Why I Fell in Love with Data Science
As a new software engineer currently pursuing my second Master’s degree in Computer Science with a focus on Data Science and Artificial Intelligence, I’ve embarked on an exciting journey into the world of Data Science over the past month and a half. Along the way, I’ve been diving into a range of projects that reflect my learning and growth in this dynamic field. This blog is my way of sharing those experiences, documenting my progress, and reflecting on what I’ve discovered so far. For those interested in exploring my full Data Science journey, including all the projects I’ve worked on, feel free to check out my main Data Science portfolio on GitHub.
Building a Learning Portfolio
One of the most important parts of my journey has been building a learning portfolio on GitHub. It’s my space to showcase everything I’ve been working on—from data analysis scripts to machine learning models. Each project represents a new challenge I’ve taken on and an opportunity to grow. But beyond just the projects, this portfolio is a testament to my ongoing commitment to lifelong learning—a principle that fuels my passion for Data Science.
For me, Data Science is more than just crunching numbers or building models. It’s about transforming raw data into actionable insights that can solve real-world problems. The field itself is always evolving, and I love the creative and curious mindset it requires. Every dataset offers a chance to explore, analyze, and learn, which is what keeps me motivated to continue pushing the boundaries of what’s possible with data.
The Accessibility of Data Science: A Double-Edged Sword
Data Science has become more accessible than ever. Tools like Python and libraries such as Pandas, NumPy, and Scikit-learn abstract away much of the complex mathematics, enabling motivated learners to dive right in without needing deep knowledge of statistics, linear algebra, or calculus. This lowers the barriers to entry and makes it possible for beginners to start building models and analyzing data right away.
However, while these tools make it easier to get started, understanding the math behind the models is essential to becoming a proficient Data Scientist. A pragmatic engineer wants to understand the foundations of the algorithms in order to make informed decisions, troubleshoot effectively, and build more efficient models. Data Science today makes it easy to get started, but true mastery comes from balancing that convenience with a deeper understanding of the theory that powers these tools.
Key Influences on My Data Science Journey
Throughout my journey, several resources and influences have played a crucial role in shaping my approach to Data Science and coding in general. One such influential resource has been the book The Pragmatic Programmer. This book has profoundly impacted my learning, particularly in areas like system design, debugging, and developing a methodical thought process. Its principles have been invaluable in helping me approach problems with clarity and efficiency—skills that are critical in both Data Science and software development as a whole.
Additionally, I am incredibly fortunate to be surrounded by a team of peers who inspire and challenge me daily. The collective knowledge and support of my colleagues have been instrumental in pushing me to constantly improve. They serve as a constant reminder of the power of collaboration and community in achieving growth, and I am grateful for their influence on my work.
From Basics to Big Data: Understanding Data
I started my journey with exploratory data analysis (EDA)—learning how to clean and visualize data in ways that reveal underlying patterns and trends. From analyzing Netflix’s global content trends to exploring the volatility of Bitcoin prices, I was able to uncover insights that weren’t immediately obvious.
Favorite Projects:
Netflix EDA (Genres, Release Trends, Certifications)
Bitcoin Market Behavior (Volatility, Volume, Time-Based Analysis)
These initial projects helped me master tools like Pandas, NumPy, Matplotlib, and Plotly while honing my ability to ask better questions of the data.
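To make the volatility idea concrete, here is a minimal sketch of a time-based analysis in Pandas. The prices are made-up placeholder values rather than real Bitcoin data, and the 3-day window is an arbitrary choice for illustration; a real analysis would load historical prices from a file or API and likely use a longer window.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices (placeholder values, not real market data).
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 103.0, 108.0, 107.0, 110.0],
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
    name="close",
)

# Daily log returns: log(p_t / p_{t-1}). The first value is NaN by construction.
log_returns = np.log(prices / prices.shift(1))

# Rolling volatility: standard deviation of log returns over a 3-day window.
rolling_vol = log_returns.rolling(window=3).std()

summary = pd.DataFrame(
    {"close": prices, "log_return": log_returns, "rolling_vol": rolling_vol}
)
print(summary.round(4))
```

The first few rows of the rolling column are empty because the window needs three valid returns before it can produce a value.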
Taking it Further with Machine Learning
As I got comfortable with data analysis, I started wanting to do more than just analyze the past. I wanted to predict the future. That’s where machine learning (ML) came in.
I worked on several projects that introduced me to ML concepts like model selection, training, and evaluation:
Customer Churn Prediction using models like Random Forest and XGBoost.
Sentiment Analysis of IMDB reviews using Natural Language Processing (NLP).
Spam Detection using Naive Bayes to classify messages as spam or ham.
These experiences showed me the nuances of building effective models, and they sparked my interest in understanding how algorithms make predictions.
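As an illustration of the training and prediction loop, here is a minimal sketch of a Naive Bayes spam classifier with Scikit-learn. The six messages are invented for this example and far too few for a real model; the bag-of-words pipeline is one common setup, not necessarily the one used in the project above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set: three spam messages and three ham messages.
messages = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent claim free reward now",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
    "can you review my notes before lunch",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer turns text into word counts; MultinomialNB models those counts per class.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

# Classify two unseen messages.
predictions = model.predict(["claim your free prize", "lunch meeting tomorrow"])
print(list(predictions))
```

On a real dataset, the same pipeline would be evaluated on a held-out test split rather than on hand-picked sentences.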
The Fascination with Deep Learning
After grasping the basics of ML, I dove deeper into deep learning. I started with classic problems like the MNIST handwritten digits, but I was really captivated by the potential of deep learning in cybersecurity. I applied convolutional neural networks (CNNs) to classify malware by transforming binary data into images—a process that opened my eyes to the fusion of AI and security.
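The binary-to-image step can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: the helper name bytes_to_image, the fixed width of 16, and the zero-padding of the last row are all choices made for this example, and the sample bytes are a stand-in rather than a real executable.

```python
import numpy as np


def bytes_to_image(data: bytes, width: int = 16) -> np.ndarray:
    """Reshape raw bytes into a 2D uint8 array (a grayscale image).

    Each byte becomes one pixel with intensity 0-255. The last row is
    zero-padded when the byte count is not a multiple of the width.
    """
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(pixels) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)


# Stand-in for file contents; a real pipeline would read the bytes from disk.
sample = bytes(range(256)) + b"\x90" * 40
image = bytes_to_image(sample, width=16)
print(image.shape)
```

The resulting array can then be resized and fed to a CNN like any other grayscale image.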
This sparked a new passion: AI Red Teaming.
Red Teaming AI: Where Data Science Meets Cybersecurity
I’ve always believed in the power of cross-disciplinary learning, which led me to complete the AI Red Teamer Job Role Path through Hack The Box and Google (see my Proof of Completion). This training gave me valuable insight into vulnerabilities in AI systems, from prompt injection attacks to data poisoning, and provided me with the tools to ethically test and secure them.
Standout Projects:
Data Poisoning Attacks on spam classifiers.
Prompt Injection Experiments targeting large language models (LLMs).
Model Manipulation & Adversarial Attacks using red team tactics.
These efforts align with frameworks like Google’s Secure AI Framework (SAIF) and the OWASP Top 10 for LLMs, and they’re preparing me to contribute to the rapidly evolving field of AI security. If you’re interested in the intersection of AI and cybersecurity, explore my specialized repositories on Red Teaming AI and AI Prompt Injection for more detailed projects.
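For readers unfamiliar with data poisoning, one of its simplest forms is label flipping: an attacker silently changes a fraction of the training labels so that a classifier learns the wrong decision boundary. The sketch below shows only that flipping step on a toy label array. The function name flip_labels, the 30% fraction, and the 0/1 encoding (0 for ham, 1 for spam) are assumptions for illustration, not a description of the projects listed above.

```python
import numpy as np


def flip_labels(labels: np.ndarray, fraction: float, seed: int = 0):
    """Return a copy of binary labels with a random fraction flipped (0 <-> 1)."""
    rng = np.random.default_rng(seed)
    poisoned = labels.copy()
    n_flip = int(round(fraction * len(labels)))
    # Choose distinct positions so exactly n_flip labels change.
    flipped_idx = rng.choice(len(labels), size=n_flip, replace=False)
    poisoned[flipped_idx] = 1 - poisoned[flipped_idx]
    return poisoned, flipped_idx


# Toy labels: 0 = ham, 1 = spam.
clean = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
poisoned, flipped_idx = flip_labels(clean, fraction=0.3)

print("flipped positions:", sorted(flipped_idx.tolist()))
print("labels changed:", int((poisoned != clean).sum()))
```

Training one model on the clean labels and another on the poisoned ones, then comparing their accuracy on the same test set, is a straightforward way to measure the impact of such an attack.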
My Evolving Data Science Toolkit
Along the way, I’ve become fluent in a wide range of tools and techniques that power modern data science and AI research. These include:
Languages:
Python (my go-to for everything from data wrangling to prototyping ML models)
Core Libraries:
Pandas, NumPy, Scikit-learn, TensorFlow, Keras, NLTK, PyTorch
Specializations:
Exploratory Data Analysis, Machine Learning, Deep Learning, Reinforcement Learning, Natural Language Processing (NLP)
Data Visualization:
Seaborn, Plotly, Tableau—to transform raw numbers into engaging visual narratives.
Security & HPC:
CUDA for GPU acceleration, OpenAI Gym for RL environments, and Red Teaming TTPs for security-focused applications.
I continue to reinforce my learning with hands-on Jupyter notebooks, research papers, and technical books. Currently, I’m working through the book Malware Data Science, where I’m bridging the gap between offensive security and AI, with plans to integrate the Model Context Protocol (MCP) to automate software reverse engineering at scale through Ghidra.
For a deeper dive into my work and to explore my latest projects, check out my GitHub repositories.
The Road Ahead
If you’re new to Data Science, or considering diving into the field, I encourage you to start small, experiment, and learn by doing. Don’t be afraid to fail forward. The journey has its obstacles, and I’ve encountered more than a few. But each challenge has contributed to my growth, and I’m excited to see where this path takes me.
As I continue my graduate studies, my focus is shifting toward the intersection of theory, mathematics, and practical application. I’m also exploring automated software reverse engineering by integrating Ghidra with Data Science, which I believe will be transformative for cybersecurity and malware analysis.
The beauty of Data Science is that it’s an ever-evolving field—constantly changing with new tools, techniques, and insights. I’m not sure where this journey will lead, but that’s part of the excitement.
Stay curious, keep learning, and see where it takes you!