Washington D.C.  ·  George Washington University

Sayan Patra

Data Scientist · ML Engineer · Business Analyst · Researcher

MS in Data Science  ·  he/him

About

Building at the edge of
theory and impact.

I'm Sayan Patra — a data science professional with a strong mathematical foundation and hands-on expertise in machine learning, financial modelling, and business strategy.

My research revolves around Diffusion Models, Physics-Informed Neural Networks (PINNs), Manifold Dimension Estimation (MDE), Stochastic Differential Equations (SDEs), and Topological Data Analysis (TDA).

Currently open to research collaborations — especially on Diffusion Model Enhancements and MDE. Contact me directly.

The George Washington University · Washington, D.C.

Master of Science in Data Science

University of Calcutta · India

Bachelor of Science (Hons) in Mathematics

679+ contributions last year
15+ repositories
4+ research abstracts
8+ domains explored

Research

Abstracts &
ongoing work.

Ideas in progress — papers I'm writing, questions I can't stop asking.

Diffusion Models · PINNs · Manifold Dimension Estimation · Residual Analysis · SDE · TDA · NLP · Computer Vision · Graph Neural Networks · Time Series · Predictive Modelling · Theoretical Deep Learning
01

CANISNET: Diffusion Model Enhancements via ResNet50 Integration

Graduate research investigating hybrid architectures combining diffusion generative priors with ResNet50 feature extractors for improved image synthesis quality and training stability. Exploring convergence under stochastic noise schedules and manifold structure in latent space.

Diffusion Models · ResNet50 · PyTorch · Generative AI
Active
02

Generalised Partial Autocorrelation (GPAC) Package

Building a Python package for Generalised Partial Autocorrelation — extending classical PACF methods to nonlinear and non-stationary time series settings for more robust signal analysis.

Time Series · Statistics · Python Package
Active
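
For context on the classical baseline that GPAC sets out to extend, here is a minimal sketch of the ordinary sample PACF computed with the Durbin-Levinson recursion. It is not the package's implementation or API: the names `autocorr`, `pacf`, and `simulate_ar1` are hypothetical, chosen only for illustration, and the AR(1) series is synthetic.

```python
import random


def autocorr(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Classical PACF at lags 0..max_lag via the Durbin-Levinson recursion."""
    r = autocorr(x, max_lag)
    out = [1.0]       # the lag-0 value is 1 by convention
    phi_prev = []     # coefficients phi_{k-1,1..k-1} from the previous order
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        out.append(phi_kk)
    return out


def simulate_ar1(phi, n, seed=0):
    """Synthetic AR(1) series x_t = phi * x_{t-1} + e_t (illustration only)."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


if __name__ == "__main__":
    series = simulate_ar1(phi=0.7, n=2000)
    for lag, value in enumerate(pacf(series, max_lag=5)):
        print(f"lag {lag}: {value:+.3f}")
```

For a stationary AR(1) series, the PACF should be close to the AR coefficient at lag 1 and near zero afterwards. That cut-off pattern is what the classical method relies on, and it is the behaviour that nonlinear or non-stationary data can break.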
03

Diffusion State Modelling for Sequential Decision Making

Applying diffusion-based state representations to reinforcement learning environments, aiming to improve policy stability and sample efficiency in high-dimensional state spaces.

Diffusion Models · RL · SDE
Draft
04

Topological Data Analysis for High-Dimensional Financial Signal Detection

Applying persistent homology and TDA tools to high-frequency trading data, identifying structural patterns invisible to classical statistical methods and building persistence diagrams as features for downstream classifiers.

TDA · Persistent Homology · Finance
In Review
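
To make "persistence diagrams as features" concrete, here is a minimal sketch of 0-dimensional persistent homology on a small point cloud, summarised into a fixed-length feature vector. It is an illustration under stated assumptions, not the method used in this work: the function names `h0_persistence` and `persistence_features` are hypothetical, the three-point cloud is toy data, and the death scale is taken to be the edge length (one common Vietoris-Rips convention).

```python
import math


def h0_persistence(points):
    """Finite death scales of connected components (H0) in a Rips filtration.

    Every point is born at scale 0. A component dies when an edge first
    merges it into another one, so the deaths are the edge lengths picked
    by Kruskal's algorithm. The one component that never dies is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def persistence_features(deaths):
    """Summarise the finite H0 bars as a small feature dictionary."""
    if not deaths:
        return {"count": 0, "total": 0.0, "max": 0.0}
    return {"count": len(deaths), "total": sum(deaths), "max": max(deaths)}


if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]  # toy data, not market data
    print(persistence_features(h0_persistence(cloud)))
```

In practice the points might be sliding windows of a price series, and a library such as GUDHI would handle higher-dimensional homology. Both of those are assumptions about a typical workflow rather than details stated here.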

Technical Skills

The craft
I wield.

Languages: Python, R, SQL, MATLAB, Bash / Shell
ML Frameworks: PyTorch, TensorFlow / Keras, Scikit-learn, HuggingFace, XGBoost / LightGBM
Data & Visualization: Pandas / NumPy / SciPy, Tableau, Matplotlib / Seaborn, OpenCV, Plotly / Dash
Research & Math: GUDHI (TDA), Statsmodels, NetworkX / PyG, Jupyter / Colab, LaTeX
Cloud & MLOps: AWS / GCP, Docker, MLflow, FastAPI, Git / GitHub
Databases: PostgreSQL / MySQL, MongoDB, Pinecone (Vector DB), Redis, Spark / BigQuery

Work

Projects that
move the needle.

Public repos = completed work. Private repos = in-progress research.

Graduate Research · Diffusion Models

CANISNET

Diffusion Models + ResNet50 implementation. Hybrid generative architecture investigating improved image synthesis quality, training convergence under stochastic noise schedules, and manifold structure in latent space.

Public · Completed
Python · PyTorch · Diffusers · ResNet50
GitHub →

NLP · Political Rhetoric

US President Speech Analysis

Final NLP project analyzing political rhetoric patterns across US presidential speeches. Sentiment evolution, topic modelling, and linguistic fingerprinting.

Public · Completed
Python · NLTK · Transformers
GitHub →

Data Analysis · Transportation

Capital Bikeshare DC

End-to-end data analysis and predictive modelling on Capital Bikeshare DC dataset. Demand forecasting, usage clustering, and geospatial flow visualization.

Public · Completed
Python · HTML · Folium · Scikit-learn
GitHub →

Deep Learning · MIT 6.S191

MIT Intro to Deep Learning

Lab materials and projects from MIT 6.S191: Introduction to Deep Learning, completed in person. Covers CNNs, RNNs, GANs, reinforcement learning, and more.

Public · Completed
Jupyter · TensorFlow · Python
GitHub →

Computer Vision · Satellite · Abstract

Cloud Mapping with Diffusers

Geospatial cloud detection combining diffusion-based image generation with satellite imagery. Active research.

Private · In Progress
Python · Diffusers · OpenCV
Private Repo

Visualization · Tableau · Abstract

Data Visualization Research

Graduate-level research on data visualisation, built in Tableau. 51 commits in February 2026 alone; 1 open issue.

Private · In Progress
Tableau · Jupyter · Python
Private Repo

Robotics · Research · Abstract

Neural Code Robotics

Research-level neural architecture for robotics control systems. Bridging deep learning representations with physical system constraints. 17 commits this month.

Private · In Progress
Python · PyTorch · ROS
Private Repo

Finance · RL · Abstract

AGQTS — Autonomous Trading Agent

An autonomous trading agent leveraging reinforcement learning and quantitative signals. Connects financial domain expertise with modern RL methods.

Private · In Progress
Python · RL · Quant Finance
Private Repo

Diffusion · RL · Abstract

Diffusion State

Diffusion-based state representations for sequential decision-making. GPL v3 licensed research project exploring policy stability and sample efficiency.

Private · In Progress
Python · Diffusers · RL
Private Repo

Statistics · Python Package · Abstract

GPAC — Generalised Partial Autocorrelation

A package for Generalised Partial Autocorrelation extending classical PACF methods to nonlinear and non-stationary time series settings.

Private · In Progress
Python · Statistics · Time Series
Private Repo

NLP · Healthcare · Abstract

Trauma-Informed Care Chatbot

Chatbot based on Trauma-Informed Care principles. Collaborative project with @ichaudh bridging NLP and healthcare communication design.

Private · In Progress
Python · NLP · LLM
Private Repo

Side Quests

Weird experiments
& playful builds.

Not everything needs to ship. Sometimes the best learning starts with play.

01

Mood-to-Music ML

A model that generates Spotify playlists from facial expression via webcam and transfer learning. Surprisingly accurate at detecting "I should be studying" energy.

Explore →
02

ArXiv Digest Bot

A Telegram bot that summarizes any ArXiv paper into a structured 5-point thread. Because reading 40 pages daily is not a sustainable research strategy.

Try it →
03

Realtor Data ML

Predictive modelling on real estate data — property price estimation, neighbourhood clustering, and investment signal detection using ensemble methods.

GitHub →
04

DeepLearning MIT Projects

Projects completed in MIT 6.S191 in-class sessions covering sequence models, music generation, facial recognition, and reinforcement learning game agents.

GitHub →

Writing

Medium
stories.

@sayan.patra on Medium

I write about data science, machine learning research, mathematical intuition behind ML models, and experiences as a graduate researcher. Stories coming soon.

No stories published yet — follow to stay updated when the first article drops.

Experience

Where I've
done the work.

The George Washington University
Current · Washington D.C.
Graduate Instructional Assistant — Machine Learning I
  • Supported graduate coursework in Machine Learning I alongside faculty
  • Assisted in grading, developing learning materials, and clarifying key concepts in shallow and deep learning
  • Mentored students through complex model selection, evaluation, and research methodology
Amazon
Prior · EMEA
Associate — Business Strategy & Analytics
  • Analyzed eCommerce transaction patterns to improve business strategy and concessions
  • Focused on EMEA market client relations and data-driven operational improvements
  • Delivered actionable insights from transactional data to senior stakeholders
ICICI Securities
Prior · India
Authorized Person — Equity & Options
  • Managed equity portfolios worth millions in asset value
  • Provided advisory support for retail investment strategies and portfolio rebalancing
  • Traded equities and options as a junior trader, with derivatives exposure

GitHub

679 contributions
& counting.

Open to collaboration — especially on Diffusion Model Enhancements and MDE.

Sayanpatraa

github.com/Sayanpatraa  ·  Washington DC  ·  he/him

679 contributions · 15 repositories · 8 following · 0 followers

CANISNET
Diffusion Models + ResNet50. Graduate Research.
Python · Public · 1 star

data-viz-project
Research on Visualisation. Graduate level. Tableau. 51 commits Feb 2026.
Jupyter Notebook · Private

neural-code-robotics
Research on neural architectures for robotics control. 17 recent commits.
Python · Private

US-President-Speech-Analysis
NLP · Political rhetoric across presidential speeches. 1 star.
Python · Public

AGQTS
Autonomous Trading Agent: RL meets quantitative finance.
Python · Private

GPAC
Package for Generalised Partial Autocorrelation.
Private

Research

Diffusion Models & MDE

Actively seeking collaborators. Contact me directly.

Open Source

Tools & Libraries

Building tools that help the ML community. PRs welcome.

Writing

Co-authorship

Open to co-writing papers on ML theory, NLP, TDA, or causal inference.

Teaching

Mentorship

GWU Graduate IA. Happy to help early-career data scientists.

Beyond Work

What keeps
me human.

Reading: Sci-fi & philosophy of mind
Chess: vs humans & my own RL agents
Music: Curation & generative audio
Travel: Collecting global perspectives
Puzzles: Math olympiad & logic

References

What others
say.

Academic and professional references available on request.

Prof. [Faculty Name]

Faculty · The George Washington University — Data Science

"Sayan brings rare mathematical depth to applied ML problems. His ability to connect theoretical frameworks like TDA and SDEs to practical implementations makes him a standout research collaborator."

The George Washington University · Washington D.C.

Prof. [ML Instructor]

Instructor · Machine Learning I — GWU

"As a Graduate IA, Sayan consistently helped students bridge the gap between theory and implementation. His clarity on gradient descent and model evaluation was the best I've seen from a student assistant."

The George Washington University · Machine Learning I

[Manager Name]

Senior Manager · Amazon EMEA

"Sayan's analytical approach to eCommerce data stood out immediately. He consistently translated complex transaction patterns into actionable business strategy that the team actually implemented."

Amazon · EMEA Market Division

[Senior Name]

Portfolio Manager · ICICI Securities Private Limited

"A meticulous, data-driven mind in equity analysis. Sayan's ability to synthesize market signals with portfolio risk models — well beyond his years of experience — impressed our entire advisory team."

ICICI Securities Private Limited · India

Contact

Let's
talk.

sayan.patra@gwmail.gwu.edu
LinkedIn · GitHub · ORCID · Kaggle · Medium

"Turning data into decisions, and decisions into meaningful change."
