Professional Info

Summary

Lead/Senior Software Engineer with experience building large-scale distributed data infrastructure (petabyte scale), from event-level data ingestion through to actionable insights.

Engineer with proven academic and professional credentials and a strong background, looking to make an impact with strong technical, analytical and professional skills. Intrigued by new technology and its adoption; would like to leave a long-lasting footprint on the technological and entrepreneurial roadmap.

Skills and Tech: Apache Spark, Flink, Presto, Hadoop MapReduce, Java, Python, HBase, Hive, Pig, MySQL, Tableau, Kafka, Solr

Currently Learning : NLP with Deep Learning (Stanford CS 224), Production using Transformer Models

Specialties: Strong analytical and mathematical skills, love for Computer Science fundamentals, emphasis on clean and scalable code, ability to grasp new concepts quickly, strong prototyping skills, idea generation, leadership ability and experience

Experience

Data Infra/Data Engineering Apple - (Apple AI/ML Org - Data and ML Innovation)
Nov 2020 - Present
Cupertino, CA

Data Infra/Engineering for Apple AI/ML Data - Apple AI/ML Organization

Member of Technical Staff Rubrik
May 2019 - Nov 2020
Palo Alto, CA

Worked on massive data processing pipelines in Spark/MapReduce for telemetry callhome data from Rubrik deployments. Launched a new Generic Ingest Data Pipeline to ingest any JSON data into Snowflake. This serves the company-wide use case of enabling analytics and data visualization for any data generated by any team.
Providing Data Warehousing solutions using Snowflake at Petabyte scale with ownership of all Product Metrics Datasets, powering all Product Management, Support, Sales use cases.
Created tooling for Deployment, Orchestration, Stability, Monitoring and Observability (logging and job metrics) for these Spark Data Pipelines.
Led the AWS Cost Optimization efforts to optimize the utilization of EMR and EC2 resources and cut team AWS spend by 40 percent.
Led the cloud services availability monitoring and Observability project. Driving the SLA definition and enforcement process for all cloud services.
Skills: Python, Scala, Spark, AWS (EMR, EC2/ECS), Snowflake, ElasticSearch

Senior Software Engineer LinkedIn
Public Company; 10,001+ employees; Part of msft;

Dec 2017 - May 2019 Sunnyvale CA

Worked on massive data flow architectures for LinkedIn Marketing Solutions (LinkedIn Ads).
Creating scalable Data Warehousing solutions based on Spark and Presto.
Mentored intern project to develop tooling to automate tasks around creation of dataflow archetypes required for data pipelines in the LinkedIn Stack
Working on Autonomous Data Quality/Consistency Frameworks
Part of the Proposal Review Committee for the Microsoft MLADS (Machine Learning and Data Science) Conference, Fall 2018
Skills: Java, Hadoop, Hive, Pig, Spark, Azkaban, Apache Gobblin, Apache Calcite

Engineering Manager (previously Senior Software Engineer) Drawbridge
Startup Company, 100+ employees

August 2016 - December 2017 San Mateo, CA

Led and Managed the backend reporting and analytics team of Drawbridge's Ad Platform products.

Also led the ads targeting team for a brief time as a joint responsibility.

Combined hands-on development of the data reporting platform, building near-real-time capabilities, with managing a team to achieve state-of-the-art reporting capabilities.

Worked on reporting and data pipelines that process more than 100 billion events per day.
Product facing enhancements for the data pipeline.

Drove and engineered a Presto-based Data Warehouse that enabled Business Analysts, BI engineers and Data Scientists to query datasets on HDFS directly, rather than through a serving-layer columnar relational DB, thereby increasing the amount of data people can query and use in their models.

Analytics dashboards based on Data Warehousing

Backend Technologies: Hadoop, Hive, Pig, Spark, HBase, Couchbase, Oozie, Columnar DBs, etc.

Key Project Contributions :

1. Conversion Tracking/Attribution https://drawbridge.com/blog/p/the-drawbridge-approach-to-attribution
2. Cross Device Insights with Real time Attribution Metrics https://drawbridge.com/c/insights&attribution

Software Engineer @WalmartLabs
Public Company; 10,001+ employees; wmt;

December 2013 - August 2016 Sunnyvale, CA

@Labs - WMX (Walmart Exchange)
- Data Mining and Analytics for Online Ads
- Ad Impression Ingestion and Processing: As an early-stage member of the team, implemented and fully owned (dev, test, and product) the ingestion pipeline, processing up to 1B impressions per day from various sources.
- Designed and implemented terabyte scale analytic pipeline
- Worked in a small team to build a lambda-architecture realtime analytics and OLAP platform from the ground up
- Worked with Hadoop/Hive and Apache Spark to process ad impressions data.
- Pioneered use of Apache Spark on the team for improved pipeline performance and maintainability
- Real-time impression ingestion with Apache Storm (experimental).
- Real-time analytics.
- Built a querying system on Apache Solr to construct analytics item sets by clustering short strings in large datasets
Languages used: Java, Python

Tech Yahoo! - Software Dev Engineer Yahoo!
Public Company; 10,001+ employees; yhoo; Internet industry

July 2012 – December 2013 Sunnyvale, CA

Media Foundation - Content

Development related to Content Enrichment for all ingested content in the Content Agility pipeline, and served up on all of Yahoo Media properties (Yahoo News, Yahoo Sports, Yahoo Finance, etc.)

End to end Design and Development related to selection of most relevant Canonical URLs and contextual clickthrough URLs for content. These power all Yahoo hosted content urls on all content streams including the Yahoo homepage.

Sole developer for Canonical URL generation for all content in the Enrichment Workflow to achieve Search Engine Optimization for each content (story, photo, video)

Independently Developed Content Enrichment Libraries for Categorization, Content Quality marking, Geo Classification, Named Entity Recognition, etc. on content (stories, photos, videos) through Feed Ingestion and Editorial Ingestion Workflows using Yahoo’s Contextual Analysis Platform.

Enabled the Ads Categorization to be tagged for each content.

Development within Category Taxonomy Management Workflow for Yahoo Content Taxonomy.

Setting up of Continuous Integration Environment for Content Track’s Libraries.

Ramped up on development with Cassandra and HBase (performance testing of the Hector and Astyanax clients and of different data storage schemas for low latency).

Graduate Teaching Assistant Carnegie Mellon University
Educational Institution; 5001-10,000 employees; Higher Education industry

August 2011 – September 2012 (1 year 2 months)

Fundamentals of Embedded Systems

Software Engineer Intern Samsung Telecommunications America
Public Company; 1001-5000 employees; Telecommunications industry

June 2011 – August 2011 (3 months)

Development of a Browser Performance Benchmarking Suite for Android Devices

Software Engineer Avaya
Privately Held; 10,001+ employees; Telecommunications industry

July 2009 – July 2010 (1 year 1 month)

Worked for Avaya's R&D (TS&D) team, which develops tools and components for easy maintenance and fault resolution of Avaya's suite of products. Worked on the Expert Systems and HealthCheck products.

University Representative Vishwakarma Institute of Technology, Pune
Educational Institution; 201-500 employees; Higher Education industry

August 2008 – May 2009 (10 months)

Held the apex position in the College Student Council. Student coordinator for the University-led implementation of various schemes in the college. Also responsible for the social, cultural, technical and sports-related activities, events and festivals in the college.
