Data Science Tech Brief By HackerNoon

Author: HackerNoon


Description

Learn the latest data science updates in the tech world.
142 Episodes
This story was originally published on HackerNoon at: https://hackernoon.com/from-data-fragmentation-to-billion-dollar-insights-the-vision-of-manish-ravindra-sharath. Manish Ravindra Sharath unified fragmented enterprise data using PySpark & cloud-native systems, boosting efficiency 99% and driving multimillion-dollar growth. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #enterprise-data-engineering, #manish-ravindra-sharath, #pyspark-data-pipeline, #cloud-data-architecture, #data-modernization-strategy, #hybrid-data-infrastructure, #enterprise-analytics, #good-company, and more. This story was written by: @sanya_kapoor. Learn more about this writer by checking @sanya_kapoor's about page, and for more stories, please visit hackernoon.com. Manish Ravindra Sharath transformed enterprise decision-making by architecting a unified PySpark-powered data pipeline that cut reporting time from 30+ hours to 30 minutes. His system achieved 99% efficiency, 40% cost reduction, and 30% faster deal closures, turning fragmented data into billion-dollar insights driving global business performance.
This story was originally published on HackerNoon at: https://hackernoon.com/building-a-layered-defense-against-web-scraping. Discover how a three-layer data-protection model blends AI, risk-based gating, and legal context to stop web scraping while preserving user trust. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #web-scraping, #data-protection, #ai-security, #product-strategy, #web-scraping-protection, #bot-mitigation, #risk-based-gating, #data-security-strategy, and more. This story was written by: @areejit1. Learn more about this writer by checking @areejit1's about page, and for more stories, please visit hackernoon.com. The web-scraping industry is no longer niche. Valued at USD 1.03 billion in 2025, it is projected to nearly double by 2030. Traditional defenses (rate limiting, CAPTCHAs, IP bans) are brittle against modern toolkits. A layered defense acknowledges this tension.
This story was originally published on HackerNoon at: https://hackernoon.com/cosmo-the-graph-visualization-tool-built-for-your-terminal. Cosmo is a terminal-based interactive graph visualizer that automatically lays out and displays complex data structures for quick exploration. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #visualization, #terminal, #cli, #graphs, #tui, #cosmo, #complex-data-structures, #gui-visualizer, and more. This story was written by: @hacker227143. Learn more about this writer by checking @hacker227143's about page, and for more stories, please visit hackernoon.com. Cosmo is a fast, interactive graph visualizer that makes graphs and trees easy to understand, beautifully arranged, and fully explorable without ever leaving your command line. Pass your data structures directly from code or from a file and see them come to life.
This story was originally published on HackerNoon at: https://hackernoon.com/how-businesses-are-turning-space-data-into-a-tool-for-risk-resilience-and-sustainability. Satellites are reshaping insurance, supply chains, and sustainability—here’s how space data became core to global business strategy. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #business-intelligence, #space-economy, #satellite-data, #sustainability-reporting, #supply-chain-analytics, #geospatial-intelligence, #space-technology, #earth-observation, and more. This story was written by: @150sec. Learn more about this writer by checking @150sec's about page, and for more stories, please visit hackernoon.com. The global space economy is evolving from exploration to infrastructure. Businesses across insurance, sustainability, and supply chains now rely on satellite data for real-time insights that help manage risk, track biodiversity, forecast disruptions, and meet new reporting standards. As costs drop and access expands, space data has become an essential layer of corporate intelligence—turning orbit into opportunity.
This story was originally published on HackerNoon at: https://hackernoon.com/how-data-innovation-changed-a-states-infrastructure-engine. Deepak Chanda modernized Massachusetts’ infrastructure systems through data-driven process innovation—turning inefficiency into lasting operational reform. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-innovation-in-government, #infrastructure-analytics, #data-transformation, #process-automation, #massachusetts-transportation, #sql-data-pipeline-optimization, #real-time-anomaly-detection, #good-company, and more. This story was written by: @jonstojanjournalist. Learn more about this writer by checking @jonstojanjournalist's about page, and for more stories, please visit hackernoon.com. Amid bureaucratic stagnation in Massachusetts’ public works, Senior Data Analyst Deepak Chanda led a quiet revolution. By digitizing blueprint reviews and adding a simple SQL field to track project sign-offs, he cut delays and saved taxpayer dollars. His philosophy—good data should shape the world, not just describe it—continues to drive progress across healthcare and insurance.
This story was originally published on HackerNoon at: https://hackernoon.com/how-to-optimize-your-marketing-budget-using-just-three-letters-mmm. Marketing Mix Modeling is a statistical analysis method used in marketing to determine the optimal allocation of resources. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-science, #marketing-analytics, #machine-learning, #marketing, #marketing-budget, #marketing-mix-modeling, #media-mix-modelling, #adstock-and-saturation, and more. This story was written by: @radiokocmoc_l45iej08. Learn more about this writer by checking @radiokocmoc_l45iej08's about page, and for more stories, please visit hackernoon.com. Marketing Mix Modeling is a statistical analysis method used in marketing to determine the optimal allocation of resources. The goal of media mix modelling is to understand the impact of different marketing channels on the overall campaign effectiveness. Join me to discover how to optimise the marketing budget by implementing Robyn MMM.
This story was originally published on HackerNoon at: https://hackernoon.com/heres-how-sharechat-scaled-their-ml-feature-store-1000x-without-scaling-the-database. How ShareChat scaled its ML feature store to 1B features/sec on ScyllaDB, achieving 1000X performance without scaling the database. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #sharechat-ml-feature-store, #scylladb-scaling-case-study, #ml-feature-store-optimization, #sharechat-moj, #low-latency-ml-infrastructure, #scylladb-database-optimization, #p99-conf-sharechat-talk, #good-company, and more. This story was written by: @scylladb. Learn more about this writer by checking @scylladb's about page, and for more stories, please visit hackernoon.com. ShareChat scaled its ML feature store from failure at 1M features/sec to 1B features/sec using ScyllaDB optimizations, caching hacks, and relentless tuning. By rethinking schemas, tiling, and caching strategies, engineers avoided scaling the database, cut latency, and boosted cache hit rates—proving performance engineering beats brute-force scaling.
This story was originally published on HackerNoon at: https://hackernoon.com/why-you-shouldnt-judge-by-pnl-alone. PnL can lie. This hands-on guide shows traders how hypothesis testing separates luck from edge, with a Python example and tips on how not to fool yourself. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #quantitative-research, #trading, #algorithmic-trading, #pnl, #udge-pnl, #profit-and-loss, #judge-profit-and-loss, #hackernoon-top-story, and more. This story was written by: @ruslan4ezzz. Learn more about this writer by checking @ruslan4ezzz's about page, and for more stories, please visit hackernoon.com. I’ve spent years building and evaluating systematic strategies across highly adversarial markets. When you iterate on a trading system, PnL is the goal but a terrible day-to-day signal. It’s too noisy, too path-dependent, and too easy to cherry-pick. A simple framework (form a hypothesis, measure a test statistic, translate it into a probability under a “no-effect” world, i.e. the p-value) helps you avoid false wins, iterate faster, and ship changes that actually stick. Below I’ll show a concrete example where two strategies look very different in cumulative PnL charts, yet standard tests say there’s no meaningful difference in their average per-trade outcome. I’ll also demystify the t-test in plain language: difference of means, scaled by uncertainty.
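The t-test description in the summary above ("difference of means, scaled by uncertainty") can be sketched in a few lines of Python. This is a minimal illustration and not the author's code: it assumes Welch's unequal-variance form of the statistic, the function name `welch_t_statistic` is hypothetical, and the per-trade PnL values are invented for the example.

```python
import math
import statistics


def welch_t_statistic(a, b):
    """Difference of sample means divided by the standard error of that difference."""
    mean_diff = statistics.mean(a) - statistics.mean(b)
    # Uncertainty term: each sample variance is scaled by its own sample size.
    std_err = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return mean_diff / std_err


# Hypothetical per-trade PnL for two strategies (illustrative numbers only).
strategy_a = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
strategy_b = [0.5, -1.0, 1.5, 0.0, -0.5, 1.0]

t_stat = welch_t_statistic(strategy_a, strategy_b)
print(f"t = {t_stat:.3f}")
```

A t-statistic close to zero means the gap between the average per-trade outcomes is small relative to the noise in the samples, which is the situation the summary describes: charts that look different, with no meaningful difference in the means. Turning the statistic into a p-value additionally requires the t-distribution, which a statistics library would normally provide.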
This story was originally published on HackerNoon at: https://hackernoon.com/from-decentralized-to-unified-supcon-uses-seatunnel-to-build-an-efficient-data-collection-frame. SUPCON dumped siloed data tools for Apache SeaTunnel—now core sync tasks run 0-failure! Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #bigdata, #apacheseatunnel, #supcon, #data-sync, #high-availability, #data-engineering, #cdc, #hackernoon-top-story, and more. This story was written by: @williamguo. Learn more about this writer by checking @williamguo's about page, and for more stories, please visit hackernoon.com. 99% lower failures, 100% consistency, 70% less O&M cost. Big thanks to @ApacheSeaTunnel!
This story was originally published on HackerNoon at: https://hackernoon.com/enterprise-data-pipeline-revolution-suresh-pallis-metadata-driven-automation-success. Suresh Palli revolutionized enterprise data pipelines with metadata-driven automation, cutting dev time 40% and boosting scalability 5x. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #suresh-palli, #metadata-driven-automation, #enterprise-data-pipelines, #data-pipeline-automation, #metadata-governance, #enterprise-data-architecture, #scalable-data-processing, #good-company, and more. This story was written by: @sanya_kapoor. Learn more about this writer by checking @sanya_kapoor's about page, and for more stories, please visit hackernoon.com. Suresh Palli led a metadata-driven automation project that cut pipeline development time by 40% and scaled data processing 5x. His centralized metadata governance enabled dynamic adaptation, seamless orchestration, and cross-unit alignment. The success earned industry recognition, consulting opportunities, and set new benchmarks for enterprise data automation.
This story was originally published on HackerNoon at: https://hackernoon.com/unified-data-smarter-agentsis-your-architecture-future-proof. A hands-on guide to architecting unified, governed and AI-ready data platforms using open table formats, semantic layers and multicloud governance. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data, #big-data-analytics, #product, #ai, #etl, #azure, #aws, #data-engineering, and more. This story was written by: @@QueryAndConquer. Learn more about this writer by checking @@QueryAndConquer's about page, and for more stories, please visit hackernoon.com. A hands-on guide to architecting unified, governed and AI-ready data platforms using open table formats, semantic layers and multicloud governance.
This story was originally published on HackerNoon at: https://hackernoon.com/data-driven-decisions-at-scale-ab-testing-best-practices-for-engineering-and-data-science-teams. Ship features like scientists: randomize, measure, and learn fast. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-science, #big-data, #experimentation, #experimental-design, #product-development, #software-engineering, #machine-learning, #statistics, and more. This story was written by: @sayantan. Learn more about this writer by checking @sayantan's about page, and for more stories, please visit hackernoon.com. Ship features like scientists: randomize, measure, and learn fast. Good A/B tests aren’t just stats — they’re the engine driving smarter products.
This story was originally published on HackerNoon at: https://hackernoon.com/why-you-should-almost-always-choose-sync-gunicorn-over-workers-ze9c32wj. Anyone working with a WSGI web application framework like Flask knows that it is a best practice to deploy the app behind a WSGI HTTP server like Gunicorn rather than the development server. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #python-programming, #gevent, #gunicorn, #python-web-development, #flask, #flask-deployment, #latest-tech-stories, #what-are-gunicorn-worker-types, and more. This story was written by: @shamik-ray. Learn more about this writer by checking @shamik-ray's about page, and for more stories, please visit hackernoon.com. Gunicorn is a widely used WSGI server, popular because it is lightweight, fast, and simple, yet supports most of the requirements you would have when hosting an app in production. The default worker type is Sync and I will be arguing for it. Async workers like Gevent create new greenlets (lightweight pseudo threads). Every time a new request comes in, it is handled by a greenlet spawned by the worker threads. At the same time, the resources needed to serve the requests will be less.
This story was originally published on HackerNoon at: https://hackernoon.com/beyond-the-ten-blue-links-how-generative-ai-rewires-our-brains-for-search. The age of searching is ending. A deep dive into the psychology of AI search, how it centralizes truth, and why becoming a trusted source is key to brand survival. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #user-behavior-analytics, #ai-integrated-search, #digital-marketing, #seo, #geo, #future-tech, #psychology, #product-management, and more. This story was written by: @a_belova. Learn more about this writer by checking @a_belova's about page, and for more stories, please visit hackernoon.com. Generative AI isn't just a new feature in search; it's a fundamental psychological shift. By providing direct, synthesized answers, it caters to our brain's deep-seated desire to reduce cognitive load and trust authoritative narratives. This "great untraining" is rendering the classic marketing playbook obsolete. For businesses, developers, and marketers, the battle is no longer for clicks on blue links, but for becoming a trusted, citable source inside the AI's "brain." The age of persuasion is ending; the age of becoming a machine-readable source of truth has begun.
This story was originally published on HackerNoon at: https://hackernoon.com/need-web-data-here-are-the-3-methods-everyones-using. Discover the three best, most modern methods to access and harness web data for your projects. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #web-data, #ai, #web-scraping, #sdk, #api, #mcp, #python, #good-company, and more. This story was written by: @brightdata. Learn more about this writer by checking @brightdata's about page, and for more stories, please visit hackernoon.com. Need web data? APIs, SDKs, and MCP provide flexible, scalable, and automated ways to access, scrape, and integrate web data for scripts, backends, web apps, pipelines, or AI agents.
This story was originally published on HackerNoon at: https://hackernoon.com/applying-transitive-closure-to-sort-products-into-categories-considering-nesting-and-overlaps. A guide to efficiently managing nested categories and overlapping products, ensuring fast retrieval without duplicates in e-commerce systems. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-management, #software-architecture, #product-categorization, #graph-theory, #microservices, #optimize-data-storage, #transitive-closure, #advanced-indexing, and more. This story was written by: @egorgrushin. Learn more about this writer by checking @egorgrushin's about page, and for more stories, please visit hackernoon.com. Handling product categorization in e-commerce can be quite the task, especially when nested categories and overlapping products make efficient retrieval without duplicates a real challenge. The method I found has a major impact on performance: setting up proper data storage, separating data for reading and modification, using relational and NoSQL databases, and applying graph theory to handle complex category nesting. The step-by-step guide shows how to sort out efficient data storage, use transitive closure for advanced indexing, build a service to maintain and update the graph, and take advantage of database indexing to avoid unnecessary sorting in RAM.
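The transitive-closure idea in the summary above can be illustrated with a small in-memory sketch. It is not the article's implementation, which relies on database storage and indexing. The function names, category names, and product IDs below are all hypothetical. The sketch precomputes every category's full set of descendants once, so a lookup for a parent category becomes a union of product sets with duplicates removed.

```python
def transitive_closure(children):
    """Map each category to the set of all its descendants, direct or nested."""
    closure = {}

    def descendants(node):
        if node not in closure:
            # Placeholder stops infinite recursion if the input accidentally has a cycle.
            closure[node] = set()
            found = set()
            for child in children.get(node, ()):
                found.add(child)
                found |= descendants(child)
            closure[node] = found
        return closure[node]

    for node in list(children):
        descendants(node)
    return closure


# Hypothetical category tree and product assignments; "p1" overlaps two categories.
children = {"electronics": ["phones", "laptops"], "phones": ["smartphones"]}
products = {"phones": {"p1"}, "smartphones": {"p1", "p2"}, "laptops": {"p3"}}

closure = transitive_closure(children)


def products_in(category):
    """All products in a category or any nested subcategory, without duplicates."""
    ids = set(products.get(category, ()))
    for sub in closure.get(category, ()):
        ids |= products.get(sub, set())
    return sorted(ids)


print(products_in("electronics"))
```

Storing the closure trades extra space and some update work for cheap reads, which matches the summary's point about separating data for reading from data for modification.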
This story was originally published on HackerNoon at: https://hackernoon.com/98percent-of-data-strategies-fail-lets-fix-it. Learn how to fix failing data strategies using the '5 W's' framework. Transform your approach to KPIs and drive real business value with actionable insights. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-strategy, #kpi-management, #business-intelligence, #data-driven-decisions, #executive-leadership, #analytics-roi, #data-roi, #data-governance, and more. This story was written by: @liorb. Learn more about this writer by checking @liorb's about page, and for more stories, please visit hackernoon.com. Even the most well-equipped organizations can find themselves serving up a mess instead of actionable insights. Here's a step-by-step process of fixing your data strategy, ensuring that you're serving up actionable data instead of a recipe for disaster. In the following sections, we'll dive into the common data strategy nightmares.
This story was originally published on HackerNoon at: https://hackernoon.com/how-to-measure-the-results-of-in-app-events-when-onelinks-dont-work. How To Measure The Results Of In-App Events When Onelinks Don’t Work. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #analytics, #onelink, #inapp-events, #marketing, #app-store, #mobile-apps, #digital-marketing, #good-company, and more. This story was written by: @socialdiscoverygroup. Learn more about this writer by checking @socialdiscoverygroup's about page, and for more stories, please visit hackernoon.com. Many app developers and marketing managers face the challenge of accurately measuring the impact of In-App Events (IAEs) on the App Store. While IAEs have proven effective for re-engaging users, attracting new downloads, and increasing revenue, traditional tracking methods like OneLink don’t actually include IAEs. Major mobile attribution platforms confirm that there is currently no way to track IAEs properly. At Social Discovery Group, our portfolio of 60+ dating and entertainment brands is supported by a team of over 100 marketers dedicated to app growth and development. We’re used to measuring all our marketing efforts in terms of financial value. Eventually, we managed to develop our own composite way to evaluate IAEs, and we are going to share it with you.
This story was originally published on HackerNoon at: https://hackernoon.com/how-ai-powered-data-mapping-is-democratizing-data-management. Learn how AI-powered data mapping is transforming data management, making it more accessible and efficient for everyone. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-mapping, #data-management, #big-data, #ai-powered, #ai-powered-data-management, #democratizing-data-management, #data-science, #ai-powered-data-mapping, and more. This story was written by: @kristenburke. Learn more about this writer by checking @kristenburke's about page, and for more stories, please visit hackernoon.com. AI is revolutionizing data mapping by automating and simplifying the process, making data management more efficient and accessible for businesses and non-technical users alike.
This story was originally published on HackerNoon at: https://hackernoon.com/data-engineering-whats-the-value-of-api-security-in-the-generative-ai-era. Discover the importance of API security in the age of Generative AI. Learn how robust API protection ensures data integrity. Check more stories related to data-science at: https://hackernoon.com/c/data-science. You can also check exclusive content about #data-engineering, #generative-ai, #ai-regulation, #api-security, #data-security, #data-privacy, #threat-detection, #cybersecurity-best-practices, and more. This story was written by: @karthikrajashekaran. Learn more about this writer by checking @karthikrajashekaran's about page, and for more stories, please visit hackernoon.com. API security is crucial in the era of Generative AI, ensuring data integrity, protecting user privacy, and enabling secure and efficient AI integration. Robust API protection helps prevent unauthorized access, data breaches, and potential misuse of AI capabilities.