40 Things I Learned About Data — @javisantana
Today there are 40 days left until my 40th birthday. I've been working with data for 20+ years now and I feel like trying to summarize what I learned in a few points.
I’ll share one thing every day until I turn 40.
1. It’s hard to capture reality with data
Trying to recreate an accurate version of reality, no matter what it is or how simple it looks, is hard.
Another way to see it: modeling reality always gets complex. There are always small nuances, special conditions, things that changed, edge cases and, of course, errors (which sometimes become features).
The only models I found easy to work with and understand are the ones that reflect computer things.
2. There is no “the best data format”
We format data to move it around. It could be hundreds of kilometers or a few nanometers, but we always need to encode information somehow. I never found the "El Dorado" of data formats.
Text formats are easy for a human to read but harder and slower to parse.
Binary formats are fast to parse but hard to debug.
XML is a good container but it's too verbose.
JSON is easy but lacks basic data types.
Serializable formats are not good to keep in memory, but formats designed for in-memory operations are not binary compatible with other languages.
The most important thing I learned is: you need to find the right balance between speed, flexibility, compatibility and the human-computer interface.
3. Good data models make good products
When the data model is not well designed, everything that comes after feels wrong. You feel like you are doing hacks and tweaks all the time.
When the data model is the right one, everything flows: it's easy to explain, and when you make a change it just fits like a good Tetris play. Only time can tell if the data model was the right one. If after some years you still use the same data model (maybe not the same database or the same code), you did it right. It's not that different from cars, buildings, companies…
Designing a good data model takes time, prototypes and a good understanding of the reality you are modeling (see point 1 for more info).
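To make that concrete, here's a minimal sketch of the kind of model that tends to age well (my illustration, with hypothetical table and column names, not something from the post): an order separated from its line items, so new pricing rules or product fields fit without hacks.

```sql
-- Hypothetical sketch: orders separated from their line items.
CREATE TABLE orders (
    id          bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE order_items (
    order_id         bigint NOT NULL REFERENCES orders (id),
    product_id       bigint NOT NULL,
    quantity         integer NOT NULL CHECK (quantity > 0),
    unit_price_cents integer NOT NULL -- integer cents avoid floating point money bugs
);
```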
4. The second most important rule of working with data: the fastest data is the data you don't read
As simple as it sounds, most people forget about one of the most important database features: indices. You also need to think about what data you actually need; a lot of apps are full of select * from table.
The problem is, as your system grows, so do the number and complexity of queries. Knowing what data you need becomes harder. To avoid that you need… yes, data about how you query your data.
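As a sketch of both ideas, assuming a hypothetical events table: index the columns you filter by, and read only the columns you need.

```sql
-- An index on (user_id, created_at) lets the database skip
-- every row that doesn't match instead of scanning the table.
CREATE INDEX idx_events_user_ts ON events (user_id, created_at);

-- Read only the columns you need, not select * from table.
SELECT user_id, created_at
FROM events
WHERE user_id = 42
  AND created_at >= now() - interval '7 days';
```

In Postgres, the pg_stat_statements extension is one way to gather that data about how you query your data.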
5. When in doubt, use Postgres as your database
It's quite typical when you start a project to debate what DBMS to use: Elastic, Mongo, some key/value store like Redis, funny things like Neo4J. If you have a use case that clearly fits one of those databases, fine; otherwise, use Postgres or anything relational. Of course, there will be someone who says "but it does not scale". Anyone who has worked with a system at scale knows there is no storage system that scales well (unless it's simple as hell and eventually consistent, and not even then).
I love Postgres for many things: solid, battle tested, supports transactions (I will write about them), feature complete, fast, not owned by a VC-backed company, guided by the community, calm and steady progress, great tooling, cloud services providing infra, companies with expertise…
When you pick something funny, you end up developing half of the features a solid RDBMS provides, but just worse.
I decided to use Redis as the storage for Tinybird and it's working great, but as the project evolves you miss many of the built-in features Postgres provides. Probably a mistake.
6. Behind every null value there is a story
When you join a company, just ask about them; you'll learn a lot.
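A quick way to start that conversation (a sketch, assuming a hypothetical orders table with a discount_code column): count(column) skips nulls, so comparing it against count(*) tells you how many stories are hiding there.

```sql
-- count(discount_code) counts only non-null values, so the
-- difference against count(*) is the number of nulls to ask about.
SELECT count(*) AS total_rows,
       count(*) - count(discount_code) AS null_discount_codes
FROM orders;
```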
7. When I try to understand data I always end up using a histogram
When visualizing data you have to pick the right visualization type, but before that you need to understand the data.
I start with an avg, then avg plus stddev, then min-max, and finally I go with a histogram.
It captures min, max, avg and, most important, the data distribution.
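In Postgres you can get a quick histogram with width_bucket; a sketch assuming a hypothetical requests table with a response_time_ms column:

```sql
-- Split response times between 0 and 1000 ms into 20 equal-width buckets;
-- buckets 0 and 21 catch values that fall outside the range.
SELECT width_bucket(response_time_ms, 0, 1000, 20) AS bucket,
       count(*) AS requests
FROM requests
GROUP BY bucket
ORDER BY bucket;
```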
8. Analytics is a product, not a department
When you have people asking for metrics and people extracting them from data, you end up with as many definitions of the same metric as people in the company.
Reporting requires the same things a digital product needs: owners, maintenance, clear definitions, improvements and, you know, giving people what they want in a way that is useful for everybody in the company.
Many companies don't consider analytics a first-class citizen and end up spending more to get less quality.