When I interview a Java developer, if I see Spring Boot or Spring on the candidate’s resume, I may start with a simple question: “What is the default scope for a Spring bean”? Most people would get it right. I would then follow with a tricky question: “Does Spring make sure a Singleton bean thread-safe?” or “Does developer need to do anything to make sure a Singleton bean thread-safe?”.
When I say “tricky”, not because it’s tricky technically, but because half interviewees have no idea. The other half who correctly answered don’t always demonstrate solid understanding of Singleton and thread-safety. It’s okay to guess at an interview I guess.
Spring Boot is one of those popular frameworks for Java developers. Like most other Java frameworks, it provides proven reusable libraries and increases productivity. Some developers can probably make a living by simply being good at it.
However, because it encapsulates the interpretation of various Java Specifications, hides the complexity of design and implementation, often the framework itself imposes a serious impediment for developers to understand the underlining fundamentals.
Many Spring Boot developers don’t know Spring Boot is just a framework on top of another popular framework Spring Framework, which was initially a framework for Java Servlet applications. Most freshly-minted Spring Boot developers never heard of Servlet, not to mention web.xml. They only know their Spring Boot applications, “just run”. They never know why and how it runs.
Because of that, they never think of what the underlining Servlet Container is, what the default configurations (like Max Concurrent Requests) are, and how to fine tune those configurations. Imagine asking them to write a Java Web Application without Spring Boot?
Frameworks tend to wrap a lot of default features and behaviors under the hood, just to name a few: default Encryption Algorithm, default Socket Timeout, default Retry Strategy.
In the past, Frameworks might have configuration property for each “feature”, but this has changed in the recent years. Nowadays, Framework authors tend to favor “Convention over configuration”. Old configuration files are replaced by annotations with “sensible defaults”. Moreover, many of the features and behaviors are “discovered” automatically based on your running environments, like system properties, environment variables and what is in the class-path.
Several years back, I led a framework team. We built a Framework as the foundation for a slew of web applications that support multi-million $ business. We worked very hard to support all major features by default, and still allow each application to extend and override each feature by configuration and automatic discovery. I learned first hand, it’s even harder for application developers to fully understand how each feature worked and how to extend or override them.
Naturally, due to the lack of visibility and transparency of frameworks, people makes a lot of assumptions about frameworks, such as Singleton bean thread-safety. Some of the assumptions will definitely haunt the team down the road if the technical leads on the team didn’t review the design and code carefully.
Overtime, frameworks will evolve or die. If you ever worked with Struts 1.x framework, and if you didn’t understand Java Servlet, you would have a difficult time to migrate your applications to Struts 2.x or Spring.
Frameworks are your tools, not your crutches. If you don’t think out of the box of Spring Boot, you can’t professionally outgrow Spring Boot. Simple. Period.
That is true to other frameworks too.
Frameworks can help you get started quickly, but understanding the underlining principles will help you in the long run.
To build resilient software applications, when architecting the integration points with downstream services, we shall consider all error scenarios. Robust error handling is essential. Retrying remote API calls is an important part.
A retry can be done either synchronously or asynchronously. If the clients require a response of the execution status, not just the acknowledgement of the receipt of the request, it’s appropriate to implement synchronous retries with limits on total retry numbers and time. On the other hand, if the clients don’t care about the actual execution status, or have ways to receive responses asynchronously, it is almost always a good idea to adopt asynchronous retry architecture. Of course, before putting a request into asynchronous retry process, we can always implement synchronous retry first whenever it makes sense.
In this article, we will focus on the asynchronous retry architecture.
2. Queuing for Asynchronous Retry Architecture
Queuing mechanism is the center of the Asynchronous Retry Architecture.
The originating service constructs a Retry Message that includes the original request info, the destination URL and other metadata, puts the Retry Message into an Async Retry Queue based on the chosen queuing system. A trigger could be configured in the queue to trigger a processor. Or, an Async Retry Processor can pull the queuing system for new messages. The Async Retry Processor can then utilize the message received from the Async Retry Queue and make another call to the destination downstream service.
A Dead Letter Queue is used to hold Retry Messages for certain period time after a (configurable) maximum number of retries have been reached.
The below figure is a very high level workflow and message flow:
3. Asynchronous Retry Architecture Diagram
Asynchronous Retry Architecture
In the above diagram, Service A is the calling service and Service B is the destination downstream service. If the initial call in Step 1 fails, Service A will put a Retry Message into the Async Retry Queue.
Depends on what Queuing System is chosen, either a trigger can be configured in the Async Retry Queue to trigger the Processor (3.1), or a Processor can be configured to poll the Async Retry Queue (3.2). If AWS SQS is chosen as the queuing system, a Lambda function can be configured to trigger the processor when a new message arrives.
Once the Async Retry Processor receives the Retry Message, it can use the request info in the message to reconstruct the request, and send the request to the destination URL that is also included in the retry message.
A Retry Message will be moved to the Dead Letter Queue if the maximum retry attempts have been reached as detected by the Async Retry Processor or the Async Retry Queue.
4.Generic Data Model for Retry Message
A Retry Message can have a generic data model as below:
<span id="60ab" class="hc wp tx so wq b gg wr ws x wt" data-selectable-paragraph="">{
"request":{
"url":"http[s]://$host:$port/$destitnation_endpoint_including_query_parameters",
"method":"GET|POST|PUT|DELETE|PATCH",
"payload":"$json_string",
"headers":[$headers_to_pass_to_the_target_service]
},
"receivedCount": "$number",
"async-retry-queue":"$async-retry-queue[optional]",
"dead-letter-queue":"$dead-letter-queue[optional]"
}</span>
With this generic data model design for Retry Message, an Async Retry Processor can be designed to process any Retry Message constructed by any originating services (Service A) to any destination services (Service B).
5. Retryable Errors
Only non-functional errors are retryable. Below are some examples:
a. No response at all;
b. Temporary Network issue, usually 5xx (http status) errors;
c. Request timeout: http status 408 errors;
d. Conflict: http status 409 errors;
e. Too many request: http status 429
f. If there is a Retry-After header in the http response of the downstream service;
h. Unauthorized: http status 401 errors with expired token error code/message. These kind of errors usually require a new token. In this case, the Async Retry Processor is responsible for getting the proper token.
6. Conclusion
Asynchronous Retry Architecture can be used to handle all retryable errors when the client is not expecting the execution result in the response of the call. It is extremely useful if a function may need to be tried many times for a long period of time.
The number of maximum retry attempts, the async retry queue name/url, and the dead letter queue name/url can all be configurable. The configurable values can make architecture flexible for many different applications.
You and a partner go into business together and split the equity 50/50. You do all the work and your partner slacks off. He owns half your business- now what?
Slicing Pie outlines a process for calculating exactly the right number of shares each founder or employee in an early stage company deserves.
You will learn:
How to value the time and resources an individual brings to the company relative to the contributions of others
The right way to value intangible things like ideas and relationships
What to do when a founder leaves your company
How to handle equity when you have to fire someone
Important issues to discuss with your lawyer
Much more
Research shows that dynamic equity split models, like the one outlined in Slicing Pie, is the best way to avoid conflicts as the company grows.
The new and improved Version 2.3 contains updated information about legal issues, idea valuation, retrofitting and much more!
Often downplayed in the excitement of starting up a new business venture is one of the most important decisions entrepreneurs will face: should they go it alone, or bring in cofounders, hires, and investors to help build the business? More than just financial rewards are at stake. Friendships and relationships can suffer. Bad decisions at the inception of a promising venture lay the foundations for its eventual ruin. The Founder’s Dilemmas is the first book to examine the early decisions by entrepreneurs that can make or break a startup and its team. Drawing on a decade of research, Noam Wasserman reveals the common pitfalls founders face and how to avoid them. He looks at whether it is a good idea to cofound with friends or relatives, how and when to split the equity within the founding team, and how to recognize when a successful founder-CEO should exit or be fired. Wasserman explains how to anticipate, avoid, or recover from disastrous mistakes that can splinter a founding team, strip founders of control, and leave founders without a financial payoff for their hard work and innovative ideas. He highlights the need at each step to strike a careful balance between controlling the startup and attracting the best resources to grow it, and demonstrates why the easy short-term choice is often the most perilous in the long term. The Founder’s Dilemmas draws on the inside stories of founders like Evan Williams of Twitter and Tim Westergren of Pandora, while mining quantitative data on almost ten thousand founders. People problems are the leading cause of failure in startups. This book offers solutions.
As each new generation of entrepreneurs emerges, there is a renewed interest in how venture capital deals come together. Yet there is little reliable information focused on venture capital deals. Nobody understands this better than authors Brad Feld and Jason Mendelson. For more than twenty years, they’ve been involved in hundreds of venture capital financings, and now, with the Second Edition of Venture Deals, they continue to share their experiences in this field with you.
Engaging and informative, this reliable resource skillfully outlines the essential elements of the venture capital term sheet–from terms related to economics to terms related to control. It strives to give a balanced view of the particular terms along with the strategies to getting to a fair deal. In addition to examining the nuts and bolts of the term sheet, Venture Deals, Second Edition also introduces you to the various participants in the process and discusses how fundraising works.
Fully updated to reflect the intricacies of startups and entrepreneurship in today’s dynamic economic environment
Offers valuable insights into venture capital deal structure and strategies
Brings a level of transparency to a process that is rarely well understood
Whether you’re an experienced or aspiring entrepreneur, venture capitalist, or lawyer who partakes in these particular types of deals, you will benefit from the insights found throughout this new book.
模式识别=机器学习。两者的主要区别在于前者是从工业界发展起来的概念,后者则主要源自计算机学科。在著名的《Pattern Recognition And Machine Learning》这本书中,Christopher M. Bishop在开头是这样说的“模式识别源自工业界,而机器学习来自于计算机学科。不过,它们中的活动可以被视为同一个领域的两个方面,同时在过去的10年间,它们都有了长足的发展”。
Steve McConnell所著的《代码大全2》 就像是为软件开发者所编写的《烹饪的乐趣》。能够去阅读这本书,说明你很享受自己的工作,并且在认真地对待自己的工作。同时,你还想要不断的进步。在《代 码大全》中Steve写到,普通程序员每年阅读的技术书籍不到一本。仅仅是阅读这本书的行为,就可能已经把你和你90%的程序员同事们区别开来了。
程序员认为自己在设计产品可用性时能够代表“普通”用户作出某些决定,但是在现实世界中,他们完全不能代表用户。程序员是一群怪人,充其量能算是是 一种极端的用户——就好比“逻辑人(Homo Logicus)” vs. “现代智人(Homo Sapiens)”。除非你碰巧开发的是一款编译器,因为编译器的用户也是程序员。
这本书有一个隐含的观点,有的时候,无论你的设计有多好,就像由 Alan 担任顾问并在此书中用作案例的这两款软件:扫描仪软件以及网页开发软件,在市场上都没有能够取得成功,但这和软件的可用性无关,因为它们的可用性已经被证明是非常优秀的了。有些时候,非常优秀的产品同样会失败,而其失败的原因是你无法掌控的,无论你多么努力。对于此书中的一些华而不实的词藻,你可以用以上事实将自己拉回到现实当中。
我有书中图片里的同款 USB 扫描仪,设备配套对软件令我印象深刻。后来我把这台扫描仪送给了我父亲。有一次和他打电话,我并没有提到任何关于扫描仪的事情,但是他却提到他很喜欢这个扫描软件。这一切都发生在这本书出版之前!
…都是物有所值的。利用TRS-80 与 DEC Alpha 的对比来阐释48n和n3算法的差别?各位,真的没有比这样做更合适的了。能和大师一起工作一年是最好的了,退而求其次,你也可以读读《编程珠玑》。这本书将很多软件工程师的智慧提炼成了简洁易懂的文字,纳入其中。
我不会骗你:有一些章节是可以完全略过的。比如说,第11、13和14章分别介绍了如何实现排序,堆和哈希算法,考虑到现如今这些基本算法都有成熟 的库可以使用,我无法想象再去实现它们有什么意义。 对于那些和教科书一样恼人的习题,这里有一个很实在的建议。浏览一下这本书,跳过代码部分。有件事可能会让你失望,第八章“粗略估算”(Column 8, “Back of the Envelope” )是必须要看的。这里有我见过的最佳的估算方法。这章还解释了一些疯狂的面试问题,一些公司很喜欢用这些问题提问我们。
The roots of entrepreneurship are old. But, the fruits were never so lucrative as they have been recently. Until 2010, not many of us had heard of the term ‘start-up’. And now, not a day goes by when business newspapers don’t quote them. There is sudden gush in the level of courage which people possess.
Today, I see 1 out of 5 person talking about a new business idea. Some of them even succeed too in establishing their dream company. But, only the determined ones sustain. In data science, the story is bit different.
The success in data science is mainly driven by knowledge of the subject. Entrepreneurs are not required to work at ground level, but must have sound knowledge of how it is being done. What algorithms, tools, techniques are being used to create products & services.
In order to gain this knowledge, you have two ways:
You work for 5-6 years in data science, get to know things around and then start your business.
You start reading books along the way and become confident to start in first few years.
I would opt for second option.
Why read books ?
Think of our brain as a library. And, it’s a HUGE library.
How would an empty library look like? If I close my eyes and imagine, I see dust, spider webs, brownian movement of dust particles and darkness. If this imagination horrifies you, then start reading books.
The books listed below gives immense knowledge and motivation in technology arena. Reading these books will give you the chance to live many different entrepreneurial lives. Take them one by one. Don’t get overwhelmed. I’ve displayed a mix of technical and motivational books for entrepreneurs in data science. Happy Reading!
List of Books
Data Science For Business
This book is written by Foster Provost & Tom Fawcett. It gives a great head start to anyone, who is serious about doing business with big data analytics. It makes you believe, data is now business. No business in the world, can now sustain without leveraging the power of data. This books introduces you to real side of data analysis principles and algorithms without technical stuff. It gives you enough intuition and confidence to lead a team of data scientists and recommend what’s required. More importantly, it teaches you the winning approach to become a master at business problem solving.
This book is written by Thomas H. Davenport. It reveals the increasing importance of big data in organizations. It talks with interesting numbers, researches and statistics. So until 2009, companies worked on data samples. But with advent of powerful devices and data storage capabilities, companies now work on whole data. They don’t want to miss even a single bit of information. This book unveils the real side of big data, it’s influence on our daily lives, on companies and our jobs. As an entrepreneur, it is extremely important for you understand big data and its related terminologies.
This book is written by Alistair Croll and Benjamin Yoskovitz. It’s one of the most appreciated books on data startups. It consist of practical & detailed researches, advice, guidance which can help you to build your startup faster. It gives enough intuition to build data driven products and market them. The language is simple to understand. There are enough real world examples to make you believe, a business needs data analytics like a human needs air. To an entrepreneur, this will introduce the practical side of product development and what it takes to succeed in a red ocean market.
This book is written by Michael Lewis. It’s a brilliant tale which sprinkles some serious inspiration. A guy named billy bean does what most of the world failed to imagine, just by using data and statistics. He paved the path to victory when situations weren’t favorable. Running a business needs continuous motivation. This can be a good place to start with. However, this book involves technical aspects of baseball. Hence, if you don’t know baseball, chances are you might struggle in initial chapters. A movie also has been made on this book. Do watch it!
This book is written by Ashlee Vance. I’m sure none of us are fortunate to live the life of Elon Musk, but this book let’s us dive in his life and experience rise of fantastic future. Elon is the face behind Paypal, Tesla and SpaceX. He has dreamed of making space travel easy and cheap. Recently, he was applauded by Barack Obama for the successful landing of his spaceship in an ocean. People admire him. They want to know his secrets and this is where you can look for. As on entrepreneur, you will learn about must have ingredients which you need to a become successful in technology space.
This book is written by Thomas H Davenport and Jinho Kim. As we all know, data science is driven by numbers & maths (quants). Inspired from moneyball, this book teaches you the methods of using quantitative analysis for decision making. An entrepreneur is a terminal of decision making. One must learn to make decisions using numbers & analysis, rather than intuition. The language of this book is easy to understand and suited for non-maths background people too. Also, this book will make you comfortable with basics statistics and quantitative calculations in the world of business.
The author of this book is Nate Silver, the famous statistician who correctly predicted US Presidential elections in 2012. This books shows the real art and science of making predictions from data. This art involves developing the ability to filter out noise and make correct predictions. It includes interesting examples which conveys the ultimate reason behind success and failure of predictions. With more and more data, predictions have become prone to noise errors. Hence, it is increasingly important to understand the science behind making predictions using big data science. The chapters of this book are interesting and intuitive.
This book is written by Roger Lowenstein. It is an epic story of rise and failure of a hedge fund. For an entrepreneur, this book has ample lessons on investing, market conditions and capital management. It’s a story of a small bank, which used quantitative techniques for bond pricing throughout the world and ensured every invested made gives a profitable results. However, they didn’t sustain for long. Their quick rise was succeeded by failure. And, the impact of their failure was so devastating that US Federal bank stepped in to rescue the bank, because the fund’a bankruptcy would have large negative influence on world’s economy.
This book is written by Eric Ries. In one line, it teaches how to not to fail at the start of your business. It reveals proven strategies which are followed by startups around the world. It has abundance of stories to make you walk on the right path. An entrepreneur should read it when he/she feel like draining out of motivation. It teaches to you to learn quickly, implement new methods and act quickly if something doesn’t work out. This book applies to all industries and is not specific to data science.
This book is written by Avinash Kaushik. It is one of the best book to learn about web analytics. Internet is the fastest mode of collecting data. And, every entrepreneur must learn the art of internet accounting. Most of the businesses today face the challenge of weak presence on social media and internet platforms. Using various proven strategies and actionable insights, this book helps you to solve various challenges which could hamper your way. It also provides a winning template which can be applied in most of the situations. It focuses on choosing the right metric and ways to keep them in control.
This book is written by Eric Seigel. It is a good follow up book after web analytics 2.0. So, once you’ve understood the underlying concept of internet data, metrics and key strategies. This book teaches you the methods of using that knowledge to make predictions. It’s simple to understand and covers many interesting case studies displaying how companies predict our behavior and sell us products. It doesn’t cover technical aspects, but explains the general working on predictive analytics and its applications. You can also check out this funny rap video by Dr. Eric Seigel:
This book is written by Steven D Levitt and Stephen J Dubner. It shows the importance of numbers, data, quantitative analysis using various interesting stories. It says, there is a logic is everything which happens around us. Reading this book will make you aware of the unexplored depth at which data affects our real lives. It draws interesting analogy between school teachers and sumo wrestlers. Also, the bizarre stories featuring cases of criminal acts, real-estate, drug dealers will certainly add up to your exciting moments.
This book is written by Jessica Livingston. Again, this isn’t data science specific but a source of motivation to get you moving forward. It’s a collection of interviews with the founders of various startups across the world. The focus has been kept on early days i.e. how did they act when they started. This book will give you enough proven ideas, strategies and lessons to anticipate and avoid pitfalls in your initial days of business. It consist of stories by Steve Wozniak (Apple), Max Levchin (Paypal), Caterina Fake (Flikr) and many more. In total, there are 32 interviews listed which means you have the chance to learn from 32 mentors in one single book. Must read for entrepreneurs.
This book is written by Greg Gianforte and Marcus Gibson. It teaches about the things to do when you are running short of money and still don’t want to stop. This is a must read book for every entrepreneur. Considering the amount of investment required in data science startups, this book should have a special space in an entrepreneur’s heart. It reveals various eye opening truths and strategies which can help you build a great company. Greg and Marcus proves that money is not always the reason for startup failure, it’s all about founder’s perspective. This book has stories of success and failures, again a great chance for you to live many lives by reading this book.
This book is written by Thomas H Davenport, Jeanne G Harris and Robert Morrison. This books reveals the increased use of analytical tools & concepts by managers to make informed business decisions. The decision making process has accelerated. For a greater impact, it also consists of examples from popular companies like hotels.com, best buy and many more. It talks about recruiting, coordination with people and the use of data and analytics at an enterprise level. Many of us are aware of data and analytics. But, only a few know how to use them together. This quick book has it all !
This marks the end of this list. While compiling this list, I realized most of these books are about sharing experience and learning from the mistake of others. Also, it is immensely important to posses quantitative ability to become good in data science. I would suggest you to make a reading list and stick to it throughout the year. You can take up any book to start. I’d suggest to start with a motivational book.
Have you read any other book ? What were your key takeaways? Did you like reading this article? Do share your knowledge & experiences in the comments below.
如今,三家投资机构正在努力刺激工具和平台的开发,来提高研究者获取和使用这些数据的能力。在华盛顿特区举行的第7届医疗数据研讨会上,(美国)国立卫生研究院(National Institute ofHealth,简称NIH)、总部在英国的威康信托基金(Wellcome Trust)以及霍华德•休斯医学研究所(Howard Hughes Medical Institute)宣布了首届开放科学奖(Open Science Prize)的6支决赛队伍名单。
根据世界卫生组织(World Health Organization)的说法,空气污染是导致8分之1全球死亡病例的罪魁祸首,然而空气质量数据一直被存储在不起眼的网站上,难以访问,同时格式也不一致。OpenAQ平台(https://openaq.org/#/)原型将数据进行了合并和标准化,成为公众可得、实时的空气质量数据。它已经收集和分享了来自13个国家500多个地点的970万空气质量检测数据。
当美国食物和药品管理局(U.S Food and Drug Administration)批准一种药物时,该机构公开发布一系列关于该药物的信息,通常包含先前未公开的临床试验。尽管这些信息相当有价值,但难以获得、收集和搜索。OpenTrialFDA努力建立一个用户友好的网站界面让任何人能访问相关信息,还提供应用接口(API),允许第三方平台接入和搜索数据。(https://www.openscienceprize.org/p/s/1844843/)
在西方国家的字母体系,分成两大字族:serif(衬线字体)及sans serif(无衬线字体)。衬线字(下图中的宋体、Times New Roman)是指在字的笔画开始及结束的地方有额外的装饰,而且笔画的粗细会因直横的不同而有所不同。 相反的,无衬线字(下图中的思源黑体、Helvetica)就没有这些额外装饰,而且笔画粗细大致上是差不多。
R 简单易用。通过 R ,短短几行代码就可以筛选复杂的数据集,通过成熟的模型函数处理数据,制作精美的图表进行数据可视化。简直就是 Excel 的加强灵活版。
R 最大的价值就是围绕其开发的活跃的生态圈: R 社区在持续不断地向现存丰富的函数集增添新的包和特性。据估计 R 的使用者已经超过 200 万人,最近的一项调查也显示 R目前是数据科学领域最受欢迎的语言,大约 61% 的受访者使用 R(第二名是 Python, 占比39%)。
在华尔街,R 的使用比例也在不断增长。美国银行副总裁Niall O’Connor 说:“以往,分析员通常是熬夜研究 Excel 文件,但是现在 R 正被逐渐地应用于金融建模,尤其是作为可视化工具。R 促使了表格化分析的出局。”
作为一门数据建模语言, R 正在走向成熟,尽管在公司需要大规模产品的时候 R 能力有限,也有些人说它已经被其他语言替代了。
Metamarkets 公司的 CEO Michael Driscoll 说:“ R 擅长的是勾画,而不是搭建,在 Google 的 page rank 算法和 Facebook 的好友推荐算法实现的核心中是不会有 R 的。工程师会用 R 进行原型设计,再用 Java 或者 Python将其实现。”
Paul Butler 在 2010 年用 R 构建了一个著名的 Facebook 世界地图,证明了 R 在数据可视化上的强大能力。然而他并不经常使用 R。
Butler 说:“由于在处理较大数据集时缓慢且笨拙,R 在行业中已经有些沦为明日黄花了 ”
那么使用什么作为它的替代呢?看下去。
Python
如果 R 是个有点神经质的可爱的极客,那么 Python 就是它容易相处的欢快的表弟。融合了 R 快速成熟的数据挖掘能力以及更实际的产品构建能力, Python 正迅速地获得主流的呼声。 Python 更直观,且比 R 更易学,近几年其整体的生态系统发展也成长得很快,使其在统计分析上的能力超越了之前的 R 语言。
Butler 说:“Python 是行业人员正在转换发展的方向。过去两年里,很明显存在由 R 向 Python 转化的趋势”