I read a lot of books, but often forget what I read, even if I understood it. What I’d love to have is a perfect memory, so that whenever a topic comes up in conversation, say ‘tradeoffs between different fault tolerance abstractions in database design’ (Kleppmann - Designing Data-Intensive Applications) or ‘the different ways narrow AI could develop into superintelligence’ (Bostrom - Superintelligence: Paths, Dangers, Strategies), I’d have all the relevant arguments summarised, with supporting examples, ready to discuss. Crucially, though, when I reference those books in a conversation (assuming I can remember them), I do not just regurgitate the whole book verbatim; instead I extract the specific concepts that are relevant to the conversation.

Best-guess solution:

Data input

Notes are added in an unstructured text format, each summarising an ‘atomic’ concept or idea in a couple of paragraphs, with a source reference to a page or web link. Ideally, it should be possible to scrape books for data input, but rather than simply inputting the text, we need to input the concepts into the database.

Data selection

Rather than relying on tags, content is ‘searchable’, because I do not want to constrain selection to pre-planned input formats. However, searching should work not at the level of words but at the level of ideas.
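A minimal sketch of what searching ‘at an idea level’ might mean: map both notes and queries into a shared concept space and rank by similarity there, so a note can match a query that shares none of its words. The `CONCEPTS` dictionary and the `embed`/`search` names are hypothetical; the hand-made word-to-concept map stands in for what would realistically be a learned text-embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical hand-made map from surface words to concept IDs.
# A real system would replace embed() with a learned text-embedding model.
CONCEPTS = {
    "replication": "fault_tolerance",
    "failover": "fault_tolerance",
    "redundancy": "fault_tolerance",
    "consensus": "agreement",
    "quorum": "agreement",
    "superintelligence": "ai_takeoff",
    "takeoff": "ai_takeoff",
}


def embed(text: str) -> Counter:
    """Map text to a bag of concept IDs, ignoring words with no known concept."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, notes: dict[str, str], top_k: int = 3) -> list[str]:
    """Rank note titles by concept-level similarity of their bodies to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(body)), title) for title, body in notes.items()]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [title for _, title in hits[:top_k]]


notes = {
    "Kleppmann: leader-based replication": "Followers keep copies so failover is possible.",
    "Bostrom: fast takeoff": "A fast takeoff to superintelligence could take days.",
}
print(search("redundancy", notes))  # ['Kleppmann: leader-based replication']
```

Here a query for ‘redundancy’ finds the Kleppmann note even though the word never appears in it. Everything hinges on `embed`; the ranking code would stay the same if it were swapped for a proper embedding.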

Data aggregation

Furthermore, it shouldn’t just involve searching for individual entries, but should also allow ‘concept aggregation’, such as creating lists or comparing and contrasting different sources.
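Concept aggregation could start as simply as merging notes on the same topic into one list with their sources attached, plus a compare view showing what each source says that the others don’t. A minimal sketch, assuming each note already carries a `topic`, a one-line `claim` and a `source` (the `Note` type, function names and example sources are hypothetical). Claims are matched by exact string here, which is precisely the limitation a concept-level matcher would need to remove.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    topic: str   # what the note is about, e.g. "causes of WW1"
    claim: str   # the single concept the note states
    source: str  # book/page or web link the note came from


def aggregate_list(notes: list[Note], topic: str) -> dict[str, list[str]]:
    """Merge all notes on one topic into a single list: claim -> sources that make it."""
    merged: dict[str, list[str]] = defaultdict(list)
    for note in notes:
        if note.topic == topic:
            merged[note.claim].append(note.source)
    return dict(merged)


def compare_sources(notes: list[Note], topic: str) -> dict[str, set[str]]:
    """For each source, the claims on the topic that no other source makes."""
    by_source: dict[str, set[str]] = defaultdict(set)
    for note in notes:
        if note.topic == topic:
            by_source[note.source].add(note.claim)
    return {
        src: claims - set().union(*(c for s, c in by_source.items() if s != src))
        for src, claims in by_source.items()
    }


notes = [
    Note("causes of WW1", "alliance system", "Source A"),
    Note("causes of WW1", "nationalism", "Source A"),
    Note("causes of WW1", "alliance system", "Source B"),
    Note("causes of WW1", "naval arms race", "Source B"),
]
print(aggregate_list(notes, "causes of WW1"))
print(compare_sources(notes, "causes of WW1"))
```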

Problems with other solutions:


  1. Searches text rather than concepts. Ultimately, this is the wrong level of abstraction: there are (1) data/text/facts, (2) concepts/ideas and (3) arguments/essays, and we want to search at the second level.

  2. We also want to be able to aggregate up to level 3 abstractions, e.g. you might google the causes of the First World War and get 10 links, but you don’t get an aggregated list of causes drawn from multiple sources.


  1. Wikipedia stores information including level 2 and 3 abstractions in a standardised, high quality format. However, level 2 abstractions tend to be buried in long pages of text, and level 3 abstractions are rarely the ones you specifically want, although lists are pretty good.

  2. Wikipedia also lacks information from books (perhaps for IP reasons).


  1. Notes tend to be long and difficult to search, usually kept at a much higher level of aggregation, e.g. article length.

  2. Require lots of manual input and upkeep.


  1. Store information at too low a level of abstraction, i.e. facts/ideas rather than concepts or arguments.



  1. The problem with books is that they are too long, and it is not easy to move down the conceptual hierarchy from long arguments to component concepts.

  2. Also, they cannot be dynamically searched and mixed and matched.

Technical hurdles:

  • How do you search if there is no pre-defined graph/relational structure and you are not relying on literal text search?

  • How do you extract concepts from a book? Do you need to run unsupervised machine learning guided by feature/structural hints? And if you run two independent unsupervised learning algorithms on two different books, how do you compare their feature sets?

  • How do you create abstraction aggregations? Use deep learning to find them, or manually add operations such as ‘create a list’ or ‘create a set of advantages and disadvantages’?


  • I’ve just started using Simplenote and I’m writing little concept-level notes for myself. I’m not using any tagging but am going to rely solely on the search function. The goal is to see what it is like, how well it works and where it could be improved.

[Screenshot, 31/12/2018]

MVP Findings:

  • 03/01/2019 - Reading Ray Dalio’s Principles, I want to not only store the concept but also apply it. How can the database be enhanced so that I can write notes/apply ideas (perhaps from multiple sources) and then store and re-access those updates later?

  • 03/01/2019 - Difficult questions about what abstractions/notes to store. My intuition is to think about what is actually useful, but more precise criteria are not clear. E.g. Dalio says you should systematize knowledge: is that a concept? Is it worth including? I find that lists or more extended recipes are more useful, which is interesting because that itself suggests an initial level of aggregation. Is it possible to generalise the types of concept-level abstraction (which include lists, theory, etc.)?

  • 04/01/2019 - When creating a comparison between derived data systems and distributed transactions with atomic commit (from Kleppmann), I find there are a lot of definitions I’d like to reference. The current solution is to include definitions in a list at the bottom of the page, but clearly a better structure is for them to be separate concepts/definitions which can then be referenced. This is essentially a hyperlink, but perhaps an implicit one is better than a manually hard-wired explicit one (i.e. if one concept includes the term and definition of total order broadcast, and another concept uses the term total order broadcast, perhaps these could be matched?). Clearly I cannot keep switching back and forth between different notes, so it would be useful to be able to open two windows, one for the concept and a second for definitions, and have a search for a concept or set of concepts also bring up a list of related definitions.

  • 04/01/2019 - The question is how disaggregated should a concept be? Ideally you want to be able to aggregate everything up from concepts, so let us assume that for any concept in the database you never want to change or take a subset of its text. Then there needs to be a separate concept text for the implementation, advantages, disadvantages, definition, comparison etc. of every concept. This quickly becomes unwieldy from a data-input standpoint, and harder to output unless there are good aggregation functions.

  • 05/01/2019 - If you could reliably classify text into types of structure, e.g. definition, theory, example, advantages, disadvantages, comparison etc., and then build a graph of how those pieces of text relate to each other, then I think it would be relatively easy to search/explore that space. Maybe a good starting point would be to have users select text from electronic books to ‘save’, requiring a structural tag, which would then give something to train on in the long run to extract those concepts automatically. Or perhaps even average over lots of people’s hand-written notes etc.

  • 11/01/2019 - What would an individual tool for working toward the truth look like? Or an individual tool for structured, systematic decision making? Some concepts in think|base could act like templates: a series of list queries structured by questions, e.g. what would convince me that I’m wrong?, exploring the unknown/unclear/not-definitely-true decision space systematically. Could this be scaled to company-wide decision making?

  • 11/01/2019 - Provocative question: what would the tool look like if you knew you had no memory of the last 24 hours? I.e. you can think long and hard about problems each day but cannot, without prompts, remember anything you read etc.

  • 13/01/2019 - I want to write notes, e.g. asking myself Dalio’s question of what my principles are, but how should I store them? If stored as concepts, what about privacy and separation from text notes? I would also like to be able to track them over time, e.g. principles from 2019, principles from 2020 etc., so there needs to be an aggregate function associated with a timeline. It would also be nice to have condensed summary notes for personal musings, but also free-ranging brain-dump notes as well.

  • 13/01/2019 - Going through Dalio’s simple framework of 1. what do you want? 2. what is true? 3. what are you going to do about it?, it immediately becomes obvious when I’m writing my personal reflections into the database that I want to be able to: a) have a dual screen so I can see a concept and write my reflections on it, and b) link my notes to Dalio’s principles, so that I can later bring them up on a dual screen. Currently I do this manually with a reference to Dalio’s framework. How would I write notes on multiple frameworks? Probably as separate things, but it might be nice to integrate the notes so I can compare across frameworks.

  • 17/01/2019 - If I add notes at a concept level, how do I pre-bake aggregations, e.g. Dalio’s 5-step process for getting what you want out of life? Some aggregations will be impossible for a computer to work out itself, so I need to be able to add them.

  • 22/01/2019 - I want to be able to output a list of all the different snippets of information on HDFS, HBase, YARN, Impala, Hive etc. that I read in various books. Unfortunately, how would that work? For example, if one snippet talks about HDFS and compares it in passing to RAID, do I want a list of RAID-related notes to include it, as well as the HDFS-related list? Basically, I need a way to tier associations.

  • 22/01/2019 - As an MVP, I think the key feature is being able to search text and stack notes on top of each other to make a list, which can then be manually edited to refine it.

  • 30/01/2019 - It is boring and time-consuming to copy out metadata (think source and summary) and write lots of small blocks of insight, e.g. rather than writing 4 separate notes on the advantages of HDFS blocks, it is much easier to write all the HDFS blocks material at once. However, perhaps it could be easy to tag metadata/source paragraphs and, when aggregating, return not the whole note but paragraphs within it. At the very least, each paragraph could automatically be stored as a separate note to be searched and aggregated.

  • 01/04/2019 - Writing literature review notes in Overleaf, but there are a number of problems: 1. it is slow; 2. I cannot search text within a document or across documents; 3. it is annoying to upload images; 4. the notes inevitably have less detail than I’d like. The ideal solution would be a shift-print type screenshotter which automatically converted screenshotted text (say a paragraph from an article) and put it in a database as searchable text, with a given source, author, title, paper name etc., so that I could search for all possible references to, say, non-maximal suppression.

  • 22/05/2019 - I want to prepare competencies, and because I want them all together, writing in LaTeX feels easier; but not being able to see them all together is annoying, as is printing them out in a nice format.

  • 19/06/2019 - Watched a debate on ethics between Jacob Rees-Mogg and Rory Stewart, and Rory Stewart’s answer was very structured. Fortunately he summarised it at the end, giving me the structure, so I was able to write notes to add to my notetaker. Thinking more broadly, Google works very well for remembering facts, and the brain basically works by holding only the keys in memory: e.g. with the Battle of Hastings, I don’t remember the details, but I remember the name of the battle, which I can use to look it up. The problem with arguments and more complicated structures is that you need to hold the whole argument in memory as well, and that is difficult. Really I want to have a key to the argument which is then itself a key to details and facts. E.g. of Bostrom’s work on superintelligence, I cannot remember anything except that AI is going to develop very quickly and will kill us all (lol). Holding all those arguments in memory is too difficult, but remembering that Bostrom made an argument is good enough to look it up, except that there are few reliable online resources (even Wikipedia) that really summarise the arguments. Therefore, my preference would be to read his book again and write notes (although time-consuming) into a database where the notes are structured. I could structure them up front, e.g. 3 reasons why AI will develop quickly, but this makes reading from the database limited. Ideally, I’d like to store the elements and have the database piece them together itself. That is the other challenge with thinking: aggregating different ideas and comparing and contrasting them, because you have to hold a lot of different information and arguments in memory to make those comparisons. A tool which brought all the arguments and comparisons together would make that much easier.
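The implicit linking idea in the 04/01/2019 entry could be tried with nothing more than whole-phrase term matching: any definition whose term appears in a note is treated as linked and could be shown in the second window. A minimal sketch; the function name and the example definitions are illustrative paraphrases rather than the actual notes, and plain string matching would miss plurals, abbreviations and variants such as ‘atomic commitment’.

```python
import re


def linked_definitions(note: str, definitions: dict[str, str]) -> dict[str, str]:
    """Return the definitions whose term appears in the note: an implicit hyperlink."""
    found = {}
    for term, definition in definitions.items():
        # Whole-phrase, case-insensitive match, so "atomic commit" does not match "atomic committee".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, note, flags=re.IGNORECASE):
            found[term] = definition
    return found


definitions = {
    "total order broadcast": "All nodes deliver the same messages in the same order.",
    "atomic commit": "A transaction either commits on all nodes or aborts on all nodes.",
    "linearizability": "The system behaves as if there were a single copy of the data.",
}
note = "Derived data systems can rely on total order broadcast instead of atomic commit."
for term, definition in linked_definitions(note, definitions).items():
    print(f"{term}: {definition}")
```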
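The 05/01/2019 idea of training on user-tagged highlights can be sketched with a small multinomial naive Bayes classifier over words. This is one swapped-in technique, not a claim about how think|base works; the class name, the tag set and the six training examples are hypothetical, and a realistic tag set would need far more data.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class StructureClassifier:
    """Multinomial naive Bayes over words, trained on user-tagged highlights."""

    def __init__(self) -> None:
        self.label_counts: Counter = Counter()
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def train(self, examples: list[tuple[str, str]]) -> None:
        for text, label in examples:
            words = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, text: str) -> Optional[str]:
        """Return the most likely structural tag, or None if untrained."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(count / total)  # log prior
            for word in tokenize(text):
                # Laplace (add-one) smoothing so unseen words don't zero the score
                score += math.log((self.word_counts[label][word] + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


examples = [
    ("HDFS is defined as a distributed file system", "definition"),
    ("A quorum refers to the minimum number of votes", "definition"),
    ("One advantage is that it scales cheaply", "advantage"),
    ("The benefit is higher throughput", "advantage"),
    ("A drawback is the single point of failure", "disadvantage"),
    ("The downside is higher latency", "disadvantage"),
]
clf = StructureClassifier()
clf.train(examples)
print(clf.predict("A major drawback is the latency"))
```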
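The timeline aggregation wanted in the 13/01/2019 entry (principles from 2019, principles from 2020, etc.) amounts to filtering personal notes by kind and grouping them by year. A minimal sketch; the `(date, kind, text)` entry shape, the `timeline` name and the example entries are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def timeline(entries: list[tuple[date, str, str]], kind: str) -> dict[int, list[str]]:
    """Group personal notes of one kind (e.g. "principles") by year, oldest first."""
    by_year: dict[int, list[str]] = defaultdict(list)
    for written, entry_kind, text in sorted(entries):
        if entry_kind == kind:
            by_year[written.year].append(text)
    return dict(by_year)


entries = [
    (date(2020, 2, 1), "principles", "Principle written in 2020"),
    (date(2019, 1, 13), "principles", "First principle written in 2019"),
    (date(2019, 5, 2), "musing", "Brain dump, not a principle"),
    (date(2019, 3, 1), "principles", "Second principle written in 2019"),
]
for year, items in timeline(entries, "principles").items():
    print(year, items)
```

Keeping the kind as a field means condensed summaries and free-ranging brain dumps can live in the same store and still be pulled apart.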
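One crude way to ‘tier associations’ (22/01/2019 entry) is to treat a topic as a note’s primary subject when it accounts for a large share of the topic mentions in that note, and as secondary otherwise. A minimal sketch; `TOPICS`, the 0.5 threshold and the function names are assumptions, and counting mentions is a stand-in for whatever relevance signal a real system would use.

```python
import re
from collections import Counter

# Hypothetical topic vocabulary; a real system would need to discover these.
TOPICS = ["HDFS", "HBase", "YARN", "Impala", "Hive", "RAID"]


def topic_tiers(note: str, topics: list[str], primary_share: float = 0.5) -> dict[str, str]:
    """Label each mentioned topic 'primary' or 'secondary' by its share of topic mentions."""
    counts: Counter = Counter()
    for topic in topics:
        hits = re.findall(r"\b" + re.escape(topic) + r"\b", note, flags=re.IGNORECASE)
        if hits:
            counts[topic] = len(hits)
    total = sum(counts.values())
    return {t: "primary" if n / total >= primary_share else "secondary" for t, n in counts.items()}


def notes_about(topic: str, notes: list[str], include_secondary: bool = False) -> list[str]:
    """List notes on a topic, optionally including notes that only mention it in passing."""
    wanted = {"primary", "secondary"} if include_secondary else {"primary"}
    return [note for note in notes if topic_tiers(note, TOPICS).get(topic) in wanted]


notes = [
    "HDFS splits files into blocks. HDFS replicates each block, unlike RAID.",
    "RAID 5 stripes parity across several disks.",
]
print(topic_tiers(notes[0], TOPICS))  # {'HDFS': 'primary', 'RAID': 'secondary'}
print(notes_about("RAID", notes))  # only the second note
```

A list query for RAID then returns only notes that are mainly about RAID, unless passing mentions are explicitly requested.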
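The fallback in the 30/01/2019 entry, storing each paragraph of a long note as its own note, is simple to sketch: split on blank lines and let each paragraph inherit the parent’s title and source, so the metadata only has to be typed once. The `ParagraphNote` type, `split_note` and the example source are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ParagraphNote:
    parent: str  # title of the long note this paragraph came from
    source: str  # metadata inherited from the parent, e.g. book and page
    index: int   # position of the paragraph within the parent note
    text: str


def split_note(title: str, source: str, body: str) -> list[ParagraphNote]:
    """Store each non-empty paragraph of a long note as its own searchable note."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    return [ParagraphNote(title, source, i, p) for i, p in enumerate(paragraphs)]


body = """HDFS stores files as large blocks.

Large blocks keep the overhead of disk seeks small.


Each block is replicated across several machines."""
for part in split_note("HDFS blocks", "Book X, p. 45 (hypothetical)", body):
    print(part.index, part.source, part.text)
```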

project/article ideas

misc learning

  • Rocket textbook

  • Old maths topics

  • Python data analysis

  • R time series

  • Linux

  • GitHub

  • Maths history

  • Fundamentals of physics

  • agile

  • leet


  • Classification of phenomena by how easy they are to predict.

  • Properties of industry-related derivatives.

  • Automation - is it happening?

  • Economic analysis on admin data e.g. VAT data to estimate lifespan of firms.

  • Test whether machine learning can ‘discover’ hidden structure that has been deliberately planted.

  • define usefulness & closeness to decision makers.

  • A systematic way to uncover secret knowledge about how things actually work.

  • Wage data derivatives using Sweden/Norway wage data.

  • Lateral inhibition vs social norms: implications for raising children.

  • create notes on technologies across different books


  • booksonnews - multi-disciplinary analysis of news.

  • sig|nal - code to mark mathematics exams.

  • ONS outlier/trend detection - python code.

  • Dissertation - machine learning applied to macroeconomic/micro data.

  • Quantecon - translate Python code into R.

  • OBR - translate winsolve OBR code into Python.

  • Linear algebra udemy course.

  • Book review web scraper, classified by reviewer.

  • Guided journalling.

  • G-Fold algorithm

  • Classifying prediction problems

  • R package for ONS statistics, with built-in methodology and analysis.

  • not big data but small data, outlier/anomaly/weirdness detection

  • Scan YouTube videos of questions put to Elon and categorise the questions.

  • Simulate different models competing against each other in fake asset markets and see which wins/gets the most returns.

  • case studies of where data science has worked

  • Reinforcement learning in a game with changing rules, i.e. learning how to learn.

  • Rather than hyperlinks like Google or articles like Wikipedia, build something like mind maps for topics, which can then be trained on to generate abstractions. It would be interesting to simulate particles interacting with each other and see whether they generate the same words for things as we would: if particles moved around, ate food, talked and coordinated, would they develop abstractions for communication in the same way?

  • An unsupervised-learning-based database that stores elements and then looks for relationships between them.

  • Software to help with staff and project management, thinking about incentives, stakeholders etc.

  • anti-book recommender

  • Rather than trying to improve the learning algorithms, what about trying to improve the quality of the data? The obvious example: if trying to differentiate between a cat and a dog, spend a lot of time looking at examples that are close to the boundary, i.e. where there is a higher probability of misclassification. But you can take it further and ask how to learn higher levels of abstraction: can you learn on the abstract data? Machine learning models always have the raw data and have to learn from scratch, but humans learn history etc. by reading about the abstractions directly. I guess this is transfer learning, but isn’t that a function of the model rather than the data? Maybe learning is transferring the model, not the data that the model was run on?


In my whole life, I have known no wise people who didn't read all the time - none, zero.

— Charlie Munger


  • Human systems are very complex and difficult to understand requiring a multi-disciplinary approach, but mastering all of the subjects takes too much time.

Flawed current solutions

  • Academic expertise: Despite the inter-relatedness of human systems, most academic knowledge is siloed in disciplines, e.g. politics, economics, finance, technology, statistics, international relations, science, business, religion, psychology etc., that don't speak to each other. Also, who has time to read 100 different journals to mine for insight?
  • Journalism: Does a good job of being relevant, but focuses too much on breaking news rather than analysis. A lot of articles, even from top publications like The Economist, are very shallow, rarely drawing on the vast academic literature that exists. Good writing distracts from limited content.
  • Blogosphere: Lots of diverse opinions, but finding good content is very difficult: density of (diverse) quality is too low. Who cares what a random blogger thinks? Plus, it is hard to analyse an argument independently from who is doing the arguing - particularly if that person is not actively pointing out potential flaws and limitations in their thinking.
  • Conversation: People are often opinionated but it is hard to really become wiser through conversation as it is very dependent upon who you talk to and is subject to lots of biases like whether you like or relate to who is speaking. Also, it is very easy to talk to be heard, rather than to listen and learn.

booksonnews as a solution

  • Each month booksonnews would analyse a big news issue from many different perspectives/disciplines. The name comes from the idea that lots of different books would be used to analyse a news item.
  • Analysis would be brief, perhaps just 10-20 lines per book, but would look to apply the core ideas to the news item specifically. Across perhaps 20-50 books, that would still be a lot of content. At first each paragraph of analysis would be independent from the others, but perhaps over time we will figure out a way to integrate the ideas as well.
  • For those who are interested, there would be a follow on summary of the theory, book or data that was used to analyse the issue with links to more information and perhaps places to buy books etc.
  • booksonnews may offer an interesting way to invert the learning process. Traditionally, students learn theories and tools without any sense of why they are useful, and only after learning them can they look to apply them. booksonnews would invert the process, with the news becoming a filter for what is worth learning about.

Inversion: what types of flawed thinking is booksonnews going to try to avoid? 

  • Avoid 'man with a hammer syndrome', where a limited number of causal variables/models are weighted too heavily. Instead, systematically generate a large variety of explanations.
  • Avoid emotional attachment to theories/view points by following Charlie Munger's prescription 'I’m not entitled to have an opinion on this subject unless I can state the arguments against my position better than the people do who are supporting it. I think only when I reach that stage am I qualified to speak.'
  • Avoid silos of insight by forcing integration and comparison of different views and models. Any contradictions? Any compounding effects?
  • Avoid failing to update views and opinions with the changing facts and situation by, up front, citing the conditions which would convince you to change your mind - and then actively seek out such evidence.
  • Avoid the mistake of thinking you understand something when you don't. Try to estimate how well you understand something and where the limits of that understanding are.

Hypotheses/assumptions behind booksonnews

  • A multi-disciplinary approach is more fruitful than a single discipline approach.
  • There is demand for deeper, if admittedly less timely, analysis of news, perhaps even to the point where people would be willing to pay.
  • It is possible to analyse an issue fruitfully from the lens of a specific discipline to an audience without domain expertise in that discipline.
  • Readers will be willing to try a new and unproven news website and promote the website through word of mouth.
  • AI or some cyborg augmentation (e.g. neural lace) does not change human processing power so completely that traditional learning via reading etc. becomes obsolete.