Jekyll2023-09-22T11:09:01+00:00https://kareem.shehata.ca/feed.xmlKareem ShehataKareem is passionate about the discovery, synthesis, and dissemination of information. He does research on Applied Cryptography and Theoretical Computer Science, and is currently pursuing a PhD at National University of Singapore, where he sometimes teaches.How I Made This Site2023-05-10T07:17:21+00:002023-05-10T07:17:21+00:00https://kareem.shehata.ca/blog/2023/05/10/how_i_made_this_site<p>This site uses:</p>
<ul>
<li><a href="https://jekyllrb.com/">Jekyll</a></li>
<li><a href="https://pages.github.com/">GitHub Pages</a></li>
<li><a href="https://www.mathjax.org/">MathJax</a> for math formatting</li>
<li>Custom sed script for taking Notion-exported pages and fixing the math formatting</li>
<li>Jekyll Collection for talks and pubs</li>
</ul>
<h1 id="steps">Steps</h1>
<ol>
<li>
<p>Follow the <a href="https://jekyllrb.com/docs/">Jekyll docs</a> to install
Jekyll and all its dependencies. Much easier than the GitHub docs.</p>
</li>
<li>
<p>Once you have a site working locally, follow the GitHub docs for setting up Pages.</p>
</li>
<li>
<p>Following <a href="https://stackoverflow.com/questions/34347818/using-mathjax-on-a-github-page">this SO answer</a>,
enable MathJax by adding its JS to your page’s <code class="language-plaintext highlighter-rouge">&lt;head&gt;</code>. I wrapped it in a Liquid if-statement
so that it’s only loaded when needed, but that’s optional.</p>
</li>
<li>
<p>Follow <a href="https://www.danielsieger.com/blog/2019/03/03/publication-pages-using-jekyll-collections.html">Daniel Sieger’s guide</a> to make a publications list from a collection.</p>
</li>
</ol>
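<p>A minimal sketch of the conditional include from step 3 might look like the following. The front matter flag <code class="language-plaintext highlighter-rouge">mathjax</code> and the file name <code class="language-plaintext highlighter-rouge">_includes/head.html</code> are assumptions made for illustration, not necessarily what this site uses; the script URL is the standard MathJax 3 CDN build.</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;!-- Hypothetical location: the &lt;head&gt; of your layout, e.g. _includes/head.html --&gt;
{% if page.mathjax %}
  &lt;!-- Only pages that set "mathjax: true" in their front matter load MathJax --&gt;
  &lt;script id="MathJax-script" async
          src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"&gt;&lt;/script&gt;
{% endif %}
</code></pre></div></div>
<p>With something like this in place, a post would opt in by adding <code class="language-plaintext highlighter-rouge">mathjax: true</code> to its front matter, and every other page skips the download.</p>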
<h1 id="notion-export">Notion Export</h1>
<p>For some reason, Notion exports to markdown using a single dollar
sign ($) for inline formulas. Unfortunately, this breaks the math
rendering, since most markdown processors treat a single dollar sign
as just a dollar sign and expect a double-dollar sign for math.</p>
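<p>Fortunately, <code class="language-plaintext highlighter-rouge">sed</code> exists to solve this problem. Here’s a simple script that fixes it. The first substitution appears twice on purpose: each match consumes the character after the dollar sign, so a second pass is needed to catch the closing dollar sign of single-character formulas.</p>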
<div class="language-sed highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">s</span><span class="p">/</span><span class="se">\(</span><span class="p">[^</span><span class="sr">\$</span><span class="p">]</span><span class="se">\)\$\(</span><span class="p">[^</span><span class="sr">\$</span><span class="p">]</span><span class="se">\)</span><span class="p">/</span><span class="se">\1</span>$$<span class="se">\2</span><span class="p">/</span><span class="k">g</span>
<span class="k">s</span><span class="p">/</span><span class="se">\(</span><span class="p">[^</span><span class="sr">\$</span><span class="p">]</span><span class="se">\)\$\(</span><span class="p">[^</span><span class="sr">\$</span><span class="p">]</span><span class="se">\)</span><span class="p">/</span><span class="se">\1</span>$$<span class="se">\2</span><span class="p">/</span><span class="k">g</span>
<span class="k">s</span><span class="p">/</span><span class="se">\(</span><span class="p">[^</span><span class="sr">\$</span><span class="p">]</span><span class="se">\)\$\\</span><span class="p">/</span><span class="se">\1</span>$$<span class="se">\\</span><span class="p">/</span><span class="k">g</span>
<span class="k">s</span><span class="p">/</span><span class="se">\(</span><span class="p">[^</span><span class="sr">\$</span><span class="p">]</span><span class="se">\)\$</span><span class="o">$</span><span class="p">/</span><span class="se">\1</span>$$<span class="p">/</span>
<span class="k">s</span><span class="p">/</span><span class="o">^</span><span class="se">\$\(</span><span class="p">[^</span><span class="sr">\$</span><span class="p">]</span><span class="se">\)</span><span class="p">/</span>$$<span class="se">\1</span><span class="p">/</span>
</code></pre></div></div>
Majority Voting2022-10-17T08:53:21+00:002022-10-17T08:53:21+00:00https://kareem.shehata.ca/blog/2022/10/17/majority-voting<h1 id="majority-voting">Majority Voting</h1>
<h1 id="problem">Problem</h1>
<p>Here’s a problem that comes up very often in computer science. Let’s say you have some process that gives you the right answer <strong>most</strong> of the time, but not always. For example: you need to check if a particular transaction is “final” in a blockchain. To keep things simple we’ll consider only “true or false” questions. Let’s say the probability of getting the <strong><em>wrong</em></strong> answer is \(f < 1/2\).</p>
<p>Obviously, you want the probability of getting things right to be a lot better than a half, so you run the process \(n\) times independently, and then if the majority of the answers are true then you say the result is true, and false otherwise. How many times do you need to run the process (i.e. what must \(n\) be) in order to get the correct answer with probability at least \(1 - \delta\)?</p>
<p>This is called the “Majority Voting” problem, and there are several ways to analyse it. Usually a few simplifying assumptions are made and people just fast-forward to the bottom line that \(n \ge O \left( \log \frac{1}{\delta} \right)\). While this is true, it hides the effect of all of the constants. How close can \(f\) be to \(1/2\)?</p>
<p>I’m going to present an analysis here that uses as few approximations as possible so that we can study the effect of the constants at the end. It’s fairly detailed, but serves as a good example of how to do this kind of analysis.</p>
<h1 id="solution">Solution</h1>
<p>Since we’re looking for a majority, we’ll assume that we do an odd number of trials, i.e. \(n = 2t + 1\). We can model this problem as a series of coin tosses, each with a probability \(f\) of coming up tails (a wrong answer), and ask: what is the probability that you get heads \(t+1\) times or more? Equivalently, we fail if we get \(t + 1\) or more tails. The number of tails follows a Binomial Distribution. Consider the probability that we get <strong><em>exactly</em></strong> \(t + 1\) tails:</p>
\[\Pr\left[t + 1 \text{ tails}\right] = {2t + 1 \choose t + 1} f^{t+1} (1-f)^{t} = p\]
<p>Let \(f = \frac{1-\varepsilon}{2}\) and rearrange a bit:</p>
\[\begin{split}
\Pr\left[t + 1 \text{ tails}\right] &= \frac{(2t + 1)!}{(t + 1)!t!} \cdot f \cdot \left(\frac{1-\varepsilon}{2}\right)^{t} \left(\frac{1+\varepsilon}{2}\right)^{t} \\
&= \frac{2t+1}{t+1}f \cdot \frac{(2t)!}{(t!)^2} \cdot \left(\frac{1 - \varepsilon^2}{2^2}\right)^t
\end{split}\]
<p>Let \(k = 1 - \varepsilon^2\)</p>
\[\Pr\left[t + 1 \text{ tails}\right] = \frac{2t+1}{t+1}f \cdot \frac{(2t)!}{(t!)^2} \cdot \left(\frac{k}{2^2}\right)^t\]
<p>Stirling’s Formula (<a href="https://youtu.be/TbazAJbw6RE">see this lecture</a>):</p>
\[n! = \sqrt{2 \pi n} \cdot \frac{n^n}{e^n} \cdot \left(1 \pm O(1/n) \right)\]
<p>Applying this to the above:</p>
\[\frac{(2t)!}{(t!)^2} = \frac{\sqrt{2 \pi \cdot 2t} \cdot \left( \frac{2t}{e} \right)^{2t} \cdot \left(1 \pm O(1/t) \right)}{\left( \sqrt{2 \pi t} \cdot \left( \frac{t}{e} \right)^{t} \cdot \left(1 \pm O(1/t) \right) \right)^2}
\sim \frac{2^{2t}}{\sqrt{\pi t}}\]
<p>Notice that a lot of things cancel out. We’ll assume the error terms cancel out as well, which is true asymptotically. Substituting back, we get:</p>
\[\Pr\left[t + 1 \text{ tails}\right] \sim \frac{2t+1}{t+1}f \cdot \frac{2^{2t}}{\sqrt{\pi t}} \cdot \left(\frac{k}{2^2}\right)^t
= \frac{2t+1}{t+1}f \cdot \frac{k^t}{\sqrt{\pi t}}\]
<p>Now consider what happens if we get more failures:</p>
\[\begin{split}
\Pr[t + 1 + i \text{ tails}] &= {2t + 1 \choose t + 1 + i} f^{t+i+1} (1-f)^{t - i} \\
&= \frac{(2t + 1)!}{(t + 1 + i)!(t - i)!} f^{t+1} (1-f)^t f^i (1-f)^{-i} \\
&= \frac{(2t + 1)!}{ (t + 1)! t!} f^{t+1} (1-f)^t \frac{(t - i + 1) \cdots t}{(t + 1 + i) \cdots (t + 2)} \left(\frac{f}{1-f}\right)^{i}
\end{split}\]
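<p>Notice that the left side is exactly \(p\) defined above. As for the right side, the ratio of products has \(i\) factors of the form \(\frac{t - j}{t + 2 + j}\) for \(j = 0, \ldots, i - 1\), and each of them satisfies:</p>
\[\frac{t - j}{t + 2 + j} \le \frac{t}{t + 2}\]
<p>We can substitute this into the middle terms, which bounds the whole ratio by \(\left(\frac{t}{t+2}\right)^i\).</p>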
<p>Let \(\alpha = \frac{t}{t + 2} \frac{f} {1 - f}\)</p>
\[\Pr[t + 1 + i \text{ tails}] \le \frac{(2t + 1)!}{ (t + 1)! t!} f^{t+1} (1-f)^t \left(\frac{t}{t + 2}\frac{f}{1-f}\right)^{i} = p\alpha^i\]
<p>We can now bound the overall probability of failure:</p>
\[\begin{split}
\Pr\left[\text{failure}\right] &= \Pr[t + 1 \text{ tails}] + \Pr[t + 2 \text{ tails}] + \ldots + \Pr[2t + 1 \text{ tails}] \\
&\le p + \alpha p + \ldots + \alpha^t p
\end{split}\]
<p>Notice that this is simply a geometric series with \(t + 1\) terms.</p>
\[\Pr\left[\text{failure}\right] \le p \frac{1 - \alpha^{t+1}}{1 - \alpha} \le \frac{2t+1}{t+1}f \cdot \frac{k^t}{\sqrt{\pi t}} \cdot \frac{1 - \alpha^{t+1}}{1 - \alpha}\]
<p>To simplify things a bit, since \(0 \le \alpha < 1\):</p>
\[\frac{1-\alpha^{t+1}}{1-\alpha} \le \frac{1}{1-\alpha}\]
<p>We can also get rid of the first term as for \(f < 1/2, t > 0\):</p>
\[\frac{2t+1}{t+1}f \le \frac{t + 1/2}{t + 1} < 1\]
<p>The problem reduces to:</p>
\[\Pr\left[\text{failure}\right] \le \frac{k^t}{\sqrt{\pi t}} \cdot \frac{1}{1 - \alpha}\]
<p>Substitute for \(\alpha = \frac{t}{t + 2} \frac{f} {1 - f} = \frac{t}{t + 2} \frac{1 - \varepsilon}{1 + \varepsilon}\)</p>
\[\Pr\left[\text{failure}\right] \le \frac{k^t}{\sqrt{\pi t}} \cdot \frac{(t + 2)(1 + \varepsilon)}{2(t \varepsilon + \varepsilon + 1)}\]
<p>If we fix some \(0 < \varepsilon < 1\) then the right-hand factor converges to a constant value as \(t\) grows (and pretty quickly if \(\varepsilon\) isn’t very close to zero). We’ll analyse that in more detail later; for now let’s replace it with a constant \(c'\) and treat \(k\) as a constant too. Then notice that \(\sqrt{\pi t} \ge 1\) for \(t \ge 1\), so the denominator can be dropped and everything absorbed into a single constant \(c\). We can simplify and apply our desired bound.</p>
\[\Pr\left[\text{failure}\right] \le \frac{c' k^t}{\sqrt{\pi t}} \le c k^t \le \delta\]
<p>Taking log of both sides:</p>
\[t \log k + \log c \le \log \delta\]
\[t \ge \frac{\log \frac{1}{\delta} + \log c}{\log \frac{1}{k}}\]
<p>Which we can state more simply as:</p>
\[t \ge O\left( \log \frac{1}{\delta} \right)\]
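<p>To get a feel for the numbers, here’s a rough worked example. The values are illustrative and not part of the analysis above: take \(f = 0.4\), so \(\varepsilon = 0.2\) and \(k = 0.96\), and ask for \(\delta = 0.01\). For the constant, assume \(c = \frac{1 + \varepsilon}{2 \varepsilon} = 3\), which is the large-\(t\) limit of the factor \(\frac{(t + 2)(1 + \varepsilon)}{2(t \varepsilon + \varepsilon + 1)}\), and use natural logs:</p>
\[t \ge \frac{\ln 100 + \ln 3}{\ln \frac{1}{0.96}} \approx \frac{4.605 + 1.099}{0.04082} \approx 139.7\]
<p>So under this bound \(t = 140\), i.e. \(n = 2t + 1 = 281\) runs, is enough. The estimate is conservative, since it throws away the \(1/\sqrt{\pi t}\) factor entirely.</p>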
<h1 id="effect-of-varepsilon">Effect of \(\varepsilon\)</h1>
<p>The analysis above works really well if we assume that \(\varepsilon\) is some fixed value greater than zero. But you may notice that if it’s <em>very</em> close to zero, things stop working. Consider the constant we threw out:</p>
\[c = \frac{(t + 2)(1 + \varepsilon)}{2(t \varepsilon + \varepsilon + 1)}\]
<p>Notice what happens as \(t\) goes to infinity:</p>
\[\lim_{t \rightarrow \infty} c = \lim_{t \rightarrow \infty} \frac{(1 + 2/t)(1 + \varepsilon)}{2(\varepsilon + \varepsilon/t + 1/t)} = \frac{1 + \varepsilon}{2 \varepsilon}\]
<p>In other words, as \(\varepsilon\) gets arbitrarily small, our constant grows like \(\frac{1}{2\varepsilon}\) and goes to infinity!</p>
<p>What about the \(k\) value in the denominator? Let’s expand that:</p>
\[\log \frac{1}{1 - \varepsilon^2}\]
<p>For small \(\varepsilon\) this is approximately \(\varepsilon^2\), so it shrinks quadratically as \(\varepsilon\) gets small, which in the denominator makes things <em>grow</em> like \(1/\varepsilon^2\)! Here’s what it looks like:</p>
<p><img src="/blog/assets/plot_log_eps.png" alt="Plot of 1 / (1 - eps^2)" /></p>
<p>This by itself, as the factor multiplying \(\log \frac{1}{\delta}\), is problematic. When combined with the effect on the constant term, you get a rather huge explosion!</p>
<p><img src="/blog/assets/plot_log_explode.png" alt="Plot of the combined effect" /></p>
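<p>A few sample values make the same point as the plots. This is only a sketch: it assumes natural logarithms, uses the limiting constant \(\frac{1 + \varepsilon}{2 \varepsilon}\), and the chosen values of \(\varepsilon\) are arbitrary.</p>
\[\begin{array}{c|c|c}
\varepsilon & \frac{1 + \varepsilon}{2 \varepsilon} & 1 \Big/ \ln \frac{1}{1 - \varepsilon^2} \\ \hline
0.5 & 1.5 & \approx 3.48 \\
0.1 & 5.5 & \approx 99.5 \\
0.01 & 50.5 & \approx 9999.5
\end{array}\]
<p>Going from \(\varepsilon = 0.1\) to \(\varepsilon = 0.01\) multiplies the coefficient of \(\log \frac{1}{\delta}\) by about a hundred, while the constant inside the log only grows about tenfold.</p>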
<h1 id="conclusions">Conclusions</h1>
<p>The analysis usually given that \(n \ge O \left( \log \frac{1}{\delta} \right)\) is correct for reasonable values of \(\varepsilon\), but don’t forget to analyse your constants! In this case, if \(\varepsilon\) gets very small, the constant values explode very badly.</p>
<p>This makes sense when you think about it. You’re essentially trying to distinguish two distributions: correct and incorrect responses. If the two are arbitrarily close to each other, you can’t distinguish them without taking an arbitrarily large number of samples. If the two are far apart, you can distinguish them in a much smaller number of samples.</p>
Make ssh behave like local terminals on Mac2020-12-17T22:10:42+00:002020-12-17T22:10:42+00:00https://kareem.shehata.ca/blog/2020/12/17/Mac_ssh<p>Do you use a Mac and have long-running ssh sessions to a workstation
or server? Would you like those ssh sessions to behave just like
local terminals? How about being able to restore sessions after a
disconnection or reboot of your Mac? Then you’ve come to the right
place. Here I’ll show you how to set up iTerm and tmux to do all of
that and more.</p>
<ol>
<li>
<p>Install <a href="https://iterm2.com/">iTerm</a> on your Mac</p>
</li>
<li>
<p>Make sure
<a href="https://linuxize.com/post/getting-started-with-tmux/">tmux</a> is
installed on your server</p>
</li>
<li>
<p>Optional, but highly recommended: set up <a href="https://kb.iu.edu/d/aews">ssh public key
login</a></p>
</li>
<li>
<p>Create a new Profile in iTerm (Profiles -> Open Profiles ->
Edit Profiles -> + button), with the following options:</p>
<p>a. Name: a name that’s meaningful to you</p>
<p>b. Shortcut key: if you use this connection often, set a shortcut key
for you to start it quickly</p>
<p>c. Command: <code class="language-plaintext highlighter-rouge">ssh -tt user@server tmux -CC new -A -s iterm</code></p>
</li>
</ol>
<p>This will now reconnect to the session “iterm” or create a new one if
a session doesn’t exist.</p>
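<p>For reference, here is the same command with each piece annotated. The descriptions are paraphrased from the ssh and tmux man pages, and <code class="language-plaintext highlighter-rouge">user@server</code> is a placeholder, as above.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># -tt        force ssh to allocate a terminal even though a command is given
# tmux -CC   start tmux in control mode with echo disabled, which is what
#            iTerm2's tmux integration talks to
# new -A     new-session, but attach instead if the named session already exists
# -s iterm   name of the session to create or attach to
ssh -tt user@server tmux -CC new -A -s iterm
</code></pre></div></div>
<p>Changing the name after <code class="language-plaintext highlighter-rouge">-s</code> should give you a separate persistent session, for example one per project.</p>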
<h2 id="shell-integration">Shell Integration</h2>
<p>iTerm2 features <a href="https://iterm2.com/documentation-shell-integration.html">shell
integration</a>
which does all kinds of wonderful things from marking command results
to displaying images to automating file uploads and downloads.
Installing shell integration is easy. When you’re connected to the
host you want to use, go to iTerm2 -> Install Shell Integration
and it will automatically run the commands to do the installation.</p>
<p>The catch is that shell integration is disabled by default with tmux.
To fix that, add the following line to your <code class="language-plaintext highlighter-rouge">.bash_profile</code> right
before the shell integration is triggered:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">ITERM_ENABLE_SHELL_INTEGRATION_WITH_TMUX</span><span class="o">=</span>YES
</code></pre></div></div>
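<p>To illustrate the ordering, the relevant part of <code class="language-plaintext highlighter-rouge">.bash_profile</code> might end up looking roughly like this. The <code class="language-plaintext highlighter-rouge">source</code> line is an assumption about what the installer appends for bash; check your own file, since it may differ.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ~/.bash_profile (fragment)
# Has to be set before the integration script is sourced
export ITERM_ENABLE_SHELL_INTEGRATION_WITH_TMUX=YES
# Assumed form of the line added by "Install Shell Integration"
test -e "${HOME}/.iterm2_shell_integration.bash" &amp;&amp; source "${HOME}/.iterm2_shell_integration.bash"
</code></pre></div></div>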
<h2 id="fixing-hostname">Fixing hostname</h2>
<p>Some hosts don’t set hostname correctly or don’t set the proper
domain name (I’m looking at you Andromeda). To fix this, again edit
your <code class="language-plaintext highlighter-rouge">.bash_profile</code> and add the following line above the shell
integration:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">iterm2_hostname</span><span class="o">=</span>foo.example.com
</code></pre></div></div>
<p>Obviously, replace <code class="language-plaintext highlighter-rouge">foo.example.com</code> with the correct fully qualified
hostname.</p>
C++ Dev on a Mac2020-02-19T11:18:21+00:002020-02-19T11:18:21+00:00https://kareem.shehata.ca/blog/2020/02/19/C++_Dev_on_a_Mac<p><b>TL;DR</b> I wrote a guide for quickly setting up a C++ dev environment. It's over on github at <a href="https://github.com/kshehata/Mac-Cpp-Quickstart">https://github.com/kshehata/Mac-Cpp-Quickstart</a></p>
<p>I'm taking <a href="https://www.coursera.org/learn/crypto" target="_blank">Coursera / Stanford's course on cryptography</a> to brush up on the fundamentals. I figured I should do some of the example problems in C++, that way not only do I get to know crypto better, but I also brush up on C++ and learn the crypto libraries at the same time. I didn't need anything too complicated, just a C++ dev environment that will compile a class or two, unit tests, and the crypto library. Nothing too fancy, right?</p>
<p>My first attempt was Xcode. I don't need anything cross-platform, just a sandbox basically, so Xcode should work. I was almost instantly disappointed, for the following reasons:</p>
<ul>
<li>No out-of-the box support for gtest. I was hoping this would be as easy as File -> New -> C++ Test Case. Nope.</li>
<li>Difficult to configure C++17 support. This should be an easy setting. Instead you have to dive into compiler configs to make it happen.</li>
<li>No autoformatting</li>
<li>Adding a library like Crypto++ would have been a hassle. Not difficult, just means integrating it into the project</li>
<li>Far more complicated than I would like for a simple project, without any cross-platform benefits (this is my general complaint about IDEs)</li>
</ul>
<p>Basically, Xcode is great for developing iOS and Mac apps using Apple frameworks and Swift / Objective-C, but not great for anything else and definitely not aimed at C++ development. So what other options are there?</p>
<p>I started looking around and found both <a href="https://cmake.org/" target="_blank">CMake</a> and <a href="https://bazel.build/" target="_blank">Bazel</a>. While I'm familiar with Bazel from my Google days, it seemed like overkill, whereas CMake seems to be the accepted standard now and is a lot more widely supported. CMake has also improved a ton since I last used it years ago and now looks like exactly what I need: just point it at a few sources and get a build. That it supports basically any platform and can even generate files for IDEs is just gravy.</p>
<p>So that solves the build problem. What about dependencies like Crypto++ and googletest? There are three options:</p>
<ol>
<li>Include them in the project build itself as a submodule, as <a href="https://cliutils.gitlab.io/modern-cmake/chapters/testing/googletest.html" target="_blank">described in Modern CMake</a></li>
<li>Do the old-school build and install at a system level</li>
<li>Use something like <a href="https://brew.sh/" target="_blank">homebrew</a> to install them for you</li>
</ol>
<p>None of these are great options. The first one involves a lot of management of an external dependency within your project. If it's a critical library that may be a good idea, but if your dependencies start adding up this can get unwieldy very quickly. It also doesn't make sense for every project to have its own copy of gtest. That's so general it should really be at the system level.</p>
<p>But the system-level solutions are also problematic. What happens if two different projects want different versions of the library? Or if you want to test changes to the library with your project? Do you really want to be installing dev packages system-wide for every dev dependency you have? More importantly: anyone who wants to build your code is going to have to do the same thing: install everything at a system level.</p>
<p>Other dev environments have this figured out. In Python I can just do "pip install" and it'll take care of everything for me. Homebrew solves some of this, but at great cost. I really don't like installing homebrew, as it always seems to bring its own set of problems. Also: someone else trying to build my code is again going to have to install everything and hope it all works.</p>
<p>Enter <a href="https://conan.io">Conan</a>. Conan is basically the C++ answer to pip. You specify what you need, and it takes care of setting it all up for you. It integrates cleanly with CMake, meaning you get all of your libraries included with very little effort. Now this is what I wanted when I set out to set up my dev environment!</p>
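<p>To give a flavour of what that looks like, here is a minimal sketch. It is not taken from the guide linked in this post: it assumes Conan 2.x with the CMakeDeps and CMakeToolchain generators, and the package versions, project name, file names and CMake target names are placeholders to check against what <code>conan install</code> reports. First, a <code>conanfile.txt</code> listing the dependencies:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[requires]
gtest/1.14.0
cryptopp/8.9.0

[generators]
CMakeDeps
CMakeToolchain
</code></pre></div></div>
<p>Then a <code>CMakeLists.txt</code> that picks them up:</p>
<div class="language-cmake highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cmake_minimum_required(VERSION 3.15)
project(crypto_sandbox CXX)   # hypothetical project name

# Ask for C++17 explicitly
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# These find the package config files that CMakeDeps generates
find_package(GTest REQUIRED)
find_package(cryptopp REQUIRED)

enable_testing()
add_executable(sandbox_tests tests.cpp)   # tests.cpp is a placeholder source file
# Target names are assumptions based on the ConanCenter recipes
target_link_libraries(sandbox_tests PRIVATE GTest::gtest_main cryptopp::cryptopp)
add_test(NAME sandbox_tests COMMAND sandbox_tests)
</code></pre></div></div>
<p>With those two files in place, the build would go roughly as follows (running <code>conan profile detect</code> once beforehand if you have no default profile yet):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Fetch or build the dependencies and generate the CMake glue into build/
conan install . --output-folder=build --build=missing
cd build
# Point CMake at the toolchain file Conan just generated
cmake .. -DCMAKE_TOOLCHAIN_FILE=conan_toolchain.cmake -DCMAKE_BUILD_TYPE=Release
cmake --build .
</code></pre></div></div>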
<p>Since it took me a while to get to this point, I figured it was worth writing out what I did. Of course, that took a lot longer than I thought it would, but hopefully it'll be worth it for someone. Check it out here:</p>
<p><a href="https://github.com/kshehata/Mac-Cpp-Quickstart">https://github.com/kshehata/Mac-Cpp-Quickstart</a></p>