computer science
Jun. 10th, 2022 09:36 am
https://www.joelonsoftware.com/2001/12/11/back-to-basics/
These are all things that require you to think about bytes, and they affect the big top-level decisions we make in all kinds of architecture and strategy. This is why my view of teaching is that first year CS students need to start at the basics, using C and building their way up from the CPU. I am actually physically disgusted that so many computer science programs think that Java is a good introductory language, because it’s “easy” and you don’t get confused with all that boring string/malloc stuff but you can learn cool OOP stuff which will make your big programs ever so modular. This is a pedagogical disaster waiting to happen. Generations of graduates are descending on us and creating Shlemiel The Painter algorithms right and left and they don’t even realize it, since they fundamentally have no idea that strings are, at a very deep level, difficult, even if you can’t quite see that in your perl script. If you want to teach somebody something well, you have to start at the very lowest level.
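If you haven't read Joel's Shlemiel the Painter essay, the canonical example is building a string with repeated strcat, which re-scans everything appended so far on every call. A minimal sketch of the pattern (my illustration, not code from the post):

```cpp
#include <cstring>

// Builds a comma-separated line by repeated strcat. Each strcat call starts
// at dest[0] and walks to the terminating NUL before appending, so appending
// n fields costs O(n^2) in total. Assumes dest is large enough.
void build_csv(char* dest, const char* const* fields, int n) {
    dest[0] = '\0';
    for (int i = 0; i < n; ++i) {
        std::strcat(dest, fields[i]);   // re-scans everything written so far
        if (i + 1 < n) std::strcat(dest, ",");
    }
}
```

The fix is to track the end of the string yourself, or better, to use a string type that already knows its own length.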
no subject
Date: 2022-06-11 04:39 am (UTC)

no subject
Date: 2022-06-12 02:40 pm (UTC)

Back in 1991, when I was an undergrad, I had to do assembly language coding in my second (2nd!) semester of undergraduate Computer Science. I then went on and spent several years using C and its children (C++, Objective-C). I feel like I'm a much better developer in Java, Python, and JavaScript thanks to this experience. At least I have some dim inkling that I might be Shlemiel The Painter sometimes (although, granted, it's often premature optimization to stop and worry about it).
I find that my cohort in my current program generally seem to think that software development consists of typing "npm install" a bunch of times.
Do you find that modern recent grads, who started with Java and never did anything closer to the metal, have the intuitive instincts to write performant code?
no subject
Date: 2022-06-12 07:59 pm (UTC)

He's right that it's easy to mess up C string handling, and he even says something very right at the start of the post, namely "you should avoid ASCIZ strings like the plague". But then he (a) thinks the alternative is Pascal strings, and (b) spends the rest of the post talking about how to optimize string handling anyway, when really you should let a library do that stuff for you, and then kind of handwaves that knowing this stuff is somehow good for your code.
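(For anyone who doesn't know the jargon: an ASCIZ string marks its end with a NUL byte, so finding its length means scanning it; a Pascal string stores the length in a prefix byte. A rough sketch of the two layouts, purely illustrative:)

```cpp
// ASCIZ: the length is implicit, so strlen() must scan for the NUL: O(n).
const char asciz[] = "hello";   // bytes: 'h' 'e' 'l' 'l' 'o' '\0'

// Pascal: the first byte holds the length, so length lookup is O(1),
// but a one-byte prefix caps the string at 255 characters.
const unsigned char pascal[] = {5, 'h', 'e', 'l', 'l', 'o'};
```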
I might have agreed in 2001 when this was written, because I didn't yet know better, and because I still considered Joel to be a computer science authority. Joel should have known better by 1998, when the STL came out and Bjarne Stroustrup started talking about how the lack of things like a native string class was one of the biggest mistakes he made in the original C++ spec. Around 2002 I spent a bunch of time ripping out C strings from a medium-sized code base and replacing them with std::strings, and even though this was one of the rare cases where string processing really was the critical path performance-wise, it was still absolutely the right thing to do, because by then std::string had copy-on-write and good lord, you do not want every project writing its own copy-on-write semantics.
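To give a flavor of what that refactor looked like, here's a sketch reconstructed for illustration (not the actual code base):

```cpp
#include <cstring>
#include <string>

// Before: hand-rolled C string handling, with the usual allocation and
// overflow pitfalls. (Reconstructed flavor, not the real code.)
char* join_c(const char* a, const char* b) {
    char* out = new char[std::strlen(a) + std::strlen(b) + 1];
    std::strcpy(out, a);
    std::strcat(out, b);
    return out;  // and now the caller has to remember delete[]
}

// After: std::string owns the memory, and the library decides how to grow,
// copy, or share the buffer.
std::string join_cpp(const std::string& a, const std::string& b) {
    return a + b;
}
```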
Which brings me to the important point. I have had the great luxury of working on projects where performance was the issue, and as a classically trained programmer this has been a lot of fun for me. But usually security and safety and maintainability and operability and UX are the issues, and performance is... not. Rolling your own string processing is about as good an idea as rolling your own crypto: don't! I want to work with people who can pick the right library for what they're doing, and who can pick the language that has the best library support. Even when I need performance, I want someone who can A/B test the code to compare the actual performance difference between unordered_map<K, V> and vector<pair<K, V>>, and not someone who thinks they know the answer already.
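The kind of test I mean doesn't have to be fancy. A sketch, with the table size and access pattern invented for illustration:

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
    // Sizes are made up for illustration; the only honest answer comes from
    // running the comparison on your real data and access pattern.
    constexpr int kKeys = 64;
    constexpr int kLookups = 1000000;

    std::unordered_map<int, int> m;
    std::vector<std::pair<int, int>> v;
    for (int k = 0; k < kKeys; ++k) {
        m[k] = k;
        v.emplace_back(k, k);
    }

    auto t0 = std::chrono::steady_clock::now();
    long long sum_m = 0;
    for (int i = 0; i < kLookups; ++i)
        sum_m += m.find(i % kKeys)->second;  // hash + bucket chase

    auto t1 = std::chrono::steady_clock::now();
    long long sum_v = 0;
    for (int i = 0; i < kLookups; ++i) {
        int key = i % kKeys;
        auto it = std::find_if(v.begin(), v.end(),  // linear scan, contiguous
                               [key](const std::pair<int, int>& p) {
                                   return p.first == key;
                               });
        sum_v += it->second;
    }
    auto t2 = std::chrono::steady_clock::now();

    auto ns = [](auto a, auto b) {
        return static_cast<long long>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count());
    };
    std::printf("unordered_map: %lld ns, vector<pair>: %lld ns (sums %lld %lld)\n",
                ns(t0, t1), ns(t1, t2), sum_m, sum_v);
}
```

On a table this small the linear scan often wins, because the whole vector fits in a handful of cache lines; somewhere as the table grows the hash map takes over, and that crossover is precisely the thing you measure rather than guess.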
And here's the thing: even where performance is the issue, Joel gets it wrong. The CPU was the bottleneck when Joel started coding. It was so slow you could do a DRAM memory fetch in a single CPU cycle. Since then CPU clock rates have gone up by a factor of 1000, and the amount of work you can do in a single CPU cycle has gone up by *another* factor of 1000. Meanwhile, memory fetches have gotten faster by a factor of *two*.

I work with modern recent grads who understand how many bytes an algorithm examines, and who think algorithms that examine fewer bytes are always faster than ones that examine more bytes. But they're usually not, because one memory fetch can retrieve kilobytes of data (and often effectively *can't* retrieve less than 32 or 64 bytes of data). As long as your data is contiguous, there's almost no cost to examining those extra bytes.

His description in that post of the problem with XML as a data structure, at the sizes he's discussing, was already starting to be wrong when he wrote it, and it's very wrong now. XML has of course been surpassed by JSON as a human-readable message format, and most of the performance work on library support has gone into JSON and left XML pokey by comparison, but the fact that you have to read through the whole file to do a lookup is simply not relevant to most projects.
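To make that concrete, compare summing a contiguous container against a pointer-chasing one (a sketch; the 64-byte cache line is typical but hardware-dependent):

```cpp
#include <list>
#include <numeric>
#include <vector>

// Summing a std::vector walks contiguous memory: each cache-line fetch
// (typically 64 bytes, i.e. 16 ints) serves many elements, and the hardware
// prefetcher can run ahead of the loop.
long long sum_vector(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// Summing a std::list touches the same payload bytes but chases a pointer
// per node, so it pays roughly one memory fetch per element, and each fetch
// drags in a whole cache line just to use four bytes of it.
long long sum_list(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), 0LL);
}
```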
Joel stopped updating this blog (ETA: well, the practical advice part of it) around the time I realized it was full of bad advice. Maybe he realized that too.