Tidbits
Here I keep notes of small things I learned that do not fit into a larger overarching topic. Any time I learn something interesting that I want to remember, I put that here.
hydra ML Configs
Hydra works closely with the omegaconf package and its DictConfig objects. This lets you easily load .yaml config files into a Python config object and access variables either dictionary-style or via object attributes.
One powerful thing I learned in the process about omegaconf is interpolation. It allows you to reference other variables across the entire config tree. For example, if I have paths defined in a group paths and then want to set a weights path in the group train using one of those previously defined paths, I don't need to duplicate the path; I can interpolate it with ${group.key} dot notation. By default interpolation is absolute, but relative interpolation is also possible, e.g. ${..foo}. See this example:
paths:
  base_dir: ${oc.env:PROJECT_ROOT,/lcrc/project/hydrosm/dma}
  data_dir: ${paths.base_dir}/data
  output_dir: ${paths.base_dir}/outputs
train:
  # No need to duplicate paths here!
  weights: ${paths.data_dir}/experiments/unet_wyhy12/model.pth
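To see how this resolves, here is a minimal sketch using omegaconf directly (it assumes the snippet above is saved as config.yaml; in a real Hydra app the config would be composed through @hydra.main instead):

from omegaconf import OmegaConf

# Load the YAML above into a DictConfig (assumes it was saved as config.yaml).
cfg = OmegaConf.load("config.yaml")

# Interpolations like ${paths.data_dir} are resolved when the value is accessed.
print(cfg.train.weights)             # attribute-style access
print(cfg["paths"]["output_dir"])    # dictionary-style access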
IP Addresses
The Internet Protocol address is like a postal address that uniquely identifies a device on a network. Hence every machine on a network has an IP address.
The most common format is IPv4, e.g. 192.168.1.10, where each number is 0-255, but there is also IPv6, which can handle many more devices, e.g. 2001:0db8:85a3:0000:0000:8a2e:0370:7334.
Other notable things about IP:
- Public vs. Private IP - a public IP is assigned by your internet provider and visible to the outside world. A private IP is used within a local network. Your router uses a technique known as Network Address Translation (NAT) to map your private IP to a single public IP when connecting to the internet.
- Domain Name System (DNS) - allows us to use domain names like google.com instead of 142.250.190.78. DNS translates domain names into IP addresses (this happens implicitly every time you make a network request; see the small sketch after this list). 127.0.0.1 or localhost points to your own computer and is useful for testing servers locally.
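As a quick illustration of a DNS lookup, a one-liner sketch with Python's standard library (google.com is just an example domain):

import socket

# Resolve a domain name to an IP address, like a browser does before connecting.
print(socket.gethostbyname("google.com"))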
Ports
While an IP address identifies a machine, a port identifies a service or process on that machine.
- HTTP - port 80
- SSH - port 22
The address 192.168.1.10:8080 refers to a custom service listening on port 8080 of the machine 192.168.1.10.
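Here is a minimal sketch of how an IP and port pair shows up in code, a toy TCP server on localhost (the address and port are arbitrary examples):

import socket

# Toy TCP server: the IP picks the machine/interface, the port picks the service.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 8080))   # localhost, port 8080
server.listen()

conn, addr = server.accept()       # addr is the client's (ip, port) pair
print("connection from", addr)
conn.sendall(b"hello\n")
conn.close()
server.close()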
Remote Procedure Call (RPC) Server
An RPC server simplifies communication between systems in the client/server model. It allows a client computer to request the execution of procedures on a remote server as if it were happening locally. The RPC server listens for requests from RPC clients, executes code upon request, and sends the results back to the client. One analogy for MCP is that it is similar to RPC as a communication conduit between agents, but it may not provide a solution to advanced agent-agent coordination.
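A minimal sketch of the idea using Python's built-in xmlrpc module (the add function and port 8000 are made up for illustration):

# server.py
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(add, "add")
server.serve_forever()   # listen for RPC requests and execute them

# client.py
from xmlrpc.client import ServerProxy

proxy = ServerProxy("http://localhost:8000")
print(proxy.add(2, 3))   # executed on the server, but reads like a local call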
Relational Database Management System (RDBMS)
An RDBMS is software that manages relational databases. An interesting thought: people in the past dismissed Salesforce as an RDB wrapper, in the sense that to some degree it puts an interface and logic on top of relational databases, which is obviously an oversimplification. In the same way, today's GPT-wrappers, or application layers on top of new AI models, can actually provide significant value if done correctly.
Codecs
When talking about ffmpeg, a term that comes up a lot is codecs. A codec is software or an algorithm that encodes raw audio/video into a compressed format for storage or transmission and decodes the compressed format back into the original form. In ffmpeg, codecs are implemented as modules that handle these encoding and decoding operations. An example of a codec supported by ffmpeg is mp3! In the API, codecs are abstracted as AVCodec (Audio Video Codec). The choice of codec determines the compression efficiency, quality, and encoding/decoding speed.
Here is a slice of the large list of available codecs you get by running ffmpeg -codecs:
DEAIL. mp3 MP3 (MPEG audio layer 3) (decoders: mp3float mp3 ) (encoders: libmp3lame libshine )
D.AIL. mp3adu ADU (Application Data Unit) MP3 (MPEG audio layer 3) (decoders: mp3adufloat mp3adu )
D.AIL. mp3on4 MP3onMP4 (decoders: mp3on4float mp3on4 )
D.AI.S mp4als MPEG-4 Audio Lossless Coding (ALS) (decoders: als )
..A.L. mpegh_3d_audio MPEG-H 3D Audio
D.AIL. musepack7 Musepack SV7 (decoders: mpc7 )
D.AIL. musepack8 Musepack SV8 (decoders: mpc8 )
DEAIL. nellymoser Nellymoser Asao
DEAIL. opus Opus (Opus Interactive Audio Codec) (decoders: opus libopus ) (encoders: opus libopus )
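To pick a specific codec when transcoding, you pass it to ffmpeg with -c:a (audio) or -c:v (video). A hedged sketch, assuming ffmpeg is on your PATH and an input.wav exists:

import subprocess

# Encode a WAV file to MP3 with the libmp3lame encoder listed above.
# -c:a selects the audio codec, -b:a the target bitrate.
subprocess.run(
    ["ffmpeg", "-i", "input.wav", "-c:a", "libmp3lame", "-b:a", "192k", "out.mp3"],
    check=True,
)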
Strong vs. Weak Scaling
When it comes to scalability on HPCs, there are two important concepts with regard to how computational capacity scales.
Strong scaling is the speedup that can be achieved by increasing processor resources while the problem size stays constant; it is bounded, per Amdahl's Law, by the portion of code that is not parallelizable.
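For reference, Amdahl's Law puts the achievable speedup with $N$ processors at $S = \frac{1}{s + p/N}$, where $s$ is the serial fraction of the code and $p = 1 - s$ the parallelizable fraction, so even as $N \to \infty$ the speedup is capped at $1/s$.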
Weak scaling is the speedup that can be achieved by increasing both the number of processors and the problem size. The difference here is that by scaling the problem size, i.e. adding processors while keeping the workload per processor constant, we can solve larger problems on a large machine faster than we could on a smaller machine. The only limits are the number of processors and the maximum problem size. An example of weak scaling is increasing the batch size (the problem size) of an ML experiment.
Just as Amdahl's Law governs strong scaling, Gustafson's Law governs weak scaling. The law assumes the parallel part of a problem scales linearly with resources while the serial part does not grow as the problem size increases. The law claims that

$S = s + p \cdot N$

where $s$ is the proportion of execution time spent on the serial part, $p$ is the proportion of execution time spent on the part that can be parallelized, and $N$ is the number of processors. Think of $S$ as the ratio of the time to solve the larger problem (i.e. scaled by $N$) with a serial computation relative to the time with $N$ processors.
Hence, the law claims that scaled speedup increases linearly with respect to number of processors with no upper limit.
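As a quick hypothetical example: with a serial fraction $s = 0.1$ (so $p = 0.9$) and $N = 100$ processors, the scaled speedup is $S = 0.1 + 0.9 \cdot 100 = 90.1$, i.e. close to linear in the number of processors.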
In practice, you are often not limited to a fixed task size. Thus, the size of the problem scales with the number of resources.
By nature, a data-parallel problem in which each processor can work on its own set of data separately will scale weakly.
Measure strong scaling on HPC by testing how the overall computation time of a job scales with resources (threads or MPI processes).
Measure weak scaling on HPC by testing the computation time of a job while increasing both the job size and the resources. This plot would have resources on the x axis and scaled speedup on the y axis.
Scaling Variables
Speedup for strong scaling is the time using one processor divided by the time using $N$ processors:

$S = \frac{t_1}{t_N}$

where $t_1$ is the execution time on one processor and $t_N$ is the execution time on $N$ processors.
Efficiency is the ideal time divided by the measured time:

$E = \frac{t_1 / N}{t_N} = \frac{S}{N}$
With weak scaling, the efficiency is the ratio of the time to complete 1 unit of work with 1 processor to the time to complete $N$ work units with $N$ processors, i.e. $E = \frac{t_1}{t_N}$.
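As a quick sketch with made-up timings (not real measurements), this is how the quantities above are computed:

# Hypothetical timings in seconds, purely for illustration.
t1 = 100.0       # 1 processor, 1 unit of work
t_strong = 8.0   # N processors, same total work (strong scaling run)
t_weak = 110.0   # N processors, N units of work (weak scaling run)
N = 16

strong_speedup = t1 / t_strong            # S = t1 / tN
strong_efficiency = t1 / (N * t_strong)   # E = (t1 / N) / tN = S / N
weak_efficiency = t1 / t_weak             # E = t1 / tN for N-fold work

print(strong_speedup, strong_efficiency, weak_efficiency)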
Read more here.
The Bitter Lesson
The essay by Richard Sutton is famous for highlighting the tension between leveraging human knowledge and leveraging computation in AI. Search and learning, he argues, are what ultimately matter when computation is scaled enormously. Search can be thought of as brute-force trial and error, and learning can take the form of self-play.
The bitter lesson is this: building methods that mimic how we as humans think does not work in the long run. In the long term, the breakthroughs will always come from scaling search and learning. Human-centric approaches thus always fall short of more general methods.
Human knowledge pales in comparison to Moore's law.