101 points by ismaeel_bashir 5 days ago | 27 comments
ray__ 5 days ago |
flounder3 5 days ago |
I'm curious about the granularity of the contracts around granting/selling excess capacity. Are they short-term? Can the owner evict those workloads (with a penalty)?
boringperson 5 days ago |
I wonder what is stopping datacenters from passing this benefit on to customers by launching better-tuned plans. For example, T-series EC2 instances on AWS.
FattiMei 4 days ago |
Can you give an example of typical execution on the cluster? Is it a problem of number of hours allocated or number of compute cores?
If I'm running a PDE simulation and I allocate n machines, I want to use all of them, so there is no risk of idle machines. It's not trivial to estimate a priori how long my simulation will take to complete, so I overestimate. But when the simulation completes (even before the deadline), the resources get freed and can be used right away for another job.
Maybe the problem arises when many users are greedy. Also, MPI simulations are difficult (if not impossible, correct me if I'm wrong) to resize dynamically: once a simulation has started with a given number of ranks, I can't add new ranks at will, even if resources become available.
Thanks in advance for your patience to everyone who answers.
iroddis 5 days ago |
Do you do any tracking of resource consumption over the runtime of a job? We have many jobs that use the requested memory only for a portion of the runtime, and are otherwise compute bound. It would be nice to be able to learn the profiles through time of jobs and layer them to get better resource utilization.
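To make the layering idea concrete, here is a minimal sketch, not anything from the product being discussed. It assumes each job's memory use has already been sampled into a list of GB values at fixed time steps; the profiles, the 64 GB node size, and the names `peak_combined` and `fits_together` are all hypothetical.

```python
from typing import Sequence


def peak_combined(profiles: Sequence[Sequence[float]]) -> float:
    """Return the peak of the summed memory usage across all jobs."""
    horizon = max(len(p) for p in profiles)
    total = [0.0] * horizon
    for profile in profiles:
        for t, used_gb in enumerate(profile):
            total[t] += used_gb
    return max(total)


def fits_together(profiles: Sequence[Sequence[float]], node_memory_gb: float) -> bool:
    """True if the jobs can share a node without exceeding its memory at any step."""
    return peak_combined(profiles) <= node_memory_gb


# Hypothetical profiles in GB per time step (assumed data, not measurements).
job_a = [48, 48, 8, 8, 8, 8]   # memory-heavy early, compute bound afterwards
job_b = [8, 8, 8, 8, 48, 48]   # compute bound early, memory-heavy late

requested_gb = max(job_a) + max(job_b)        # what a request-based scheduler reserves
layered_peak_gb = peak_combined([job_a, job_b])  # what the layered profiles actually need

print(requested_gb, layered_peak_gb, fits_together([job_a, job_b], 64))
```

With these made-up numbers, reserving by request needs 96 GB, while the layered profiles never exceed 56 GB at once, so both jobs would fit on a 64 GB node. A real scheduler would also have to handle profiles that drift between runs, which this sketch ignores.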
rjpruitt16 5 days ago |
https://www.linkedin.com/posts/rahmi-pruitt-a1bb4a127_agentn...
mike_d 5 days ago |
Any competent enterprise risk team is going to give a hard no to a SaaS application being in the critical path for on-prem, business-critical workloads. So there goes the Fortune 100 too.
If you are successful and schedule workloads better, you are just deferring upgrades and expansions. The customer's Dell/HPE/etc. sales rep is going to freak out, some vice presidents are going to go golfing together, and all the remaining high-value customers don't renew.
What you are really left with is the "small and medium business" clusters that are purpose-specific. They are running at 100% on a handful of tasks that can probably be hand-tuned.
This sounds like really cool technology; I just don't see the business. Hopefully you'll consider open-sourcing it soon.
syngrog66 5 days ago |
mike_d 5 days ago |
joemorrison607 4 days ago |
Shaurya_Sharma 5 days ago |
zeckalpha 4 days ago |
keynha 5 days ago |
jalospinoso 5 days ago |
lowellniles 5 days ago |
Ozzie-D 5 days ago |
Presumably the underlying model here is also an LLM? To what degree is it "fine-tuned", or is it just given a set of tools to build a good picture of cluster usage?