Hacker News

Show HN: The Hessian of tall-skinny networks is easy to invert (https://github.com)

30 points by rahimiali 4 days ago | 23 comments

MontyCarloHall 4 days ago |

>If the Hessian-vector product is Hv for some fixed vector v, we're interested in solving Hx=v for x. The hope is to soon use this as a preconditioner to speed up stochastic gradient descent.

Silly question, but if you have some clever way to compute the inverse Hessian, why not go all the way and use it for Newton's method, rather than as a preconditioner for SGD?
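
A minimal JAX sketch of the operations under discussion, assuming a toy least-squares loss (illustrative only, not the linked repo's code): the Hessian-vector product Hv comes from forward-over-reverse autodiff without ever forming H, and Hx = v is solved with conjugate gradient. For this loss the result is exactly a Newton step, and in general it is the solve a Hessian-based preconditioner needs.

    import jax
    import jax.numpy as jnp

    # Toy least-squares problem; its Hessian is A.T @ A (positive definite here).
    A = jnp.array([[2.0, 0.0],
                   [1.0, 3.0],
                   [0.0, 1.0]])
    b = jnp.array([1.0, 2.0, 3.0])

    def loss(w):
        return 0.5 * jnp.sum((A @ w - b) ** 2)

    def hvp(w, v):
        # Hessian-vector product H @ v without materialising H.
        return jax.jvp(jax.grad(loss), (w,), (v,))[1]

    w = jnp.zeros(2)
    g = jax.grad(loss)(w)

    # Solve H x = g using only hvp calls; x is the Newton step, and applying
    # H^{-1} to the gradient is what a Hessian preconditioner for SGD does.
    x, _ = jax.scipy.sparse.linalg.cg(lambda u: hvp(w, u), g)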

Lerc 4 days ago |

I am not a mathematician, but I do enough weird stuff that I encounter things referring to Hessians, yet I don't really know what they are, because everyone who writes about them does so in terms that assume the reader knows what they are.

Any hints? The Battenberg graphics of matrices?
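
For what it's worth, the Hessian of a scalar function is just the square matrix of all its second partial derivatives; for a neural-network loss it is (number of parameters) x (number of parameters), which is why it is usually handled implicitly. A small illustration with a made-up two-variable function:

    import jax
    import jax.numpy as jnp

    # H[i, j] = d^2 f / (dw_i dw_j); symmetric for smooth f.
    def f(w):
        return w[0] ** 2 + 3.0 * w[0] * w[1] + jnp.sin(w[1])

    H = jax.hessian(f)(jnp.array([1.0, 2.0]))
    # [[2.0,  3.0     ],
    #  [3.0, -sin(2.0)]]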

holg 4 days ago |

Great work. Making the Hessian calculation linear in depth is a solid intermediate step. Thanks for sharing this; I look forward to seeing the final results as this research matures.

jeffjeffbear 4 days ago |

I haven't looked into it in years, but would the inverse of a block bidiagonal matrix have some semiseparable structure? Maybe that would be good to look into?
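
To illustrate why that structure matters even if you never form an explicit (semiseparable or otherwise) inverse: solving with a block lower-bidiagonal matrix is just block forward substitution, so the cost is linear in the number of blocks. A sketch with made-up block sizes, not the paper's actual construction:

    import jax.numpy as jnp

    def block_bidiagonal_solve(D, L, v):
        # Solve M x = v where M is block lower-bidiagonal:
        # D[i] are the diagonal blocks, L[i] couples block i+1 to block i.
        # One small solve per block => cost linear in the number of blocks (depth).
        x = [jnp.linalg.solve(D[0], v[0])]
        for i in range(1, len(D)):
            x.append(jnp.linalg.solve(D[i], v[i] - L[i - 1] @ x[i - 1]))
        return x

    # Toy example: 3 blocks of size 2.
    D = [(i + 2.0) * jnp.eye(2) for i in range(3)]
    L = [0.1 * jnp.ones((2, 2)) for _ in range(2)]
    v = [jnp.ones(2) for _ in range(3)]
    x = block_bidiagonal_solve(D, L, v)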

petters 4 days ago |

Would be great to see this work continued with some training runs
