And FWIW, Python has the richest collection of ANN tutorials of any programming language, so yours just gets lost like a grain of sand in a dune.

For the second question: at that point I'm speaking abstractly. There *exists* an error contour, and conceptually what we're doing is climbing down that contour. It's a way of understanding why using the derivatives helps us improve the weights: the derivative works because it points down the theoretical contour towards the minimum error. Technically, all we have is one point on the contour, and for the derivative we're looking at the curve produced by slicing the contour with a plane that passes through both the current error value and the minimum error. We don't know the other points on the contour.
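The idea above can be sketched in a few lines of Python. This is a minimal one-weight illustration, not code from the original post: the quadratic error function and the learning rate are invented for the example. It shows how the derivative at a single point is enough to step "downhill" on the contour, even though we never see the rest of the surface.

```python
# Illustrative error surface E(w) = (w - 3)^2, with its minimum at w = 3.
# (Hypothetical example: a real network's error surface is unknown to us;
# we only ever evaluate it at the current weights.)

def error(w):
    return (w - 3.0) ** 2

def d_error(w):
    # Derivative of E with respect to w; its sign tells us which
    # direction along the contour is downhill from where we stand.
    return 2.0 * (w - 3.0)

w = 0.0       # arbitrary starting weight
lr = 0.1      # learning rate (step size), chosen for the illustration
for _ in range(100):
    w -= lr * d_error(w)  # step against the gradient, i.e. downhill

print(round(w, 3))  # approaches the minimum at w = 3
```

At each step the only information used is the slope at the current point; repeated small steps against that slope carry the weight toward the minimum error.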

Could you please expand on this part a bit? "For a given error value z, there's a contour of a curve for all of the bindings that produce that level of error." Is this saying that the error will be computed for all training inputs at this point?

mxnet is probably the best neural network library in R. It's great, but it can sometimes be a little more difficult to work with than keras.
