You also didn't see the point of complexity theory.
Continuind my earlier post:
I emphasize that this is an academic discussion that has no practical consequence except to show that theory has nothing to say on the robustness of SHA256.
Two theoretical fields were used in your post: the theory of complexity classes (P, NP and all that) and big-O analysis.  Neither can be applied to SHA256.
First, both theories only apply to problems where the size 
n of the input is variable and unbounded; and they consider what happens
 in the limit when n tends to infinity.  If 
n is bounded, no matter how big, all the definitions collapse and 
every problem is not only polynomial, but can be solved in O(1) time.  
In fact, if you cannot let 
n go to infinity, strictly speaking you cannot even apply the definitions; you get barred at the door.  
But suppose you generalize the problem in such a way that the input size 
n can go to infinity, and you can analyze its complexity as "polynomial" or "exponential" or "O(
n2)".   Still, these results will not say anything about the cost for any specific 
n, such as 
n = 256 bits.  
You cannot tell whether an algorithm runs in O(1) time or requires exponential time by looking at what it does forany finite set of inputs; and vice-versa.   
One can easily define two algorthms A0 and B0 such that A0 runs in O(1) time, B0 runs in exponential time, but A0 and B0 give the same output for every input with size 
n below 10
500.  One can easily define two functions F0 and G0, where F0 can be computed in O(1) time while G0 requires exponential time, but such that F0(x) = G0(x) for every input x whose size n is less than 10
500.  These are trivial exercises, that do not require any arguments or ideas more complicated than the definitions. 
In a previous post you told how replacing an O(
n^k) algorithm by an O(log 
n) one made a huge difference in running time.  I have a few such success stories myself. (Back in 1979, when working as a part-time programer for an egineering firm, I saved a large govt-funded project by substituting an O(1) table lookup for an O(
n) search in a COBOL program, with 
n about 1000, after even the COBOL experts from IBM Brazil had failed to make the program run fast enough.  I never again got so much praise in my life.  

)
However, justifying those successes in terms of asymptotic (big-O) analysis or P-vs-NP is wrong, because that kind of analysis does not actually let you conclude anything about the actual running time on a specific application.  It is like the explanation of the successful businessman in the old joke at the end of 
this post.
Your wrote:
The 
definition of polynomial time is precisely the time complexity class 
O(nk).
The relevance is that for NP complexity class, very small increases in 
n causes exponential (not polynomial) increases in computation cost.
If the best known inversions of SHA256 are NP, then NP is relevant because it means there is some smallish value of 
n for which no inversion is practical. Afaics, this is what complexity theory gains us.
The first sentence is correct, the other two are dead wrong. 
First, a formal nit:  the class NP contains P, and no one knows whether NP is equal to P or not. Therefore, proving that something is in NP does not prove that it is not P, and even proving that it is "NP-complete" (which is what people usually mean when they lazily say just "NP") does not prove that it is not P.   
Also another nit: there are many classes of asymptotic complexities between "polynomial" and "exponential", so "not polynomial" includes for example algorithms that run in O(
n log log log log n) time, which for some problems would be very interesting indeed.  
The terms "exponential" and "polynomial" can be applied to the 
increase in running time only when they can be applied to the running time itself; that is, if there is a problem size variable 
n that can grow without bounds.  And, even then, they describe how the cost increases when 
n tends to infinity.  I repeat: those terms are meaningless if the size 
n is bounded (and doubly so if 
n is fixed).
Strictly by the definitions, the "Bitcoin Mining Problem" (BMP), the partial inversion of SHA256, is O(1), because it does not matter what the algorithm outputs when 
n is not 256.  Hence, strictly by the definitions, BMP is in P, and therefore is in NP; but is not NP-complete.  And, indeed, it could be solved by a giant table.  Of course, that table is too big to be built in practice; but that obstacle is not relevant for the theory.
I am not aware of any theory that would give a useful estimate of the 
practical difficulty of solving some problem with fixed-size inputs, like the BMP.  At best, one can prove theoretically that a certain specific shortcut will not work; but there is no way to prove that there are 
no practical shortcuts.  The only "proof" that BMP is hard is that many people have tried to find such a practical shortcut, for many years, with no success.
It is a terrible misconception to think that "exponential cost" is bigger (or grows faster) than "polynomial cost".  That is true only for "sufficiently large 
n", but the theory does not say when 
n is "sufficiently large"; it could be true only for 
n greater than 10
500.  See the example algorithms A0 and B0 above, and the functions F0 and G0 above.  There is 
nothing in the theory that justifies saying that the breakpoint is "smallish".
You may object that examples like F0 and G0 are contrived and unreal.  But my statement remains correct even if the costs grow smoothly with 
n, according to some simple formulas.  Suppose that algorithm A1 finishes after 10
200×
n operations, algorithm B1 finishes after 10×
n50 operations, and algorithm C1 finishes after 10×1.00000000000001
n operations, all three formulas being exact (up to rounding) and valid for all inputs.  By the definitions, A1 runs in polynomial, O(
n) time; B1 is polynomial too, with O(
n50) time; nd C1 is exponential.  According to the "polynomial good, exponential bad" mantra, algorithm A1 should be better than B1, which should be better than C1.  However, plug 
n = 1000 in those formulas, and you get exactly the opposite order -- and only C1 will be fast enough to use.  
There are some very important real problems where the algorithms with best asymptotic complexity lose in practice to algorithms that in theory are much worse.   I recall that, perhaps 10 years ago, two students from India found a polynomial-time algorithm to decide whether a number is composite.  At the time, their algorithm ran in O(
n14) time, where 
n is the number of bits in the input.  It was a great result for the theory; but, for the sizes 
n that are relevant to cryptography, their algorithm was much slower than the super-polynmial algorithms that crypto people were using.  (It may have been improved since then, I haven't checked.)
The Closest Point Problem (CPP) is like this: given a fixed set 
S of 
n sites (points) in a square, and then a series of 
query points, find for each query 
q the site 
p in 
S that is nearest to 
q.  One solution is to scan the set 
S each time; the query cost then is proportional to 
n. A more efficient solution begins by preprocessing 
S into a quad-tree 
Q: the square is divided into 4 smaller squares, and some of these are divided in turn, recursively; until each square has at most 1 site.   Then one can find the nearest site to a given query q by sweeping down through this tree of nested squares.  Theory says that, once the quad-tree 
Qis built, each query can be processed in O(log 
n) expected time (with some reasonable assumptions on the distribution of 
S). 
The quad-tree method works well for points in a square, and the same idea can be used if the sites are scattered inside a cube, or a hipercube of any dimension 
d.  Theory says that the expected query cost remains O(log 
n), for any 
d. I have seen several computer scientists and students who, trusting that "O(log 
n) is much better than O(
n)", tried to use this  method for things like pattern matching, where each site is an observed pattern, represented by a point in some high-dimensional space.  Only to find that, for that application, the quad-tree method is often much slower than plain sequential search.
It turns out that the quad-tree begins to work only if the number of sites 
n is at least 2
d.  It works quite well for 
n = 1000 and 
d = 2 or 3; but, for  
d = 30, it is a total waste of time unless 
n is a lot greater than a billion.  Not "smallish" at all...
The big-O notation was invented by pure mathematicians, to describe the limiting behavior of functions from reals to reals, such as the solutions of differential equations.  It was adapted to algorithm analysis by Don Knuth, IIRC.  However, for Knuth it was only a working tool, that would give an idea of the type of formula that had to be determined.  For example, if the number 
t(
n) of inner block executions was found to be O(
n2), one could write 
t(
n) ≤ 
A n2 + 
B n + 
C, and then look for suitable values of 
A, 
B, and 
C.  If 
t(
n) was found to be exponential, one would look for a formula like 
t(
n) ≤ 
A × 
Bn. An so on.
However, finding those constants requires lots of boring algebra, so Knuth's lazy followers tacitly agreed that finding the big-O of the running time was good enough.  And then came the really lazy ones, who decided that "polynomial time" was all one needed to know.