__ Summary and Contributions__: Summary:
This paper considers the generalized Lasso for non-linear measurements of generative priors.
Contributions:
1. The authors show non-uniform and uniform recovery under different assumptions on the measurement function.
2. The analysis in this paper recovers known results in non-linear compressed sensing of sparse vectors and generative priors, and also has connections to Gaussian width arguments.
---Edit after rebuttal--
After reading the author feedback and discussing with the other reviewers, I agree that there is sufficient novelty to increase my score. As R4 also points out, the presentation of technical details can be improved, and section 4 should be reorganized.
One major limitation that remains is the lack of experiments in this paper, and I do not agree with the authors' claim that experiments are not important in this case. Given that this paper borrows the theoretical framework of [8,23,24,27], the lack of experiments in earlier work does not justify the lack of experiments in this paper. Specifically, the important question about whether using generative models is actually helpful for non-linear observations is still left unanswered. (the authors do get a better theoretical error bound of m^(-0.5) as opposed to m^(-0.25) as in [23,24], but it's not clear how much the additional assumptions influence the theoretical bound).
--------------------------------

__ Strengths__: + This work generalizes 1-bit compressed sensing to more general non-linear measurement functions.
+ To the best of my knowledge, although the analysis borrows from Bora et al and Liu et al, it is sufficiently novel.

__ Weaknesses__: - While the analysis is different from existing work on non-linear measurement functions, the insight gained compared to previous work is quite limited. The assumption that the set of interest is a subset of the unit sphere is a strong assumption, but is standard (and necessary) in traditional problems like 1-bit CS. The additional assumption that the measurement function applied to a sub-Gaussian is also a sub-Gaussian severely limits the set of measurement functions, and in my opinion is not a strong enough generalization of 1-bit CS. For example, the results in this paper cannot be used for phase retrieval because f(x) = |x|^2 will not satisfy the sub-Gaussian assumption.
- The connection to Gaussian width is interesting, but is in my opinion again a weak statement. For example, if one were to use Gaussian width arguments to derive results similar to Bora et al, then the number of measurements would grow quadratically with the radius r. This paper avoids this dependence by assuming that the feasible set lies within the unit ball/sphere.
- The claim that K-Lasso is more practical than the algorithm in Liu et al is a conjecture at best. The K-Lasso proposed here requires the feasible set to lie on the unit sphere, and it is not clear that a heuristic to achieve this is any better/worse than the heuristic in Liu et al.
- The statement of Corollary 1 is very confusing and requires some clarification on the quality of the claimed bound. The constant bar{mu} depends on z*, the corollary requires an additional assumption that mu G(z*)/ ||G(z*)|| lies in the unit sphere, and the sub-Gaussian constant psi also depends on the ground truth G(z*). It's not at all clear when any of these assumptions are satisfied, and how these constants depend on one another.
- No experimental evaluation.

__ Correctness__: - Lemma 1 as stated is incorrect. The Gaussian width claimed in (21) would correspond to the Gaussian width of K \cap B_2^n, and the error in the proof can be seen from eqn (100) to (101), where it is implicitly assumed that G(M) is contained in the unit ball.
If one were to not assume that K lies in the unit ball, then the Gaussian width would grow linearly with the radius r.

__ Clarity__: While all the statements are clearly written, the authors have included a variety of results that can be confusing without adding much novel content. For example, Section 3.1 to Section 4.5 are all minor variations of Theorem 1, and is confusing to read due to frequent context switching. Additionally, most of these are known results that can be rederived using Theorem 1, and hence do not add much to the paper.

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The paper theoretically proves that the solution to the generalized lasso problem with nonlinear observations (also called semi-parametric single index model) and generative prior can recover the truth under certain assumptions.

__ Strengths__: 1. The main contribution of the paper is theorem 1. It proves that the semi-parametric single index model with generative prior can be solved by a constrained least squared optimization by providing an error bound.
2. The paper also considers several variations of the results and compares the Lipschitz function settings with GMW, which is the settings used in previous work.
3. The paper provides a sufficient condition to the uniform recovery error bound.

__ Weaknesses__: 1. The results provided in this paper seem kind of incremental to me. The settings and final error bounds are similar to the ones in [23] and [24], as shown in table 1 in appendix A. The difference is just the assumption for signal set K. In [23][24], K satisfies GMW and in this paper, K is the range of a generative model. However, both settings imply that effective dimension is small, which is a key to the proof. Note the difference between the two settings is actually addressed by lemma 2 (this result is stronger than that the effective dimension is small since the input dimension of the G is small in nature), which is from [2] and proved by using a delta-net cover. It seems to me that the main result of this paper is a combination of the lemma in [2] and the results from [23][24].
2. The paper also gives the uniform recovery guarantee in section 5, but the result relies on assumption 1, which seems too strong and hard to test. I would expect the authors can give a sufficient condition that is neat and informative.
3. It would be better if the paper also provides some empirical studies.

__ Correctness__: I think the results in the paper make sense and are reasonable but I didn't get into the details of proofs in supplimentary.

__ Clarity__: The paper is generally well written. The statement of the problem and the theorems are clear. I'm actually neutral in judging the paper given its strength and weakness listed above. I might change my scores after seeing the authors' feedback and comments from other reviewers.
---- Update ----
I've read the authors' rebuttal and the comments from other reviewers, and I would like to keep my original rating on this paper.

__ Relation to Prior Work__: The paper discusses other related work in section 1.1, where authors compare the difference in the settings and results with previous related work. Table 1 in appendix A also clearly compares the difference among different papers.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The paper studies theoretical guarantees for recovering signals from nonlinear random measurements via the so-called K-LASSO. The paper assumes a generative signal model consisting of a Lipschitz function, e.g. a neural network, and bounded inputs. It gives precise bounds on the required number of measurements and shows robustness under adversarial noise. Moreover, the paper for the first time provides a so-called uniform recovery result in this context. These results extend previous contributions in several non-trivial ways.

__ Strengths__: The paper significantly extends the analysis of the K-LASSO from linear to nonlinear measurements. It makes only weak assumptions on the nonlinearity; for instance, it does not assume differentiability and, hence, includes quantized and 1-bit measurements. The K-LASSO is solved via gradient descent and hence consists in an efficient algorithm (accepting that convergence is only verified empirically so far). The proofs are non-trivial.

__ Weaknesses__: I only see minor weaknesses:
The paper does not contain any numerical evaluations. These would be useful to understand the performance in practice. In fact, there is no theoretical guarantee (yet) that gradient descent converges to the solution of the K-LASSO, so that numerical evidence in the context of nonlinear measurements would help.
I would expect that the authors expand a bit on this point beyond Remark 1.

__ Correctness__: Yes, the claims are correct as far as I can see.

__ Clarity__: The paper is very clear written and well-organized.

__ Relation to Prior Work__: The paper contains a very detailed description of the prior work and describes very clearly the advances with respect to prior contributions.

__ Reproducibility__: Yes

__ Additional Feedback__: A very minor comment about the definition of the Gaussian width in (20). If one defines it with respect to K-K, then the Gaussian width of a unit ball or unit sphere equals 2E ||g||_2 for a standard Gaussian vector (factor 2 missing). The factor 2 can be omitted if one defines it only using K (instead of K-K).
It would be good if the author could comment on the condition Lr = \Omega(\epsilon \psi n) in Theorem 1. I understand that it is just a technical condition, which is satisfied in most situations of interest, but the authors may want to say a bit more about this point.
Remark 2, line 202: The word "that" should be removed.
---------------------------------
Comment after reading author feedback and the other reviews:
I still think that this is an excellent contribution and I see no reason to change my evaluation.

__ Summary and Contributions__: This paper generalizes result on nonlinear distortion of compressive measurements to generative signal models, such as deep image priors. Results of this flavor already exist for sparse models.

__ Strengths__: The paper provide theoretical results to demonstrate that thee behavior we expect in sparse signal models is also true in generative models. This is comforting, and allows broader use of such models.

__ Weaknesses__: While the results are solid theoretically, they are not surprising. It seems like the paper provides a straightforward extension of existing results in thee references.
In addition, given the theoretical nature of the paper, I find it disappointing that most of the proofs (especially for Thm. 1) are in the appendix in the supplemental material. Since this is a theoretical only paper, important proofs should be included.

__ Correctness__: Overall the claims seem correct (although I did not check the math exhaustively)

__ Clarity__: The paper is well written.

__ Relation to Prior Work__: The paper clearly discusses prior work and places the contributions in context. However, I would prefer if table 1 from the supplementary material was included in the main paper, to provide the context of the contribution better.

__ Reproducibility__: Yes

__ Additional Feedback__: Overall a nice paper. I am satisfied with the reviewers responses and I recommend acceptance.